Clojure

clojure.string/replace behaves unexpectedly when \ or $ are part of the result string

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Completed
  • Affects Version/s: Release 1.2, Release 1.3
  • Fix Version/s: Release 1.5
  • Component/s: None
  • Labels:
    None
  • Environment:
    HW/OS/SW indipendant - issue is part of java interface
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

clojure.string/replace uses javas replace function to do it's work, the replace function has the tricky habit of 'double evaluating' the replacement string (third argument). This means that every \ in the string (so i.e. "
") is evaluated by the java code behind replace.

Since this behavior isn't documented it can lead to confusing errors, for example (made up semi real world example to explain the issue):

(clojure.string/replace "c:/windows/" #"/" "\\")

This should replace all unix folder separators with windows separators, this crashes with a index out of bound exemption since java tries to evaluate "
" as a semi regexp.

or

(println (str "\"" (clojure.string/replace "my string with \\ and \" in it" #"[\"\\]" (fn [x] (str "\\" x))) "\""))

The expected result would be:

"my string with \\ and \" in it"

the actual result is:

"my string with \ and " in it"

This should return an 'escaped' string, it does not since the 'double evaluation' of the return string results in \\\\ being reduced to
again.

  1. clj-753-870-905-combined-fix4.patch
    15/Aug/12 11:19 AM
    9 kB
    Andy Fingerhut
  2. CLJ-753-870-905-combined-fix3.readme.txt
    28/Feb/12 12:38 PM
    3 kB
    Andy Fingerhut
  3. CLJ-753-870-905-combined-fix3.patch
    28/Feb/12 12:38 PM
    9 kB
    Andy Fingerhut
  4. 0002-Fix-870-by-using-quoteReplacement.patch
    05/Jan/12 8:53 PM
    1 kB
    Alan Malloy
  5. 0001-Add-tests-for-870.patch
    05/Jan/12 8:53 PM
    0.9 kB
    Alan Malloy

Activity

Hide
Heinz N. Gies added a comment -

A working code for the replace example is:

(println (str "\"" (clojure.string/replace "my string with \\ and \" in it" #"[\"\\]" (fn [x] (str  "\\\\"  (if (= x "\\") "\\") x))) "\""))

which is horribly ugly since you need to hand escape the backslash or it crashes.

Show
Heinz N. Gies added a comment - A working code for the replace example is:
(println (str "\"" (clojure.string/replace "my string with \\ and \" in it" #"[\"\\]" (fn [x] (str  "\\\\"  (if (= x "\\") "\\") x))) "\""))
which is horribly ugly since you need to hand escape the backslash or it crashes.
Hide
Stuart Sierra added a comment -

Vetted on 1.4.0 master. Needs patch.

Show
Stuart Sierra added a comment - Vetted on 1.4.0 master. Needs patch.
Hide
Alan Malloy added a comment - - edited

No hand-escaping is necessary, just use #(java.util.regex.Matcher/quoteReplacement %).

Edit: I see, we're talking about when you use a function as a replacement, not a replacement string like "
x" - the function's output should not be interpreted as regex replacement, I agree. Looks like it just needs a small patch in clojure.string/replace-by.

Show
Alan Malloy added a comment - - edited No hand-escaping is necessary, just use #(java.util.regex.Matcher/quoteReplacement %). Edit: I see, we're talking about when you use a function as a replacement, not a replacement string like "
x" - the function's output should not be interpreted as regex replacement, I agree. Looks like it just needs a small patch in clojure.string/replace-by.
Hide
Alan Malloy added a comment -

Patch and test for this issue.

Show
Alan Malloy added a comment - Patch and test for this issue.
Hide
Andy Fingerhut added a comment -

CLJ-870-also-fix-replace-first.patch combines Alan Malloy's two patches, and adds the same change for replace-first. I like this change, too, and think replace and replace-first should be consistent with each other in this behavior.

Show
Andy Fingerhut added a comment - CLJ-870-also-fix-replace-first.patch combines Alan Malloy's two patches, and adds the same change for replace-first. I like this change, too, and think replace and replace-first should be consistent with each other in this behavior.
Hide
Andy Fingerhut added a comment -

If there is any interest in a combined patch for this issue, CLJ-753 and CLJ-905 all in one, I'd be happy to merge them.

Show
Andy Fingerhut added a comment - If there is any interest in a combined patch for this issue, CLJ-753 and CLJ-905 all in one, I'd be happy to merge them.
Hide
Andy Fingerhut added a comment -

Proposed combined patch for CLJ-753, CLJ-870, and CLJ-905, since they all affect the behavior of clojure.string/replace and replace-first.

Show
Andy Fingerhut added a comment - Proposed combined patch for CLJ-753, CLJ-870, and CLJ-905, since they all affect the behavior of clojure.string/replace and replace-first.
Hide
Stuart Sierra added a comment -

The 1/12/2012 combined patch is not in correct format. Please recreate, see http://clojure.org/patches for correct format.

Show
Stuart Sierra added a comment - The 1/12/2012 combined patch is not in correct format. Please recreate, see http://clojure.org/patches for correct format.
Hide
Andy Fingerhut added a comment -

CLJ-753-870-905-combined-fix2.patch should have proper patch format.

Show
Andy Fingerhut added a comment - CLJ-753-870-905-combined-fix2.patch should have proper patch format.
Hide
Andy Fingerhut added a comment - - edited

Attaching slightly improved patch CLJ-753-870-905-combined-fix3.patch, along with a README to explain in gory detail the behavior and performance changes to clojure.string/replace and clojure.string/replace-first, in case it would help anyone review the changes. Deleting my older patches.

Show
Andy Fingerhut added a comment - - edited Attaching slightly improved patch CLJ-753-870-905-combined-fix3.patch, along with a README to explain in gory detail the behavior and performance changes to clojure.string/replace and clojure.string/replace-first, in case it would help anyone review the changes. Deleting my older patches.
Hide
Aaron Bedra added a comment -

This is a small nit pick, but re-qr is a pretty non-descriptive name. Perhaps we could change it to re-quote-replacement?

Show
Aaron Bedra added a comment - This is a small nit pick, but re-qr is a pretty non-descriptive name. Perhaps we could change it to re-quote-replacement?
Hide
Andy Fingerhut added a comment -

Patches clj-753-870-905-combined-fix4.patch dated Aug 15 2012 and the patch that Aaron reviewed, CLJ-753-870-905-combined-fix3.patch dated Feb 28 2012, are almost identical.

fix3 uses the name re-qr for a function, where fix4 uses the name re-quote-replacement. I have no strong preference for either of them.

Show
Andy Fingerhut added a comment - Patches clj-753-870-905-combined-fix4.patch dated Aug 15 2012 and the patch that Aaron reviewed, CLJ-753-870-905-combined-fix3.patch dated Feb 28 2012, are almost identical. fix3 uses the name re-qr for a function, where fix4 uses the name re-quote-replacement. I have no strong preference for either of them.
Hide
Andy Fingerhut added a comment -

Added patch addressing what I think is Aaron's reason for changing state to Incomplete, so as a result I am changing state back to its former Vetted state. If there is something else a non-screener should be doing to bring Incomplete tickets to the attention of screeners, please let me know, and I will document it here: http://dev.clojure.org/display/design/JIRA+workflow

Show
Andy Fingerhut added a comment - Added patch addressing what I think is Aaron's reason for changing state to Incomplete, so as a result I am changing state back to its former Vetted state. If there is something else a non-screener should be doing to bring Incomplete tickets to the attention of screeners, please let me know, and I will document it here: http://dev.clojure.org/display/design/JIRA+workflow
Hide
Stuart Sierra added a comment -

I will be looking at this.

Show
Stuart Sierra added a comment - I will be looking at this.
Hide
Stuart Sierra added a comment -

Screened. This is an adequate fix for this particular problem, given the current API.

However, the long and complicated docstring proves that replace and replace-first should each be four separate functions:

code
replace-char [input char char]
replace-string [input string string]
replace-re [input pattern string]
replace-re-by [input pattern fn]
code

This is how these functions were originally written back in the days of clojure.contrib.str-utils. This would be a minor breaking change.

Show
Stuart Sierra added a comment - Screened. This is an adequate fix for this particular problem, given the current API. However, the long and complicated docstring proves that replace and replace-first should each be four separate functions: code replace-char [input char char] replace-string [input string string] replace-re [input pattern string] replace-re-by [input pattern fn] code This is how these functions were originally written back in the days of clojure.contrib.str-utils. This would be a minor breaking change.

People

Vote (2)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: