<< Back to previous view

[CLJ-870] clojure.string/replace behaves unexpectedly when \ or $ are part of the result string Created: 04/Nov/11  Updated: 18/Aug/12  Resolved: 18/Aug/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.2, Release 1.3
Fix Version/s: Release 1.5

Type: Defect Priority: Minor
Reporter: Heinz N. Gies Assignee: Stuart Sierra
Resolution: Completed Votes: 2
Labels: None
Environment:

HW/OS/SW indipendant - issue is part of java interface


Attachments: Text File 0001-Add-tests-for-870.patch     Text File 0002-Fix-870-by-using-quoteReplacement.patch     Text File CLJ-753-870-905-combined-fix3.patch     Text File CLJ-753-870-905-combined-fix3.readme.txt     Text File clj-753-870-905-combined-fix4.patch    
Patch: Code and Test
Approval: Ok
Waiting On: Rich Hickey

 Description   

clojure.string/replace uses javas replace function to do it's work, the replace function has the tricky habit of 'double evaluating' the replacement string (third argument). This means that every \ in the string (so i.e. "
") is evaluated by the java code behind replace.

Since this behavior isn't documented it can lead to confusing errors, for example (made up semi real world example to explain the issue):

(clojure.string/replace "c:/windows/" #"/" "\\")

This should replace all unix folder separators with windows separators, this crashes with a index out of bound exemption since java tries to evaluate "
" as a semi regexp.

or

(println (str "\"" (clojure.string/replace "my string with \\ and \" in it" #"[\"\\]" (fn [x] (str "\\" x))) "\""))

The expected result would be:

"my string with \\ and \" in it"

the actual result is:

"my string with \ and " in it"

This should return an 'escaped' string, it does not since the 'double evaluation' of the return string results in \\\\ being reduced to
again.



 Comments   
Comment by Heinz N. Gies [ 04/Nov/11 5:31 AM ]

A working code for the replace example is:

(println (str "\"" (clojure.string/replace "my string with \\ and \" in it" #"[\"\\]" (fn [x] (str  "\\\\"  (if (= x "\\") "\\") x))) "\""))

which is horribly ugly since you need to hand escape the backslash or it crashes.

Comment by Stuart Sierra [ 09/Dec/11 3:31 PM ]

Vetted on 1.4.0 master. Needs patch.

Comment by Alan Malloy [ 05/Jan/12 8:30 PM ]

No hand-escaping is necessary, just use #(java.util.regex.Matcher/quoteReplacement %).

Edit: I see, we're talking about when you use a function as a replacement, not a replacement string like "
x" - the function's output should not be interpreted as regex replacement, I agree. Looks like it just needs a small patch in clojure.string/replace-by.

Comment by Alan Malloy [ 05/Jan/12 8:53 PM ]

Patch and test for this issue.

Comment by Andy Fingerhut [ 06/Jan/12 12:25 AM ]

CLJ-870-also-fix-replace-first.patch combines Alan Malloy's two patches, and adds the same change for replace-first. I like this change, too, and think replace and replace-first should be consistent with each other in this behavior.

Comment by Andy Fingerhut [ 06/Jan/12 12:32 AM ]

If there is any interest in a combined patch for this issue, CLJ-753 and CLJ-905 all in one, I'd be happy to merge them.

Comment by Andy Fingerhut [ 12/Jan/12 12:05 AM ]

Proposed combined patch for CLJ-753, CLJ-870, and CLJ-905, since they all affect the behavior of clojure.string/replace and replace-first.

Comment by Stuart Sierra [ 03/Feb/12 9:15 AM ]

The 1/12/2012 combined patch is not in correct format. Please recreate, see http://clojure.org/patches for correct format.

Comment by Andy Fingerhut [ 04/Feb/12 10:51 AM ]

CLJ-753-870-905-combined-fix2.patch should have proper patch format.

Comment by Andy Fingerhut [ 28/Feb/12 12:38 PM ]

Attaching slightly improved patch CLJ-753-870-905-combined-fix3.patch, along with a README to explain in gory detail the behavior and performance changes to clojure.string/replace and clojure.string/replace-first, in case it would help anyone review the changes. Deleting my older patches.

Comment by Aaron Bedra [ 14/Aug/12 8:44 PM ]

This is a small nit pick, but re-qr is a pretty non-descriptive name. Perhaps we could change it to re-quote-replacement?

Comment by Andy Fingerhut [ 15/Aug/12 11:19 AM ]

Patches clj-753-870-905-combined-fix4.patch dated Aug 15 2012 and the patch that Aaron reviewed, CLJ-753-870-905-combined-fix3.patch dated Feb 28 2012, are almost identical.

fix3 uses the name re-qr for a function, where fix4 uses the name re-quote-replacement. I have no strong preference for either of them.

Comment by Andy Fingerhut [ 15/Aug/12 11:23 AM ]

Added patch addressing what I think is Aaron's reason for changing state to Incomplete, so as a result I am changing state back to its former Vetted state. If there is something else a non-screener should be doing to bring Incomplete tickets to the attention of screeners, please let me know, and I will document it here: http://dev.clojure.org/display/design/JIRA+workflow

Comment by Stuart Sierra [ 15/Aug/12 2:42 PM ]

I will be looking at this.

Comment by Stuart Sierra [ 17/Aug/12 7:55 AM ]

Screened. This is an adequate fix for this particular problem, given the current API.

However, the long and complicated docstring proves that replace and replace-first should each be four separate functions:

code
replace-char [input char char]
replace-string [input string string]
replace-re [input pattern string]
replace-re-by [input pattern fn]
code

This is how these functions were originally written back in the days of clojure.contrib.str-utils. This would be a minor breaking change.

Generated at Sat Nov 22 11:04:03 CST 2014 using JIRA 4.4#649-r158309.