<< Back to previous view

[CLJS-814] clojure.string/reverse breaks surrogate pairs Created: 12/Jun/14  Updated: 02/Dec/14  Resolved: 02/Dec/14

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Francis Avila Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: string

r2227, NOT under Rhino.

Attachments: Text File cljs-814.patch    
Patch: Code and Test


clojure.string/reverse will reverse a string by code units instead of code points which causes surrogate pairs to become reversed and unmatched (i.e., invalid utf-16).

For example, in clojurescript (clojure.core/reverse "a\uD834\uDD1Ec") will produce the invalid "c\uDD1E\uD834a" instead of "c\uD834\uDD1Ea". Clojure produces the correct result because the underlying Java String.reverse() keeps surrogate pairs together.

Note that clojurescript running under Rhino will produce the same (correct) result as clojure, probably because Rhino is using String.reverse() internally.

Attached patch gives clojure.string/reverse the exact same behavior in clj and cljs (including non-commutativity for strings with unmatched surrogates).

(Also, neither clojure nor clojurescript nor java reverse combining characters correctly--the combining character will "move" over a different letter. I suspect this is a WONTFIX for both clojure and clojurescript, but it is fixable with another regex replacement.)

Comment by Francis Avila [ 12/Jun/14 1:27 AM ]

Forget what I said about Rhino, it's just as broken there. I think I got my repls confused or something.

Comment by David Nolen [ 02/Dec/14 5:36 AM ]

fixed https://github.com/clojure/clojurescript/commit/c5e96e75a3328139edfe57df1c9a191ff460cce2

Generated at Sat Jan 20 13:33:21 CST 2018 using JIRA 4.4#649-r158309.