[CLJS-814] clojure.string/reverse breaks surrogate pairs Created: 12/Jun/14 Updated: 02/Dec/14 Resolved: 02/Dec/14
r2227, NOT under Rhino.
|Patch:||Code and Test|
clojure.string/reverse reverses a string by code units instead of code points, which causes surrogate pairs to become reversed and unmatched (i.e., invalid UTF-16).
For example, in ClojureScript (clojure.string/reverse "a\uD834\uDD1Ec") will produce the invalid "c\uDD1E\uD834a" instead of "c\uD834\uDD1Ea". Clojure produces the correct result because the underlying Java StringBuilder.reverse() keeps surrogate pairs together.
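The code-unit behavior can be reproduced directly in the JavaScript host. The TypeScript sketch below assumes the unpatched function amounts to a plain split/reverse/join, which is an assumption rather than something this ticket states; `naiveReverse` is a hypothetical name used only for illustration.

```typescript
// Hypothetical helper: reverses by UTF-16 code unit, which is what a
// plain split("")/reverse/join does in JavaScript.
function naiveReverse(s: string): string {
  return s.split("").reverse().join("");
}

// "a", U+1D11E (encoded as one surrogate pair), "c"
const input = "a\uD834\uDD1Ec";
const broken = naiveReverse(input);

// The pair comes out low surrogate first, which is not valid UTF-16.
console.log(broken === "c\uDD1E\uD834a"); // true
console.log(broken === "c\uD834\uDD1Ea"); // false
```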
Note that ClojureScript running under Rhino will produce the same (correct) result as Clojure, probably because Rhino is using Java's StringBuilder.reverse() internally.
The attached patch gives clojure.string/reverse the exact same behavior in clj and cljs (including the fact that, for strings with unmatched surrogates, reversing twice does not necessarily return the original string).
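One way to get this behavior in a JavaScript host is to swap each valid surrogate pair before reversing the code units, so the reversal puts every pair back in high-then-low order. This is a minimal TypeScript sketch of that idea, not necessarily what the attached patch does; `SURROGATE_PAIR` and `reverseKeepingPairs` are hypothetical names.

```typescript
// Matches a high surrogate immediately followed by a low surrogate.
const SURROGATE_PAIR = /([\uD800-\uDBFF])([\uDC00-\uDFFF])/g;

// Hypothetical helper: pre-swap each valid pair, then reverse by code
// unit, so each pair ends up high-then-low again in the result.
function reverseKeepingPairs(s: string): string {
  return s.replace(SURROGATE_PAIR, "$2$1").split("").reverse().join("");
}

// A valid pair stays together.
console.log(reverseKeepingPairs("a\uD834\uDD1Ec") === "c\uD834\uDD1Ea"); // true

// Unmatched surrogates (low before high) become a valid pair after one
// reversal, so reversing twice does not give back the original string.
const unmatched = "a\uDD1E\uD834c";
const once = reverseKeepingPairs(unmatched);
console.log(once === "c\uD834\uDD1Ea"); // true
console.log(reverseKeepingPairs(once) === unmatched); // false
```

The unmatched case mirrors what Java documents for StringBuilder.reverse(): the operation can produce surrogate pairs out of surrogates that were unpaired before it.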
(Also, neither clojure nor clojurescript nor java reverse combining characters correctly--the combining character will "move" over a different letter. I suspect this is a WONTFIX for both clojure and clojurescript, but it is fixable with another regex replacement.)
|Comment by Francis Avila [ 12/Jun/14 1:27 AM ]|
Forget what I said about Rhino, it's just as broken there. I think I got my repls confused or something.
|Comment by David Nolen [ 02/Dec/14 5:36 AM ]|