<< Back to previous view

[CLJS-814] clojure.string/reverse breaks surrogate pairs Created: 12/Jun/14  Updated: 12/Jun/14

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Francis Avila Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: string

r2227, NOT under Rhino.

Attachments: Text File cljs-814.patch    
Patch: Code and Test


clojure.string/reverse will reverse a string by code units instead of code points which causes surrogate pairs to become reversed and unmatched (i.e., invalid utf-16).

For example, in clojurescript (clojure.core/reverse "a\uD834\uDD1Ec") will produce the invalid "c\uDD1E\uD834a" instead of "c\uD834\uDD1Ea". Clojure produces the correct result because the underlying Java String.reverse() keeps surrogate pairs together.

Note that clojurescript running under Rhino will produce the same (correct) result as clojure, probably because Rhino is using String.reverse() internally.

Attached patch gives clojure.string/reverse the exact same behavior in clj and cljs (including non-commutativity for strings with unmatched surrogates).

(Also, neither clojure nor clojurescript nor java reverse combining characters correctly--the combining character will "move" over a different letter. I suspect this is a WONTFIX for both clojure and clojurescript, but it is fixable with another regex replacement.)

Comment by Francis Avila [ 12/Jun/14 1:27 AM ]

Forget what I said about Rhino, it's just as broken there. I think I got my repls confused or something.

Generated at Thu Nov 27 04:43:12 CST 2014 using JIRA 4.4#649-r158309.