clojure.string/reverse breaks surrogate pairs


  • Type: Defect
  • Status: Closed
  • Priority: Minor
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Environment:
    r2227, NOT under Rhino.
  • Patch:
    Code and Test


clojure.string/reverse reverses a string by UTF-16 code units instead of by code points, so the two halves of each surrogate pair end up swapped and unmatched (i.e., invalid UTF-16).

For example, in ClojureScript (clojure.string/reverse "a\uD834\uDD1Ec") produces the invalid "c\uDD1E\uD834a" instead of "c\uD834\uDD1Ea". Clojure produces the correct result because the underlying Java StringBuilder.reverse() keeps surrogate pairs together.
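
The failure mode can be reproduced without clojure.string at all. The sketch below is only an illustration and makes no claim about how the cljs implementation is actually written; reverse-code-units is a hypothetical helper name. It relies on seq over a string yielding one element per UTF-16 code unit, so a surrogate pair is split and its halves come out in the wrong order.

```clojure
;; Hypothetical helper, not part of clojure.string.
;; Reverses a string one UTF-16 code unit at a time.
(defn reverse-code-units
  [s]
  (apply str (reverse s)))

;; U+1D11E is encoded as the surrogate pair \uD834 \uDD1E.
;; The pair is reversed along with everything else:
(= (reverse-code-units "a\uD834\uDD1Ec")
   "c\uDD1E\uD834a")   ; low surrogate now precedes the high one
```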

Note that ClojureScript running under Rhino produces the same (correct) result as Clojure, probably because Rhino uses Java's StringBuilder.reverse() internally.

The attached patch gives clojure.string/reverse exactly the same behavior in clj and cljs (including the fact that, for strings with unmatched surrogates, reversing twice does not necessarily return the original string).
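
One way to get this behavior in ClojureScript is sketched below. It is an assumption about a possible approach, not a copy of the attached patch, and the names re-surrogate-pair and reverse-keeping-pairs are made up for the example. The idea is to swap the two halves of every well-formed surrogate pair first, so that the subsequent code-unit reversal puts them back in high-then-low order. It targets ClojureScript on a JavaScript engine (it uses js/RegExp and JavaScript string methods) and will not run on JVM Clojure.

```clojure
;; Matches a high surrogate immediately followed by a low surrogate.
;; JavaScript regexes without the "u" flag match individual code units.
(def re-surrogate-pair
  (js/RegExp. "([\\uD800-\\uDBFF])([\\uDC00-\\uDFFF])" "g"))

;; Hypothetical helper: pre-swap each pair, then reverse code units.
(defn reverse-keeping-pairs
  [s]
  (-> (.replace s re-surrogate-pair "$2$1")
      (.. (split "") (reverse) (join ""))))

(reverse-keeping-pairs "a\uD834\uDD1Ec")
;; => "c\uD834\uDD1Ea"

;; Unmatched surrogates (low, then high) become a valid pair after one
;; reversal, and a second reversal keeps that pair together, so the
;; original string is not recovered:
(reverse-keeping-pairs "\uDD1E\uD834")
;; => "\uD834\uDD1E"
(reverse-keeping-pairs (reverse-keeping-pairs "\uDD1E\uD834"))
;; => "\uD834\uDD1E"
```

The last two calls show the same quirk that Java documents for StringBuilder.reverse(): a reversal can create a surrogate pair out of previously unpaired surrogates.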

(Also, neither Clojure, ClojureScript, nor Java reverses combining characters correctly: after reversal the combining character ends up attached to a different letter. I suspect this is a WONTFIX for both Clojure and ClojureScript, but it is fixable with another regex replacement.)





  • Created: