
[CLJS-814] clojure.string/reverse breaks surrogate pairs Created: 12/Jun/14  Updated: 02/Dec/14  Resolved: 02/Dec/14

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Francis Avila Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: string
Environment:

r2227, NOT under Rhino.


Attachments: cljs-814.patch
Patch: Code and Test

 Description   

clojure.string/reverse reverses a string by UTF-16 code units instead of code points, which causes surrogate pairs to become reversed and unmatched (i.e., invalid UTF-16).

For example, in ClojureScript (clojure.string/reverse "a\uD834\uDD1Ec") will produce the invalid "c\uDD1E\uD834a" instead of "c\uD834\uDD1Ea". Clojure produces the correct result because the underlying Java StringBuilder.reverse() keeps surrogate pairs together.
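The failure can be reproduced in plain JavaScript. The following is a minimal sketch, assuming the broken behavior amounts to a split/reverse/join over code units; the function name reverseByCodeUnits is made up for illustration and is not taken from the ClojureScript source.

```javascript
// Hypothetical helper: reverse a string one UTF-16 code unit at a time.
function reverseByCodeUnits(s) {
  return s.split("").reverse().join("");
}

// "a", then U+1D11E (encoded as the pair D834 DD1E), then "c".
const input = "a\uD834\uDD1Ec";
const reversed = reverseByCodeUnits(input);

// The pair comes out as low surrogate followed by high surrogate, which is invalid.
console.log(reversed === "c\uDD1E\uD834a"); // true
```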

Note that ClojureScript running under Rhino will produce the same (correct) result as Clojure, probably because Rhino is using Java's StringBuilder.reverse() internally.

Attached patch gives clojure.string/reverse the exact same behavior in clj and cljs (including the fact that reversing twice does not return the original string when it contains unmatched surrogates).
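One way to get this behavior in JavaScript is to swap each surrogate pair before reversing by code units, so that the reversal puts the pair back in order. This is a sketch of that technique under stated assumptions, not necessarily what the attached patch does; the names SURROGATE_PAIR and reverseKeepingPairs are hypothetical.

```javascript
// A high surrogate immediately followed by a low surrogate.
const SURROGATE_PAIR = /([\uD800-\uDBFF])([\uDC00-\uDFFF])/g;

// Hypothetical helper: reverse by code units while keeping valid pairs intact.
function reverseKeepingPairs(s) {
  // Swap each pair first; the code-unit reversal then restores high-before-low order.
  return s.replace(SURROGATE_PAIR, "$2$1").split("").reverse().join("");
}

console.log(reverseKeepingPairs("a\uD834\uDD1Ec") === "c\uD834\uDD1Ea"); // true

// Unmatched surrogates (low before high) are reversed as plain code units,
// so reversing twice does not return the original string.
const unmatched = "a\uDD1E\uD834c";
const twice = reverseKeepingPairs(reverseKeepingPairs(unmatched));
console.log(twice === unmatched); // false
```

In the last case the first reversal happens to produce a valid pair, which the second reversal then keeps together.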

(Also, neither clojure nor clojurescript nor java reverse combining characters correctly--the combining character will "move" over a different letter. I suspect this is a WONTFIX for both clojure and clojurescript, but it is fixable with another regex replacement.)
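The combining-character problem and one possible regex-based fix can be sketched in JavaScript as follows. This is an illustration only: it assumes combining marks are limited to the U+0300..U+036F block, it ignores surrogate pairs entirely, and the names BASE_WITH_MARKS and reverseKeepingMarks are hypothetical.

```javascript
// Assumption: only U+0300..U+036F (Combining Diacritical Marks) count as combining.
// One non-combining code unit followed by one or more combining marks.
const BASE_WITH_MARKS = /([^\u0300-\u036F])([\u0300-\u036F]+)/g;

// Hypothetical helper: reverse by code units while keeping marks on their base letter.
function reverseKeepingMarks(s) {
  // Reverse each base+marks cluster in place so the full reversal restores its order.
  const prepared = s.replace(BASE_WITH_MARKS, (cluster) =>
    cluster.split("").reverse().join("")
  );
  return prepared.split("").reverse().join("");
}

const accented = "e\u0301x"; // "e" with a combining acute accent, then "x"

// Plain code-unit reversal moves the accent onto "x".
console.log(accented.split("").reverse().join("") === "x\u0301e"); // true

// The cluster-aware version keeps the accent on "e".
console.log(reverseKeepingMarks(accented) === "xe\u0301"); // true
```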



 Comments   
Comment by Francis Avila [ 12/Jun/14 1:27 AM ]

Forget what I said about Rhino, it's just as broken there. I think I got my repls confused or something.

Comment by David Nolen [ 02/Dec/14 5:36 AM ]

fixed https://github.com/clojure/clojurescript/commit/c5e96e75a3328139edfe57df1c9a191ff460cce2





[CLJS-801] str macro emits unoptimizable js code Created: 27/Apr/14  Updated: 11/May/14  Resolved: 11/May/14

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Francis Avila Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: string
Environment:

r2202


Attachments: cljs-801.patch, cljs-801-v2.patch, cljs-801-v3.patch, cljs-801-v4.patch
Patch: Code and Test

 Description   

ClojureScript's str macro emits JavaScript code which is inefficient and which the Closure Compiler cannot optimize.

Currently it emits code like [cljs.core.str(arg1),cljs.core.str(arg2)].join(''). The problems with this:

  1. The emitted function is the arity-dispatch str wrapper instead of the 1-arg implementation of str. The closure compiler cannot eliminate the dispatch.
  2. An intermediate array is always created; the closure compiler cannot optimize it out.
  3. The closure compiler can evaluate constant string expressions (e.g. 'a'+1 to 'a1'), but cannot in this case because it cannot eliminate the str call.
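The shape of the currently emitted code can be sketched in plain JavaScript. The functions str1 and str below are hypothetical stand-ins for the 1-arg implementation and the arity-dispatch wrapper; they are not the real cljs.core.str and only mimic its nil-to-empty-string behavior.

```javascript
// Hypothetical stand-in for the 1-arg implementation: nil becomes "".
function str1(x) {
  return x == null ? "" : String(x);
}

// Hypothetical stand-in for the wrapper that dispatches on argument count.
function str(...args) {
  switch (args.length) {
    case 0: return "";
    case 1: return str1(args[0]);
    default: return args.map(str1).join("");
  }
}

// Shape of the old emitted code: one dispatching call per argument,
// plus a temporary array that exists only to be joined.
function oldShape(arg1, arg2) {
  return [str(arg1), str(arg2)].join("");
}

console.log(oldShape("a", 1)); // "a1"
```

Even when both arguments are constants, the result is only known after running the dispatch and the join, which is why it cannot be folded at compile time.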

The attached patch rewrites the str macro so that a form like this:

(str arg1 "constant" \space true nil 123 456.78)

generates js code that looks like this:

(""+cljs.core.str.cljs$core$IFn$_invoke$arity$1(arg1)+"constant "+true+123+456.78)

This has a number of benefits:

  1. No short-lived array or Array.join operation.
  2. No arity dispatch is invoked. I have also observed that it can (but won't necessarily) inline the function body.
  3. The compiler can perform constant evaluation. For example, in advanced mode the compiler will emit the above example as (""+w.c(a)+"constant true123456.78") where w.c is the munged cljs.core.str.cljs$core$IFn$_invoke$arity$1.
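The equivalence between the concatenation form and its constant-folded form can be checked in plain JavaScript. This is a sketch: str1 is a hypothetical stand-in for the 1-arg str implementation, and newShape and foldedShape are made-up names for the two expressions quoted in this ticket.

```javascript
// Hypothetical stand-in for the 1-arg str implementation: nil becomes "".
function str1(x) {
  return x == null ? "" : String(x);
}

// Shape of the new emitted code for (str arg1 "constant" \space true nil 123 456.78).
// The leading "" forces string concatenation from the first operand onward.
function newShape(arg1) {
  return "" + str1(arg1) + "constant " + true + 123 + 456.78;
}

// Shape after constant folding: the literal tail collapses into one string.
function foldedShape(arg1) {
  return "" + str1(arg1) + "constant true123456.78";
}

console.log(newShape("x") === foldedShape("x")); // true
console.log(newShape("x")); // "xconstant true123456.78"
```

Only the call on arg1 remains a runtime operation; everything after it is a chain of literals that a compiler is free to evaluate ahead of time.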


 Comments   
Comment by Francis Avila [ 28/Apr/14 12:17 AM ]

Updated patch adds booleans to the str test case, and does not eagerly stringify bools anymore. (It emits as literals and lets the closure compiler decide to stringify at compile time if it wants.)

Comment by David Nolen [ 10/May/14 2:23 PM ]

Need the patch rebased to master.

Comment by Francis Avila [ 11/May/14 1:38 AM ]

Updated patch.

Comment by David Nolen [ 11/May/14 10:52 AM ]

Sorry, because of the last commit to master this patch will not apply. Please rebase again. I'll refrain from adding tests until this one gets merged in.

Comment by Francis Avila [ 11/May/14 11:22 AM ]

No problem, updated patch.

Comment by David Nolen [ 11/May/14 11:27 AM ]

fixed https://github.com/clojure/clojurescript/commit/c63db9544738fc094cc04680bbee1534cf436204





Generated at Sun Dec 21 14:08:39 CST 2014 using JIRA 4.4#649-r158309.