[CLJS-139] Internet Explorer treats \uFDD0 up to \uFDEF as equal so some keyword and symbol related things do not work Created: 31/Jan/12  Updated: 27/Jul/13  Resolved: 25/Feb/12

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Thomas Scheiblauer Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: None
Environment:

ClojureScript HEAD, tested on IE7 and IE8


Attachments: Text File 139_fix_unicode_emit.patch     Text File cljs-139+133_symbol+keyword-fix.patch     Text File CLJS_139.patch     Text File general-escaping-emit-constant.patch     Zip Archive unicodetest.zip    

 Description   

Internet Explorer treats the characters \uFDD0 through \uFDEF as equal, e.g. '(= \uFDD0 \uFDD1)' returns true.
As a result, '(symbol? :whatever)', '(symbol? (keyword "whatever"))' and so on return true, which obviously shouldn't happen.
Furthermore, read-string does not correctly unmarshal keywords because of this issue.
Using pr-str on that Unicode range returns ":" for all codes.
All non-IE browsers I've tested (Firefox, Chrome/Chromium, Opera, iOS browser, etc.) do not exhibit this problem and work as expected.

Some context: http://stackoverflow.com/questions/5188679/whats-the-purpose-of-the-noncharacters-ufdd0-to-ufdef



 Comments   
Comment by David Nolen [ 31/Jan/12 10:24 AM ]

Do you have a proposed solution?

Comment by Thomas Scheiblauer [ 01/Feb/12 3:17 AM ]

Maybe change the keyword and symbol identifiers to \uFDF0 and \uFDF1 (or any other cross browser accepted value) respectively?
I will try that and report back any success or failure.

Comment by Thomas Scheiblauer [ 01/Feb/12 5:37 AM ]

My proposed solution, replacing \uFDD0 and \uFDD1 with \uFDF0 and \uFDF1 respectively, works on all tested browsers (Firefox 9, Chromium 17, iPad/iOS 5.0.1, Epiphany AND IE7 + IE8).
I used this bash command line inside the "src" folder to do the replacement (after checking that the replacements will be appropriate):
for src in $(grep -R -i -l 'fdd' .); do sed -i -e 's/FDD/FDF/g' "$src"; done

Comment by David Nolen [ 01/Feb/12 10:53 AM ]

Thanks, will look into this.

Comment by David Nolen [ 03/Feb/12 7:14 PM ]

Please apply this patch and confirm that it works for you. Thanks!

Comment by Thomas Scheiblauer [ 06/Feb/12 8:49 AM ]

It works, thank you!

Comment by David Powell [ 06/Feb/12 12:04 PM ]

Can anyone confirm this problem?

In IE8, in JavaScript:

('\ufdd0' === '\ufdd1')

returns false.
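
As a rough way to repeat this check across the whole range rather than for a single pair, here is a minimal sketch. The function name collidingPairs is hypothetical and not part of any attachment on this ticket; it is written in ES3-style JavaScript so that it could in principle run on old IE versions. Note that it builds the strings at runtime, so it only exercises the engine's string comparison, not the way a source file containing raw characters is decoded.

```javascript
// Hypothetical helper (not from the ticket): collect every pair of distinct
// code units in [start, end] whose one-character strings compare as equal.
function collidingPairs(start, end) {
  var collisions = [];
  for (var a = start; a <= end; a++) {
    for (var b = a + 1; b <= end; b++) {
      if (String.fromCharCode(a) === String.fromCharCode(b)) {
        collisions.push([a, b]);
      }
    }
  }
  return collisions;
}

// A conforming engine should report no collisions for the noncharacter range.
var found = collidingPairs(0xFDD0, 0xFDEF);
```

An empty result here would agree with the single comparison above; any entries in the array would identify which code units the engine conflates.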

Comment by David Nolen [ 06/Feb/12 5:08 PM ]

David Powell, so you're not seeing the same issue?

Comment by Thomas Scheiblauer [ 07/Feb/12 8:49 AM ]

I just ran another test comparing the two values, once directly in JavaScript and once in compiled ClojureScript, with only an alert displaying the result. Both tests produced the correct results (as David Powell observed), even in IE. So something seems to be happening in the context of my application; maybe it's caused by one of the included JS libraries (I'm also using jQuery, jQuery Mobile and CKEditor, though I had already included these in the test environment). I will investigate further and post another comment when I've found something.

Comment by Thomas Scheiblauer [ 08/Feb/12 4:29 AM ]

Intermediate result of my investigations:
It only happens when I turn ClojureScript optimizations OFF!
As soon as I activate any compiler optimizations, even if only "simple", the problem no longer manifests itself on IE.
It also does not seem to depend on any of my included third-party libraries.

Comment by David Nolen [ 08/Feb/12 7:59 AM ]

Are you using your own code or one of the examples that ships with ClojureScript?

Comment by Thomas Scheiblauer [ 08/Feb/12 11:50 AM ]

I'm using my own code.

I attached a test that illustrates the described problem with IE (at least on IE7 and IE8; I haven't yet been able to test more recent IE versions).
The result of the comparisons should always be "false", which isn't the case with non-optimized compiled ClojureScript within the Unicode range \uFDD0 through \uFDEF, while it works e.g. for \uFDF0 and \uFDF1.
"unicodetest.html" uses the non-optimized code, which exposes the bug; it was compiled using: cljsc unicodetest.cljs > unicodetest.js
"unicodetest_simpleopt.html" uses the "simple" optimized code, which does not expose the bug; it was compiled using: cljsc unicodetest.cljs '{:output-dir "simpleopt_out" :optimizations :simple}' > unicodetest_simpleopt.js
Using advanced optimization also produces correct results, even on IE.

The code was compiled using the following ClojureScript version and dependencies:
ClojureScript from git (commit 1028ca12f169322e31d7967d8c385165cf773665)
Google Closure Library r1487
Google Closure Compiler r1592
guava 10.0.1
rhino 1.7R3

running on:
Gentoo Linux x86_64, Linux kernel 3.2.5
Java SE 1.6.0_30
Clojure 1.3.0

The problem was observed on:
Windows XP (SP 2)
using IE7 and IE8
(it did not occur on any other non-IE browser I've tested: Firefox, Chromium, iPad Browser, Epiphany and Opera regardless of the OS they were running on)

Comment by Thomas Scheiblauer [ 08/Feb/12 4:18 PM ]

Looking at the compiled code, I see that in the version without optimizations the Unicode characters are converted from the \u escape notation to their UTF-8 byte values (3 bytes); see: http://www.fileformat.info/info/unicode/char/fdd0/index.htm
In the optimized version, by contrast, the \u notation stays intact.
That of course does not explain why IE treats strings containing the byte sequences 0xEFB790 through 0xEFB7AF as equal, but it does explain why the optimized and non-optimized compiler output differ, and why the equality test works in IE when written directly in JavaScript using the \u notation.
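
The three-byte values mentioned above can be reproduced by hand. The following is a minimal sketch of the standard UTF-8 encoding for a single BMP code point; utf8Hex is a hypothetical helper name used only for illustration and does not appear in the compiler or in any attachment.

```javascript
// Hypothetical helper: UTF-8 encode one BMP code point (U+0000..U+FFFF,
// surrogates not handled) and return the bytes as an uppercase hex string.
function utf8Hex(cp) {
  var bytes;
  if (cp < 0x80) {
    bytes = [cp];                                   // 1 byte: 0xxxxxxx
  } else if (cp < 0x800) {
    bytes = [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]; // 2 bytes: 110xxxxx 10xxxxxx
  } else {
    bytes = [0xE0 | (cp >> 12),                     // 3 bytes: 1110xxxx
             0x80 | ((cp >> 6) & 0x3F),             //          10xxxxxx
             0x80 | (cp & 0x3F)];                   //          10xxxxxx
  }
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += ('0' + bytes[i].toString(16)).slice(-2).toUpperCase();
  }
  return out;
}

utf8Hex(0xFDD0); // "EFB790"
utf8Hex(0xFDEF); // "EFB7AF"
```

Every code point in the affected range shares the prefix EF B7 and differs only in the last byte (0x90 through 0xAF).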

Comment by David Nolen [ 15/Feb/12 4:42 PM ]

I suspect this is because of improperly escaped Java strings; looking into it.

Comment by David Nolen [ 16/Feb/12 9:19 AM ]

This one avoids emitting Unicode chars; please test.

Comment by Thomas Scheiblauer [ 22/Feb/12 8:27 AM ]

I've just tested your new fix, which prevents the compiler from converting Unicode literals to their respective UTF-8 representations, but now I encounter the bug when using read-string, as described in CLJS-133.
Again, this happens only on IE and only when compiling without any optimization.

Comment by Thomas Scheiblauer [ 23/Feb/12 8:22 AM ]

I have attached a new patch which should fix all problems related to this issue found so far, including the related CLJS-133 (works for me). It includes David's partial fix, so only this one is required.
However, to prevent related issues in the future, we should probably consider escaping every non-ASCII character in emitted strings and characters, because IE obviously HAS severe problems handling unescaped Unicode.

Comment by David Nolen [ 23/Feb/12 10:44 AM ]

This seems less ideal than just solving this once and for all. I'm assuming many people will want to emit Unicode chars and want them to emit properly and work under development in IE. When we emit strings we should rebuild the string, replacing Unicode chars. This should be done as efficiently as possible.
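
To make the idea concrete, here is a minimal sketch of rebuilding a string with every non-ASCII UTF-16 code unit replaced by a \uXXXX escape. It is written in JavaScript for illustration only; the actual fix lives in the ClojureScript compiler and may well differ, and escapeNonAscii is a hypothetical name. Quote and backslash escaping, which a real string-literal emitter also needs, is deliberately left out.

```javascript
// Hypothetical sketch (not the actual patch): replace each UTF-16 code unit
// above 0x7F with its \uXXXX escape and copy ASCII characters unchanged.
function escapeNonAscii(s) {
  var parts = [];
  for (var i = 0; i < s.length; i++) {
    var code = s.charCodeAt(i);
    if (code > 0x7F) {
      // Left-pad the hex value to four digits, e.g. 0xE9 -> "00e9".
      parts.push('\\u' + ('0000' + code.toString(16)).slice(-4));
    } else {
      parts.push(s.charAt(i));
    }
  }
  return parts.join('');
}

escapeNonAscii('\uFDD0foo'); // a backslash followed by "ufdd0foo"
```

Because the loop works on code units rather than code points, a character outside the BMP comes out as two consecutive escapes (its surrogate pair), which is also how JavaScript string literals represent it.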

Comment by Thomas Scheiblauer [ 23/Feb/12 11:07 AM ]

Sure, that's what I wanted to express with my last sentence. I'm currently exploring possible solutions to find the most efficient one.

Comment by Thomas Scheiblauer [ 23/Feb/12 12:48 PM ]

I have just attached my first stab at a general non-ASCII escape patch for emit-constant. It should already be pretty efficient. I basically took the core of the write-json-string function from Stuart Sierra's clojure/data.json (to avoid reinventing the wheel), optimized it a bit (hopefully) and integrated it properly into compiler.cljs.

Comment by Thomas Scheiblauer [ 24/Feb/12 4:11 AM ]

Just to nip questions about the emission of regular expressions in the bud: I've just tested this and it behaves correctly without "manual" intervention. More precisely, escaped Unicode in regular expressions apparently does not get converted to UTF-8 (or any other encoding) when applying ".toString" or "str" to them.

Comment by David Nolen [ 24/Feb/12 9:24 PM ]

This patch is good, but could you please create a patch with git that includes proper attribution information? This is a good guide: http://ariejan.net/2009/10/26/how-to-create-and-apply-a-patch-with-git

Comment by Thomas Scheiblauer [ 25/Feb/12 2:54 AM ]

OK, I have replaced the patch with one created according to the instructions in the guide you pointed me to.

Comment by David Nolen [ 25/Feb/12 10:25 AM ]

Fixed, https://github.com/clojure/clojurescript/commit/965dc505229652558adcb526ecb5a9f91ce31ce2

Generated at Thu Aug 16 09:05:47 CDT 2018 using JIRA 4.4#649-r158309.