ClojureScript

Internet Explorer treats \uFDD0 up to \uFDEF as equal so some keyword and symbol related things do not work

Details

  • Type: Defect
  • Status: Closed
  • Priority: Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Environment:
    ClojureScript HEAD, tested on IE7 and IE8

Description

Internet Explorer treats \uFDD0 up to \uFDEF as equal, e.g. '(= \uFDD0 \uFDD1)' returns true.
Therefore e.g. '(symbol? :whatever)', '(symbol? (keyword "whatever"))' and so on return true, which obviously shouldn't happen.
Furthermore, read-string does not correctly unmarshal keywords because of this issue.
Using pr-str on that Unicode range returns ":" for all codes.
None of the non-IE browsers I've tested (Firefox, Chrome/Chromium, Opera, iOS browser, etc.) exhibit this problem; they all work as expected.
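
To make the failure mode concrete, here is a minimal JavaScript sketch. It assumes, as the fix proposed later in this thread suggests, that keywords and symbols are plain strings tagged with a leading marker character (\uFDD0 for keywords, \uFDD1 for symbols). The helper names are hypothetical and are not taken from the ClojureScript source.

```javascript
// Hypothetical helpers. Assumption: keywords and symbols are strings
// whose first character is a marker (\uFDD0 = keyword, \uFDD1 = symbol).
var KEYWORD_MARKER = '\uFDD0';
var SYMBOL_MARKER = '\uFDD1';

function makeKeyword(name) { return KEYWORD_MARKER + name; }
function makeSymbol(name) { return SYMBOL_MARKER + name; }

// The type checks only look at the first character of the string.
function isKeyword(x) {
  return typeof x === 'string' && x.charAt(0) === KEYWORD_MARKER;
}
function isSymbol(x) {
  return typeof x === 'string' && x.charAt(0) === SYMBOL_MARKER;
}

var kw = makeKeyword('whatever');
console.log(isKeyword(kw)); // true
console.log(isSymbol(kw));  // false when the two markers are distinct
```

If an engine ends up treating the two markers as the same character, isSymbol(kw) in this sketch would return true as well, which matches the behavior reported above.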

Some context: http://stackoverflow.com/questions/5188679/whats-the-purpose-of-the-noncharacters-ufdd0-to-ufdef

Activity

David Nolen added a comment -

Do you have a proposed solution?

Thomas Scheiblauer added a comment -

Maybe change the keyword and symbol identifiers to \uFDF0 and \uFDF1 (or any other cross browser accepted value) respectively?
I will try that and report back any success or failure.

Thomas Scheiblauer added a comment -

My proposed solution, replacing \uFDD0 and \uFDD1 with \uFDF0 and \uFDF1 respectively, works on all tested browsers (Firefox 9, Chromium 17, iPad/iOS 5.0.1, Epiphany, and IE7 + IE8).
I used this bash command line inside the "src" folder to do the replacement (after checking that the replacements would be appropriate):
for src in $(grep -R -i -l 'fdd' .); do sed -i -e 's/FDD/FDF/g' $src; done

David Nolen added a comment -

Thanks, will look into this.

David Nolen added a comment -

Please apply this patch and confirm that it works for you. Thanks!

David Nolen made changes -
Field Original Value New Value
Attachment CLJS_139.patch [ 10800 ]
David Nolen made changes -
Assignee David Nolen [ dnolen ]
Thomas Scheiblauer added a comment -

It works, thank you!

David Powell added a comment -

Can anyone confirm this problem?

In IE8, in Javascript:

('\ufdd0' === '\ufdd1')

returns false.

David Nolen added a comment - - edited

David Powell, so you're not seeing the same issue?

Thomas Scheiblauer added a comment -

I just ran another test comparing the two values directly in JavaScript, and one in compiled ClojureScript with only an alert showing the result of the same comparison. Both tests produced the correct results (as David Powell observed), even in IE. So something seems to be happening in the context of my application; maybe it is caused by one of the included JS libraries (I'm also using jquery, jquery.mobile and ckeditor, though I had already included these in the testing environment). I will have to investigate further and will post another comment when I have found something.

Thomas Scheiblauer added a comment -

Intermediate result of my investigation:
It only happens when I turn ClojureScript optimizations OFF!
As soon as I activate any compiler optimizations, even if only "simple", the problem no longer appears on IE.
It also does not seem to depend on any of my included third-party libraries.

David Nolen added a comment -

Are you using your own code or one of the examples that ships with ClojureScript?

Thomas Scheiblauer added a comment - - edited

I'm using my own code.

I attached a test that illustrates the described problem with IE (at least on IE7 and IE8; I have not yet been able to test more recent IE versions).
The result of each comparison should always be "false", which is not the case with non-optimized compiled ClojureScript within the Unicode range \uFDD0 up to \uFDEF, while it works e.g. for \uFDF0 and \uFDF1.
"unicodetest.html" uses the non-optimized code, which exposes the bug; it was compiled using: cljsc unicodetest.cljs > unicodetest.js
"unicodetest_simpleopt.html" uses the "simple" optimized code, which does not expose the bug; it was compiled using: cljsc unicodetest.cljs '{:output-dir "simpleopt_out" :optimizations :simple}' > unicodetest_simpleopt.js
Using advanced optimization also produces correct results, even on IE.

The code was compiled using the following ClojureScript version and dependencies:
ClojureScript from git (commit 1028ca12f169322e31d7967d8c385165cf773665)
Google closure library r1487
Google closure compiler r1592
guava 10.0.1
rhino 1.7R3

running on:
Gentoo Linux x86_64, Linux kernel 3.2.5
Java SE 1.6.0_30
Clojure 1.3.0

The problem was observed on:
Windows XP (SP 2)
using IE7 and IE8
(it did not occur on any other non-IE browser I've tested: Firefox, Chromium, iPad Browser, Epiphany and Opera regardless of the OS they were running on)

Thomas Scheiblauer made changes -
Attachment unicodetest.zip [ 10898 ]
Thomas Scheiblauer added a comment - - edited

Looking at the compiled code, I see that in the version without optimizations the Unicode characters are converted from the UTF-16 \u notation to their UTF-8 byte values (3 bytes each); see: http://www.fileformat.info/info/unicode/char/fdd0/index.htm
... while in the optimized version the \u notation stays intact.
That of course does not explain why IE treats strings containing the byte sequences 0xEFB790 up to 0xEFB7AF as equal, but it does explain why the optimized and non-optimized compiler output differ, and also why the equality test works in IE when written directly in JavaScript using the \u notation.
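
The byte values quoted above can be checked with a small JavaScript sketch of the standard 3-byte UTF-8 encoding. The function names are illustrative only, and the sketch deliberately handles just the U+0800..U+FFFF range (it does not treat surrogates specially).

```javascript
// Encode one code point in U+0800..U+FFFF as its 3-byte UTF-8 form.
// Sketch only: no special handling of the surrogate range.
function utf8BytesOf(codePoint) {
  if (codePoint < 0x800 || codePoint > 0xFFFF) {
    throw new RangeError('sketch only handles 3-byte UTF-8 code points');
  }
  return [
    0xE0 | (codePoint >> 12),          // 1110xxxx: top 4 bits
    0x80 | ((codePoint >> 6) & 0x3F),  // 10xxxxxx: middle 6 bits
    0x80 | (codePoint & 0x3F)          // 10xxxxxx: low 6 bits
  ];
}

// Format a byte array as space-separated uppercase hex.
function toHex(bytes) {
  return bytes.map(function (b) {
    return b.toString(16).toUpperCase();
  }).join(' ');
}

console.log(toHex(utf8BytesOf(0xFDD0))); // EF B7 90
console.log(toHex(utf8BytesOf(0xFDEF))); // EF B7 AF
```

So the whole \uFDD0 to \uFDEF range shares the first two bytes (EF B7) and differs only in the last byte, 0x90 through 0xAF.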

David Nolen made changes -
Description: appended the line "Some context: http://stackoverflow.com/questions/5188679/whats-the-purpose-of-the-noncharacters-ufdd0-to-ufdef" (the rest of the description text is unchanged)
David Nolen added a comment -

I suspect this is because of improperly escaped Java strings - looking into it.

David Nolen added a comment -

This one avoids emitting unicode chars, please test

David Nolen made changes -
Attachment 139_fix_unicode_emit.patch [ 10913 ]
Thomas Scheiblauer added a comment - - edited

I've just tested your new fix which prevents the compiler from converting unicode literals to their respective UTF-8 representations but now I encounter the bug when using read-string as described here: CLJS-133
Again only on IE and only when not using any optimization during compilation.

Thomas Scheiblauer added a comment - - edited

I have attached a new patch which should fix all problems related to this issue so far, including the related CLJS-133 (works for me). It includes David's partial fix, so only this one is required.
However, to prevent possible related issues in the future, we should probably consider escaping every non-ASCII character in emitted strings and characters, because IE obviously HAS severe problems handling unescaped Unicode.

Thomas Scheiblauer made changes -
David Nolen added a comment -

This seems less ideal than just solving this once and for all. I'm assuming many people will want to emit Unicode chars and want them to emit properly and work under development in IE. When we emit strings we should rebuild the string, replacing Unicode chars. This should be done as efficiently as possible.

Thomas Scheiblauer added a comment -

Sure, that's what I wanted to express with my last sentence. I'm currently exploring possible solutions to find the most efficient one.

Thomas Scheiblauer added a comment - - edited

I have just attached my first stab at a general non-ASCII escape patch for emit-constant. It should already be pretty efficient. I basically took the core from Stuart Sierra's clojure/data.json write-json-string function (to avoid reinventing the wheel), optimized it a bit (hopefully) and integrated it properly into compiler.cljs.
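
For readers without the attachment, the general idea of such an escape pass can be sketched in JavaScript. This is not the attached patch (which changes the compiler itself); escapeNonAscii is a hypothetical name, and a real emitter would additionally have to escape quotes and backslashes.

```javascript
// Hypothetical helper: rewrite every character outside printable ASCII
// as a \uXXXX escape so that the emitted source text is pure ASCII.
// Not handled here: quotes and backslashes, which a real emitter must escape too.
function escapeNonAscii(s) {
  var out = '';
  for (var i = 0; i < s.length; i++) {
    var code = s.charCodeAt(i); // one UTF-16 code unit
    if (code < 0x20 || code > 0x7E) {
      var hex = code.toString(16).toUpperCase();
      out += '\\u' + '0000'.slice(hex.length) + hex; // left-pad to 4 hex digits
    } else {
      out += s.charAt(i);
    }
  }
  return out;
}

console.log(escapeNonAscii('\uFDD0abc')); // prints \uFDD0abc (as ASCII text)
```

Because the pass works on UTF-16 code units, characters outside the BMP come out as two consecutive \uXXXX escapes (a surrogate pair), which JavaScript string literals accept.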

Thomas Scheiblauer made changes -
Attachment general-escaping-emit-constant.patch [ 10947 ]
Thomas Scheiblauer made changes -
Attachment general-escaping-emit-constant.patch [ 10947 ]
Thomas Scheiblauer made changes -
Attachment general-escaping-emit-constant.patch [ 10948 ]
Thomas Scheiblauer made changes -
Attachment general-escaping-emit-constant.patch [ 10948 ]
Thomas Scheiblauer made changes -
Attachment general-escaping-emit-constant.patch [ 10949 ]
Thomas Scheiblauer added a comment - - edited

Just to nip questions about the emission of regular expressions in the bud: I've just tested this and it behaves correctly without "manual" intervention. More precisely, escaped Unicode in regular expressions apparently does not get converted to UTF-8 (or any other character set) when applying ".toString" or "str" to them.
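
A quick way to observe this in plain JavaScript is sketched below. It reflects how current engines expose a regex literal's source text; it is not a record of the test that was run for this comment.

```javascript
// The escape inside a regex literal stays part of the pattern's source text.
var re = /\ufdd0/;

console.log(re.source === '\\ufdd0');    // true: six ASCII characters, not one char
console.log(String(re) === '/\\ufdd0/'); // true: toString keeps the escape
console.log(re.test('\ufdd0'));          // true: the pattern still matches U+FDD0
```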

David Nolen added a comment -

This patch is good but could you please create a patch with git that includes proper attribution information. This is a good guide - http://ariejan.net/2009/10/26/how-to-create-and-apply-a-patch-with-git

Thomas Scheiblauer made changes -
Attachment general-escaping-emit-constant.patch [ 10949 ]
Thomas Scheiblauer added a comment - - edited

Ok, I have replaced the patch with one created according to the instructions in the guide you pointed me at.

Thomas Scheiblauer made changes -
Attachment general-escaping-emit-constant.patch [ 10966 ]
Thomas Scheiblauer made changes -
Attachment general-escaping-emit-constant.patch [ 10966 ]
Thomas Scheiblauer made changes -
David Nolen made changes -
Resolution Completed [ 1 ]
Status Open [ 1 ] Resolved [ 5 ]
David Nolen made changes -
Status Resolved [ 5 ] Closed [ 6 ]
