ClojureScript

reader/read-string produces malformed keywords in IE9

Details

  • Type: Defect
  • Status: Closed
  • Priority: Minor
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Environment:
    Windows 7 x86, MSIE 9, Jetty

Description

The call (reader/read-string "{:status :ok}") produces {"\uFFFD'status" "\uFFFD'ok"} instead of the expected {:status :ok}.
The server sends a proper Content-Type header (charset utf-8) for all JavaScript files.

The problem disappears if the special Unicode characters in the cljs.core.keyword function of the compiled core.js file are manually replaced with their escaped equivalents ("\uFDD0").
It does not disappear when the call to the str_STAR_ function is replaced with concatenation operators, which suggests that str_STAR_ works correctly and makes the cause harder to pin down.

I currently cannot reproduce the problem on another system, so I am not certain about every detail.

Activity

Thomas Scheiblauer added a comment - edited

I have just attached a general non-ascii escape patch to CLJS-139 which obsoletes my previous one!

Thomas Scheiblauer added a comment - edited

I have attached a patch to CLJS-139 which fixes this related issue.

Thomas Scheiblauer added a comment - edited

Applying http://dev.clojure.org/jira/secure/attachment/10939/cljs-133_fix.patch to the current HEAD makes read-string work as expected. This is because David's patch for CLJS-139 (http://dev.clojure.org/jira/secure/attachment/10913/139_fix_unicode_emit.patch) does not address the "emit-constant" multimethod for String (only Character, clojure.lang.Keyword and clojure.lang.Symbol). We will have to do the same replacement for String (each character) as David did for Character (maybe by using clojure.string/replace) to make the 2 functions I patched in core.cljs work in their previous unpatched state (I hope someone can understand my gibberish).

!!! deleted referenced patch because it is now obsolete !!!
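
A rough sketch of the per-character replacement described above, written in JavaScript rather than in the compiler's Clojure, for illustration only. The helper name escapeNonAscii is hypothetical and is not taken from either patch; it assumes that every UTF-16 code unit above 0x7F should be emitted as a \uXXXX escape.

```javascript
// Hypothetical helper (not from the CLJS-133 or CLJS-139 patches):
// rewrites every non-ASCII UTF-16 code unit as a \uXXXX escape so the
// emitted source file contains only ASCII characters.
function escapeNonAscii(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 0x7f) {
      // Pad to four uppercase hex digits, e.g. 0xFDD0 -> "\uFDD0".
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    } else {
      out += s.charAt(i);
    }
  }
  return out;
}

// The raw noncharacter becomes a six-character ASCII escape sequence.
console.log(escapeNonAscii("\uFDD0'status")); // prints \uFDD0'status
```

With such a step in the emitter, the browser would never have to decode a raw noncharacter from the script file; whether that is what the final patch does is not confirmed here.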

David Nolen added a comment -

This ticket is different from CLJS-139, this is only about the reader.

David Nolen added a comment -

Same as CLJS-139

David Nolen added a comment -

And you're sure that you're setting the utf-8 meta tag in your HTML document?

g. christensen added a comment - edited

The only thing I can think of is to emit \uFDD0 and \uFDD1 as escaped literals instead of raw characters in the compiled JavaScript output, or some compiler hack that places the escaped literals in the `keyword' and `symbol' construction functions.

David Nolen added a comment -

Having people look into the IE issues is fantastic. This is similar to another IE9 reader issue. Do you have an approach that you think will solve the problem? Thanks.

g. christensen added a comment - edited

I have just read some of the Unicode specification and found: "U+FFFD � replacement character used to replace an unknown or unprintable character". So it is probably necessary to find the point where the noncharacter is replaced with this character. Or maybe the raw unescaped noncharacter is replaced internally by \uFFFD, so that in IE there is no distinction between keywords and other symbols obtained through read-string (it may process files correctly but replace noncharacters in constructed strings).

g. christensen added a comment - edited

Yes, I know about the internal keyword representation. The result {"\uFFFD'status" "\uFFFD'ok"} is taken from (pr-str (reader/read-string "{:status :ok}")) passed to an `alert' call. In other browsers it returns {:status :ok}, but in IE it returns the string above. Comparing such keywords with hardcoded keywords returns nil, so most likely they are not interpreted as keywords (as you may notice, the special character code in the malformed keyword differs from the character hardcoded in the ClojureScript code of the `keyword' function: \uFFFD in the malformed keyword vs \uFDD0 in the source).
It's strange because it works perfectly in other browsers. It looks like some sort of endianness problem or something similar, but I don't know much about IE internals and can't judge on this matter.
I have tried to reproduce the problem on another machine with IE 9.0.8112.16421 64-bit, update 9.0.3, and it is still there.
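
To illustrate why the comparison fails, here is a minimal sketch that assumes keywords are plain strings prefixed with the \uFDD0 marker and an apostrophe, as the output above suggests. The names KEYWORD_MARKER, makeKeyword and isKeyword are hypothetical stand-ins, not the actual cljs.core functions.

```javascript
// Hypothetical stand-ins for the string-based keyword representation;
// the marker is written as an escape so the file itself stays ASCII.
const KEYWORD_MARKER = "\uFDD0";

// Builds the assumed internal form: marker, apostrophe, then the name.
function makeKeyword(name) {
  return KEYWORD_MARKER + "'" + name;
}

// A value counts as a keyword only if it starts with the exact marker.
function isKeyword(x) {
  return typeof x === "string" && x.charAt(0) === KEYWORD_MARKER;
}

const good = makeKeyword("status");
// What IE9 reportedly produced: U+FFFD in place of U+FDD0.
const mangled = "\uFFFD'status";

console.log(isKeyword(good));    // true
console.log(isKeyword(mangled)); // false
console.log(good === mangled);   // false
```

Under this assumption, once the marker has been turned into U+FFFD the value is just an ordinary string, which would explain why equality checks against hardcoded keywords fail.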

David Nolen added a comment -

Keywords in ClojureScript are just JavaScript strings. If you mean that you're seeing this on the client, that is expected. Are you saying that you're seeing this in the ClojureScript REPL?

