
[CLJS-133] reader/read-string produces malformed keywords in IE9 Created: 20/Jan/12  Updated: 27/Jul/13  Resolved: 25/Feb/12

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: g. christensen Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: reader
Environment:

Windows 7 x86, MSIE 9, Jetty



 Description   

The following call, (reader/read-string "{:status :ok}"), produces {"\uFFFD'status" "\uFFFD'ok"}, which differs from the expected {:status :ok}.
The server sends a proper Content-Type header (UTF-8) for all JavaScript files.

The problem disappears if the Unicode special characters are manually replaced with their escaped equivalents ("\uFDD0") in the cljs.core.keyword function in the compiled core.js file.
It does not disappear when the call to the str_STAR_ function is replaced with concatenation operators, which suggests that the function works correctly and adds some mystery to the problem.
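
To make the difference between the raw character and its escaped spelling concrete, here is a minimal JavaScript sketch. It only illustrates the encoding arithmetic and is not taken from the compiler; the helper name toHex is made up for this example. The raw marker travels as three non-ASCII UTF-8 bytes, while the escaped spelling is plain ASCII that no decoder has reason to touch.

```javascript
// Sketch only: compares the raw marker character with its escaped spelling.
const encoder = new TextEncoder(); // always encodes strings as UTF-8

// Hypothetical helper: formats bytes as upper-case hex pairs.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

// Bytes carried by a file that contains the raw U+FDD0 character.
const rawMarkerBytes = toHex(encoder.encode("\uFDD0"));
// Bytes of U+FFFD, the replacement character seen in the IE9 output.
const replacementBytes = toHex(encoder.encode("\uFFFD"));
// The escaped spelling as it appears in source text: six ASCII characters.
const escapedSpelling = "\\uFDD0";
const escapedBytes = encoder.encode(escapedSpelling);

console.log(rawMarkerBytes);                      // EF B7 90
console.log(replacementBytes);                    // EF BF BD
console.log(escapedBytes.every((b) => b < 0x80)); // true
```

Whether IE9 actually rejects the raw bytes while decoding the file is not established in this ticket; the sketch only shows why the escaped form sidesteps the question.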

Currently I cannot reproduce the problem on another system, so I'm not certain about every aspect.



 Comments   
Comment by David Nolen [ 24/Jan/12 1:07 PM ]

Keywords in ClojureScript are just JavaScript strings. If you mean that you're seeing this on the client, that is expected. Or are you saying that you're seeing this in the ClojureScript REPL?

Comment by g. christensen [ 26/Jan/12 10:46 AM ]

Yes, I know about the internal keyword representation. The result {"\uFFFD'status" "\uFFFD'ok"} is taken from (pr-str (reader/read-string "{:status :ok}")) passed to an `alert' call. In other browsers it returns {:status :ok}, but in IE it returns the string above. Comparing such keywords with hardcoded keywords returns nil, so most likely they are not interpreted as keywords (as you may notice, the special character code in the malformed keyword differs from the character hardcoded in the ClojureScript code of the `keyword' function: \uFDD0 vs \uFFFD).
It's strange because it works perfectly in other browsers. It looks like some sort of endianness problem or something similar, but I don't know much about IE internals and can't judge on this matter.
I have tried to reproduce the problem on another machine with IE 9.0.8112.16421 64-bit, update 9.0.3, and it's still there.
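
The failed comparison can be modelled in a few lines of JavaScript. This is a sketch under an assumption: keywords are taken to be strings of the form marker + "'" + name, which is inferred from the "\uFFFD'status" output quoted above. makeKeyword and isKeyword are hypothetical stand-ins, not the real cljs.core functions.

```javascript
// Assumption: a keyword is "\uFDD0" + "'" + name, inferred from the
// "\uFFFD'status" output quoted in this ticket.
const KEYWORD_MARKER = "\uFDD0";
const REPLACEMENT_CHAR = "\uFFFD";

// Hypothetical stand-in for keyword construction.
function makeKeyword(name) {
  return KEYWORD_MARKER + "'" + name;
}

// Hypothetical stand-in for a keyword check based on the leading marker.
function isKeyword(value) {
  return typeof value === "string" && value.charAt(0) === KEYWORD_MARKER;
}

const expected = makeKeyword("status");
// What the reader appears to produce in IE9: the marker has become U+FFFD.
const mangled = REPLACEMENT_CHAR + "'status";

console.log(isKeyword(expected));  // true
console.log(isKeyword(mangled));   // false
console.log(expected === mangled); // false
```

Under this model the mangled value is just an ordinary string, which would explain why it prints with its prefix instead of as :status.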

Comment by g. christensen [ 27/Jan/12 1:43 AM ]

I have just read some of the Unicode specifications and found: "U+FFFD � replacement character used to replace an unknown or unprintable character". So it is probably necessary to find the point where the noncharacter is replaced with this character. Or maybe the raw, unescaped noncharacter is replaced internally by \uFFFD, so that in IE there is no distinction between keywords and other symbols obtained through read-string (it may process files correctly but replace noncharacters in constructed strings).
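
For reference, both marker characters fall in the range Unicode reserves as noncharacters (U+FDD0 through U+FDEF, plus the last two code points of every plane), whereas U+FFFD is an ordinary assigned character. A small JavaScript check, written for this note with the hypothetical name isNoncharacter, makes that explicit:

```javascript
// Returns true if the code point is one of the 66 Unicode noncharacters.
function isNoncharacter(codePoint) {
  if (codePoint < 0 || codePoint > 0x10FFFF) {
    return false; // outside the Unicode code space
  }
  // Contiguous range of 32 noncharacters in the BMP.
  const inFdd0Range = codePoint >= 0xFDD0 && codePoint <= 0xFDEF;
  // Last two code points of each plane: U+xxFFFE and U+xxFFFF.
  const isPlaneEnd = (codePoint & 0xFFFE) === 0xFFFE;
  return inFdd0Range || isPlaneEnd;
}

console.log(isNoncharacter(0xFDD0)); // true  (keyword marker)
console.log(isNoncharacter(0xFDD1)); // true  (symbol marker)
console.log(isNoncharacter(0xFFFD)); // false (replacement character)
```

If both markers were being replaced by U+FFFD, keywords and symbols would end up with the same prefix, which matches the loss of distinction described above. Whether IE9 really does this substitution is still a guess at this point in the thread.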

Comment by David Nolen [ 03/Feb/12 7:17 PM ]

Having people looking into the IE issues is fantastic - this is similar to another IE9 reader issue, do you have an approach that you think will solve the problem? Thanks.

Comment by g. christensen [ 04/Feb/12 10:14 AM ]

The only thing I can think of is to emit escaped \uFDD0 and \uFDD1 literals instead of raw characters in the compiled JavaScript output, or some compiler hack that places the escaped literals in the `keyword' and `symbol' construction functions.

Comment by David Nolen [ 05/Feb/12 12:18 PM ]

And you're sure that you're setting the utf-8 meta tag in your HTML document?

Comment by David Nolen [ 20/Feb/12 10:50 AM ]

Same as CLJS-139

Comment by David Nolen [ 22/Feb/12 8:53 AM ]

This ticket is different from CLJS-139, this is only about the reader.

Comment by Thomas Scheiblauer [ 22/Feb/12 11:09 AM ]

Applying http://dev.clojure.org/jira/secure/attachment/10939/cljs-133_fix.patch to the current HEAD makes read-string work as expected. This is because David's patch for CLJS-139 (http://dev.clojure.org/jira/secure/attachment/10913/139_fix_unicode_emit.patch) does not address the "emit-constant" multimethod for String (only Character, clojure.lang.Keyword and clojure.lang.Symbol). We will have to do the same replacement for String (each character) as David did for Character (maybe by utilizing clojure.string.replace) to make the 2 functions I patched in core.cljs work in their previous unpatched state (I hope someone can understand my gibberish).

!!! deleted referenced patch because it is now obsolete !!!

Comment by Thomas Scheiblauer [ 23/Feb/12 8:33 AM ]

I have attached a patch to CLJS-139 which fixes this related issue.

Comment by Thomas Scheiblauer [ 23/Feb/12 12:52 PM ]

I have just attached a general non-ascii escape patch to CLJS-139 which obsoletes my previous one!
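
As a rough illustration of what a general non-ASCII escape does, here is a JavaScript sketch. It is not the attached patch (which changes the Clojure-side emitter); the function name escapeNonAscii is hypothetical, and a real emitter would also have to escape quotes, backslashes and control characters, which this sketch leaves alone.

```javascript
// Sketch: rewrites every UTF-16 code unit above 0x7F as a \uXXXX escape,
// so the emitted source text is ASCII-only.
function escapeNonAscii(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit > 0x7f) {
      out += "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
    } else {
      out += text[i];
    }
  }
  return out;
}

const original = "{\uFDD0'status \uFDD0'ok} caf\u00E9";
const escaped = escapeNonAscii(original);

console.log(escaped); // {\uFDD0'status \uFDD0'ok} caf\u00E9
// Parsing the escaped text as a string literal restores the original value.
console.log(JSON.parse('"' + escaped + '"') === original); // true
```

Working per code unit means characters outside the BMP come out as two escapes (a surrogate pair), which a JavaScript parser reads back as the same string.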

Comment by David Nolen [ 25/Feb/12 10:25 AM ]

Fixed, https://github.com/clojure/clojurescript/commit/965dc505229652558adcb526ecb5a9f91ce31ce2

Generated at Wed Nov 26 04:13:04 CST 2014 using JIRA 4.4#649-r158309.