<< Back to previous view

[CLJ-1025] Enable support for \x.. escaped characters. Created: 13/Jul/12  Updated: 19/Oct/12  Resolved: 19/Oct/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Dave Sann Assignee: Unassigned
Resolution: Declined Votes: 1
Labels: None
Environment:

All


Attachments: Text File 0001-adding-support-for-x-escape-characters.patch    
Patch: Code and Test

 Description   

see: https://groups.google.com/d/topic/clojure/Kl3WVtEE3FY/discussion

\x.. characters (which are the same as \u00.. characters) are produced by some systems. in particular clojurescript

Inability to read these characters hinders data interchange

After a quick look, I believe this capability can be easily introduced by adding a case to this
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LispReader.java#L445 function.
Mirroring 'u' case and reading only 2 chars.



 Comments   
Comment by Andy Fingerhut [ 19/Jul/12 11:46 AM ]

Thanks for the patch, Dave. It is Rich Hickey's policy only to include code in Clojure written by those who have signed a Contributor Agreement (CA). See here for more details: http://clojure.org/contributing Have you signed one, or were considering it?

Comment by Andy Fingerhut [ 19/Jul/12 3:57 PM ]

Can someone find some documentation or spec somewhere that defines this \x.. format?

It is definitely different than the \x{...} syntax that exists in Perl, which permits one to insert an arbitrary Unicode character code point into a string (note: even supplementary ones that don't fit into a single UTF-16 code unit, as Java's and Clojure's \u.... is restricted to). http://perldoc.perl.org/perlunicode.html#Effects-of-Character-Semantics

Comment by Dave Sann [ 22/Jul/12 2:19 AM ]

http://es5.github.com/x7.html#x7.8.4

Comment by Dave Sann [ 31/Jul/12 4:35 AM ]

I am happy to sign the CA in principle. Just need to read and understand any implications for me.

Comment by Dave Sann [ 27/Aug/12 3:31 AM ]

CA will be with you shortly.

Comment by Dave Sann [ 16/Oct/12 3:11 AM ]

Can this go into 1.5?

Comment by Chas Emerick [ 19/Oct/12 8:10 AM ]

I'm hitting this now as well. But, adding support for JavaScript's flavour of \x.. escapes to the Clojure reader makes no sense to me. If escapes are to be used, then the \u.... format seems preferable (it supersets \x..).

However, all of the readers in play (Clojure reader, ClojureScript reader, edn) all play nice with Unicode, so there's no reason to be escaping anything except for \t, \n, and so on.

It looks like tweaking cljs' string implementations of IPrintWithWriter and IPrintable so that only those characters are escaped would be fairly easy. Right now, they're using goog.string.escape, which "encloses a string in double quotes and escapes characters so that the string is a valid JS string"; whatever escaping is appropriate for a "valid JavaScript string" seems irrelevant to what e.g. pr-str should produce.

I propose closing this ticket and moving the party to CLJS.

Comment by Stuart Halloway [ 19/Oct/12 1:55 PM ]

Following Chas's lead and closing this one. \x doesn't appear in the JSON spec, and a quick search of StackOverflow shows people stumbling over it from a bunch of other language platforms. I think we should root it out of ClojureScript.

Comment by Chas Emerick [ 19/Oct/12 1:58 PM ]

Great, I'll open a CLJS ticket with a patch tonight or tomorrow.

Comment by Ivan Kozik [ 19/Oct/12 2:39 PM ]

Re: "no reason to be escaping anything except for \t, \n": sometimes it is difficult or impossible to transmit all of Unicode (e.g. sending non-Character codepoints through XDomainRequest, or sending U+0000/U+FFFE/U+FFFF through many XHR implementations), so it might be nice to have an ASCII-only printing mode. Probably for another ticket, though.

Comment by Chas Emerick [ 19/Oct/12 7:59 PM ]

Here's the new ticket: http://dev.clojure.org/jira/browse/CLJS-400

@Ivan: I agree that options in this area would be good. There are a lot of edge cases where the defaults aren't right (e.g. I think escaping all nonprintables is a no-brainer for readably-printed strings).

I suspect planning out such details should probably happen [here](http://dev.clojure.org/pages/viewpage.action?pageId=4063586) or [here](https://github.com/edn-format/edn/issues).

Generated at Sat Oct 25 22:03:59 CDT 2014 using JIRA 4.4#649-r158309.