Clojure

Enable support for \x.. escaped characters.

Details

  • Type: Enhancement
  • Status: Closed
  • Priority: Minor
  • Resolution: Declined
  • Affects Version/s: Release 1.4
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Environment:
    All
  • Patch:
    Code and Test

Description

see: https://groups.google.com/d/topic/clojure/Kl3WVtEE3FY/discussion

\x.. characters (which are the same as \u00.. characters) are produced by some systems, in particular ClojureScript.

The inability to read these characters hinders data interchange.

After a quick look, I believe this capability can be easily introduced by adding a case to the function at
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LispReader.java#L445
that mirrors the 'u' case but reads only 2 characters.
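
The idea described above could look roughly like the following sketch. It is not the attached patch and does not use LispReader's actual internals: the class XEscapeSketch and its methods are hypothetical names chosen for illustration. It decodes only the two hex escapes (four digits after 'u', two digits after 'x') and passes every other escape through unchanged.

```java
final class XEscapeSketch {
    // Read exactly `length` hex digits from `s` starting at `start`
    // and return them as a single UTF-16 code unit.
    static char readHexEscape(String s, int start, int length) {
        if (start + length > s.length()) {
            throw new IllegalArgumentException(
                "Incomplete escape: expected " + length + " hex digits");
        }
        int value = 0;
        for (int i = start; i < start + length; i++) {
            int digit = Character.digit(s.charAt(i), 16);
            if (digit == -1) {
                throw new IllegalArgumentException("Invalid hex digit: " + s.charAt(i));
            }
            value = value * 16 + digit;
        }
        return (char) value;
    }

    // Decode the four-digit 'u' escape and the two-digit 'x' escape.
    // Any other backslash sequence is copied verbatim (assumption: a real
    // reader would handle those elsewhere).
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char kind = s.charAt(i + 1);
                if (kind == 'u') {
                    sb.append(readHexEscape(s, i + 2, 4));
                    i += 6;
                } else if (kind == 'x') {
                    // Same logic as the 'u' case, but only two digits.
                    sb.append(readHexEscape(s, i + 2, 2));
                    i += 4;
                } else {
                    // Keep other escapes (including an escaped backslash) intact.
                    sb.append(c).append(kind);
                    i += 2;
                }
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, the six characters \x41\u0042c... style input `\x41\u0042c` decodes to `ABc`, since the two-digit form covers the same code points as \u00.. does.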

Activity

Dave Sann made changes -
Field Original Value New Value
Attachment 0001-adding-support-for-x-escape-characters.patch [ 11382 ]
Andy Fingerhut added a comment -

Thanks for the patch, Dave. It is Rich Hickey's policy only to include code in Clojure written by those who have signed a Contributor Agreement (CA). See here for more details: http://clojure.org/contributing Have you signed one, or are you considering it?

Andy Fingerhut added a comment -

Can someone find some documentation or spec somewhere that defines this \x.. format?

It is definitely different than the \x{...} syntax that exists in Perl, which permits one to insert an arbitrary Unicode character code point into a string (note: even supplementary ones that don't fit into a single UTF-16 code unit, as Java's and Clojure's \u.... is restricted to). http://perldoc.perl.org/perlunicode.html#Effects-of-Character-Semantics

Dave Sann added a comment -

http://es5.github.com/x7.html#x7.8.4

Dave Sann added a comment -

I am happy to sign the CA in principle. Just need to read and understand any implications for me.

Dave Sann added a comment -

CA will be with you shortly.

Andy Fingerhut made changes -
Patch Code and Test [ 10002 ]
Dave Sann added a comment -

Can this go into 1.5?

Chas Emerick added a comment -

I'm hitting this now as well. But, adding support for JavaScript's flavour of \x.. escapes to the Clojure reader makes no sense to me. If escapes are to be used, then the \u.... format seems preferable (it supersets \x..).

However, the readers in play (Clojure reader, ClojureScript reader, edn) all play nice with Unicode, so there's no reason to be escaping anything except for \t, \n, and so on.

It looks like tweaking cljs' string implementations of IPrintWithWriter and IPrintable so that only those characters are escaped would be fairly easy. Right now, they're using goog.string.escape, which "encloses a string in double quotes and escapes characters so that the string is a valid JS string"; whatever escaping is appropriate for a "valid JavaScript string" seems irrelevant to what e.g. pr-str should produce.

I propose closing this ticket and moving the party to CLJS.
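
As an illustration of the "escape only what is necessary" approach suggested above, here is a minimal sketch. It is not the ClojureScript printer or goog.string.escape; MinimalEscapeSketch and printReadably are hypothetical names, and the exact set of escaped characters is an assumption.

```java
final class MinimalEscapeSketch {
    // Quote a string for readable printing. Only the quote, the backslash,
    // and a few whitespace controls are escaped; all other characters,
    // including non-ASCII ones, are emitted unchanged.
    static String printReadably(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\t': sb.append("\\t");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);      // no hex escapes are produced
            }
        }
        return sb.append('"').toString();
    }
}
```

Because no hex escapes are ever produced, output of this shape avoids the \x.. form entirely and relies on the readers' Unicode support instead.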

Stuart Halloway added a comment -

Following Chas's lead and closing this one. \x doesn't appear in the JSON spec, and a quick search of StackOverflow shows people stumbling over it from a bunch of other language platforms. I think we should root it out of ClojureScript.

Stuart Halloway made changes -
Approval Not Approved [ 10008 ]
Stuart Halloway made changes -
Resolution Declined [ 2 ]
Status Open [ 1 ] Closed [ 6 ]
Chas Emerick added a comment -

Great, I'll open a CLJS ticket with a patch tonight or tomorrow.

Ivan Kozik added a comment -

Re: "no reason to be escaping anything except for \t, \n": sometimes it is difficult or impossible to transmit all of Unicode (e.g. sending non-Character codepoints through XDomainRequest, or sending U+0000/U+FFFE/U+FFFF through many XHR implementations), so it might be nice to have an ASCII-only printing mode. Probably for another ticket, though.
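
A hedged sketch of what such an ASCII-only mode might do, assuming every character outside printable ASCII is written in the four-digit \u.... form that the Clojure reader already accepts. AsciiOnlySketch and asciiOnly are hypothetical names, and quoting of '"' and '\' is deliberately left out to keep the example short.

```java
final class AsciiOnlySketch {
    // Replace every UTF-16 code unit outside printable ASCII (0x20..0x7e)
    // with a backslash, the letter u, and four lowercase hex digits.
    // Supplementary characters therefore come out as two escapes,
    // one per surrogate.
    static String asciiOnly(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this approach, problematic values such as U+0000, U+FFFE, and U+FFFF would travel as plain ASCII text and be restored by the reader on the other side.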

Chas Emerick added a comment -

Here's the new ticket: http://dev.clojure.org/jira/browse/CLJS-400

@Ivan: I agree that options in this area would be good. There are a lot of edge cases where the defaults aren't right (e.g. I think escaping all nonprintables is a no-brainer for readably-printed strings).

I suspect planning out such details should probably happen [here](http://dev.clojure.org/pages/viewpage.action?pageId=4063586) or [here](https://github.com/edn-format/edn/issues).

