Clojure

pr-str and read-string don't handle @ symbols inside keywords properly

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: Release 1.4
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Environment:
    Ubuntu 12.04 LTS; Java 1.7.0_05 Java HotSpot(TM) Client VM

Description

user=> (read-string (pr-str {(keyword "key@other") :stuff}))
RuntimeException Map literal must contain an even number of forms  clojure.lang.Util.runtimeException (Util.java:170)

pr-str emits "{:key@other :stuff}", which read-string fails to interpret correctly. Either pr-str needs to escape the @ symbol, or read-string needs to handle the symbol inside a keyword.

Background: I'm passing a map with email addresses as keys through Storm bolts, which require a thrift-serializable form. Using the pr-str/read-string combo fails on these keys, so I've fallen back to JSON serialization.

Activity

Hide
Stuart Halloway added a comment -

The '@' character is not a legal character for keywords or symbols (see http://clojure.org/reader). Recategorized as enhancement request.

Show
Stuart Halloway added a comment - The '@' character is not a legal character for keywords or symbols (see http://clojure.org/reader). Recategorized as enhancement request.
Stuart Halloway made changes -
Field Original Value New Value
Approval Vetted [ 10003 ]
Priority Major [ 3 ] Minor [ 4 ]
Issue Type Defect [ 1 ] Enhancement [ 4 ]
Hide
Steven Ruppert added a comment -

Then why doesn't (keyword "keywith@") throw an exception? It seems inconsistent with your statement.

Show
Steven Ruppert added a comment - Then why doesn't (keyword "keywith@") throw an exception? It seems inconsistent with your statement.
Hide
Andy Fingerhut added a comment -

It is a long standing property of Clojure that it does not throw exceptions for everything that is illegal.

Show
Andy Fingerhut added a comment - It is a long standing property of Clojure that it does not throw exceptions for everything that is illegal.
Hide
Steven Ruppert added a comment -

Yeah, but read-string clearly does. Is there a good reason that the "keyword" function can't throw an exception? With the other special rules on namespaces within symbol names, the "keyword" function really should be doing validation.

Another solution would be to allow a ruby-like :"symbol with disallowed characters" literal, but that would also be confusing with how the namespace is handled.

https://groups.google.com/forum/?fromgroups=#!topic/clojure/Ct5v9w0yNAE has some older discussion on this topic.

Show
Steven Ruppert added a comment - Yeah, but read-string clearly does. Is there a good reason that the "keyword" function can't throw an exception? With the other special rules on namespaces within symbol names, the "keyword" function really should be doing validation. Another solution would be to allow a ruby-like :"symbol with disallowed characters" literal, but that would also be confusing with how the namespace is handled. https://groups.google.com/forum/?fromgroups=#!topic/clojure/Ct5v9w0yNAE has some older discussion on this topic.
Hide
Andy Fingerhut added a comment -

Disclaimer: I'm not a Clojure/core member, just an interested contributor who doesn't know all the design decisions that were made here.

Steven, I think perhaps a couple of concerns are: (1) doing such checks would be slower than not doing them, and (2) implementing such checks means having to update them if/when the rules for legal symbols, keywords, namespace names, etc. change.

Would you be interested in writing strict versions of functions like symbol and keyword for addition to a contrib library? And test suites that try to hit a significant number of corner cases in the rules for what is legal and what is not? I mean those as serious questions, not rhetorical ones. This would permit people that want to use the strict versions of these functions to do so, and at the same time make it easy to measure performance differences between the strict and loose versions.

Show
Andy Fingerhut added a comment - Disclaimer: I'm not a Clojure/core member, just an interested contributor who doesn't know all the design decisions that were made here. Steven, I think perhaps a couple of concerns are: (1) doing such checks would be slower than not doing them, and (2) implementing such checks means having to update them if/when the rules for legal symbols, keywords, namespace names, etc. change. Would you be interested in writing strict versions of functions like symbol and keyword for addition to a contrib library? And test suites that try to hit a significant number of corner cases in the rules for what is legal and what is not? I mean those as serious questions, not rhetorical ones. This would permit people that want to use the strict versions of these functions to do so, and at the same time make it easy to measure performance differences between the strict and loose versions.
Hide
Steven Ruppert added a comment -

Looking back at this, the root cause of the problem is that the {pr} function, although it by default "print[s] in a way that objects can be read by the reader" [0], doesn't always do so. Thus, the easiest "fix" is to change its docstring to warn that not all keywords can be read back.

The deeper problem is that symbol don't have a reader form that can represent all actually possible keywords (in this case, those with "@" in them). Restricting the actually-possible keywords to match the reader form, i.e. writing a strict "keyword" function actually seems like a worse solution overall to me. The better solution would be to somehow extend the keyword reader form to allow it to express all possible keywords, possibly ruby's :"keyword" syntax. Plus, that solution would avoid having to keep hypothetical strict keyword/symbol functions in sync with reader operation, and write test cases for that, and so on.

Thus, the resolution of this bug comes down to how far we're willing to go. Changing the docstring would be the easiest, but extending the keyword form would be the "best" resolution, IMO.

[0]: http://clojuredocs.org/clojure_core/clojure.core/pr

Show
Steven Ruppert added a comment - Looking back at this, the root cause of the problem is that the {pr} function, although it by default "print[s] in a way that objects can be read by the reader" [0], doesn't always do so. Thus, the easiest "fix" is to change its docstring to warn that not all keywords can be read back. The deeper problem is that symbol don't have a reader form that can represent all actually possible keywords (in this case, those with "@" in them). Restricting the actually-possible keywords to match the reader form, i.e. writing a strict "keyword" function actually seems like a worse solution overall to me. The better solution would be to somehow extend the keyword reader form to allow it to express all possible keywords, possibly ruby's :"keyword" syntax. Plus, that solution would avoid having to keep hypothetical strict keyword/symbol functions in sync with reader operation, and write test cases for that, and so on. Thus, the resolution of this bug comes down to how far we're willing to go. Changing the docstring would be the easiest, but extending the keyword form would be the "best" resolution, IMO. [0]: http://clojuredocs.org/clojure_core/clojure.core/pr
Rich Hickey made changes -
Approval Vetted [ 10003 ]
Hide
Andy Fingerhut added a comment -

I happened across ticket CLJ-17 yesterday. Its discussion thread shows that the topic of validating the contents of constructed keywords and symbols has arisen before. At the time, a patch was written that modified the functions "symbol" and "keyword" so that the symbol/keyword was constructed as it does now, but then it was double-checked for readability using the clojure.lang.RT/readString method on the string arg given. It threw an exception if the intern and readString methods returned non-equal symbols (or if readString threw an exception).

Rich was concerned that this run-time overhead would be too high, and asked if anyone knew a faster way of doing it. Chas Emerick proposed making all symbols readable using a syntax like Common Lisp's #|symbol with whitespace|, with some checks for the common case where the quoting would be unnecessary. Rich was open to the idea of quoting arbitrary symbols, but that it would be a different ticket than that one.

I am not aware of anyone creating a ticket to introduce quoting of arbitrary symbols since then, but I could have missed it. This ticket could become that ticket, but its description would need significant editing, and code changes would be needed in a variety of places in Clojure.

Show
Andy Fingerhut added a comment - I happened across ticket CLJ-17 yesterday. Its discussion thread shows that the topic of validating the contents of constructed keywords and symbols has arisen before. At the time, a patch was written that modified the functions "symbol" and "keyword" so that the symbol/keyword was constructed as it does now, but then it was double-checked for readability using the clojure.lang.RT/readString method on the string arg given. It threw an exception if the intern and readString methods returned non-equal symbols (or if readString threw an exception). Rich was concerned that this run-time overhead would be too high, and asked if anyone knew a faster way of doing it. Chas Emerick proposed making all symbols readable using a syntax like Common Lisp's #|symbol with whitespace|, with some checks for the common case where the quoting would be unnecessary. Rich was open to the idea of quoting arbitrary symbols, but that it would be a different ticket than that one. I am not aware of anyone creating a ticket to introduce quoting of arbitrary symbols since then, but I could have missed it. This ticket could become that ticket, but its description would need significant editing, and code changes would be needed in a variety of places in Clojure.
Alex Miller made changes -
Labels reader print reader

People

Vote (0)
Watch (0)

Dates

  • Created:
    Updated: