<< Back to previous view

[CLJ-1033] pr-str and read-string don't handle @ symbols inside keywords properly Created: 26/Jul/12  Updated: 03/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Steven Ruppert Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: print, reader
Environment:

Ubuntu 12.04 LTS; Java 1.7.0_05 Java HotSpot(TM) Client VM



 Description   
user=> (read-string (pr-str {(keyword "key@other") :stuff}))
RuntimeException Map literal must contain an even number of forms  clojure.lang.Util.runtimeException (Util.java:170)

pr-str emits "{:key@other :stuff}", which read-string fails to interpret correctly. Either pr-str needs to escape the @ symbol, or read-string needs to handle the symbol inside a keyword.

Background: I'm passing a map with email addresses as keys through Storm bolts, which require a thrift-serializable form. Using the pr-str/read-string combo fails on these keys, so I've fallen back to JSON serialization.



 Comments   
Comment by Stuart Halloway [ 10/Aug/12 12:51 PM ]

The '@' character is not a legal character for keywords or symbols (see http://clojure.org/reader). Recategorized as enhancement request.

Comment by Steven Ruppert [ 10/Aug/12 1:04 PM ]

Then why doesn't (keyword "keywith@") throw an exception? It seems inconsistent with your statement.

Comment by Andy Fingerhut [ 13/Sep/12 2:23 PM ]

It is a long standing property of Clojure that it does not throw exceptions for everything that is illegal.

Comment by Steven Ruppert [ 17/Sep/12 2:16 PM ]

Yeah, but read-string clearly does. Is there a good reason that the "keyword" function can't throw an exception? With the other special rules on namespaces within symbol names, the "keyword" function really should be doing validation.

Another solution would be to allow a ruby-like :"symbol with disallowed characters" literal, but that would also be confusing with how the namespace is handled.

https://groups.google.com/forum/?fromgroups=#!topic/clojure/Ct5v9w0yNAE has some older discussion on this topic.

Comment by Andy Fingerhut [ 17/Sep/12 7:43 PM ]

Disclaimer: I'm not a Clojure/core member, just an interested contributor who doesn't know all the design decisions that were made here.

Steven, I think perhaps a couple of concerns are: (1) doing such checks would be slower than not doing them, and (2) implementing such checks means having to update them if/when the rules for legal symbols, keywords, namespace names, etc. change.

Would you be interested in writing strict versions of functions like symbol and keyword for addition to a contrib library? And test suites that try to hit a significant number of corner cases in the rules for what is legal and what is not? I mean those as serious questions, not rhetorical ones. This would permit people that want to use the strict versions of these functions to do so, and at the same time make it easy to measure performance differences between the strict and loose versions.

Comment by Steven Ruppert [ 13/Jan/13 10:58 PM ]

Looking back at this, the root cause of the problem is that the {pr} function, although it by default "print[s] in a way that objects can be read by the reader" [0], doesn't always do so. Thus, the easiest "fix" is to change its docstring to warn that not all keywords can be read back.

The deeper problem is that symbol don't have a reader form that can represent all actually possible keywords (in this case, those with "@" in them). Restricting the actually-possible keywords to match the reader form, i.e. writing a strict "keyword" function actually seems like a worse solution overall to me. The better solution would be to somehow extend the keyword reader form to allow it to express all possible keywords, possibly ruby's :"keyword" syntax. Plus, that solution would avoid having to keep hypothetical strict keyword/symbol functions in sync with reader operation, and write test cases for that, and so on.

Thus, the resolution of this bug comes down to how far we're willing to go. Changing the docstring would be the easiest, but extending the keyword form would be the "best" resolution, IMO.

[0]: http://clojuredocs.org/clojure_core/clojure.core/pr

Comment by Andy Fingerhut [ 19/Aug/13 10:54 AM ]

I happened across ticket CLJ-17 yesterday. Its discussion thread shows that the topic of validating the contents of constructed keywords and symbols has arisen before. At the time, a patch was written that modified the functions "symbol" and "keyword" so that the symbol/keyword was constructed as it does now, but then it was double-checked for readability using the clojure.lang.RT/readString method on the string arg given. It threw an exception if the intern and readString methods returned non-equal symbols (or if readString threw an exception).

Rich was concerned that this run-time overhead would be too high, and asked if anyone knew a faster way of doing it. Chas Emerick proposed making all symbols readable using a syntax like Common Lisp's #|symbol with whitespace|, with some checks for the common case where the quoting would be unnecessary. Rich was open to the idea of quoting arbitrary symbols, but that it would be a different ticket than that one.

I am not aware of anyone creating a ticket to introduce quoting of arbitrary symbols since then, but I could have missed it. This ticket could become that ticket, but its description would need significant editing, and code changes would be needed in a variety of places in Clojure.

Generated at Mon Sep 01 17:11:43 CDT 2014 using JIRA 4.4#649-r158309.