ClojureScript

cljs.reader doesn't support keywords starting with a digit

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None

Description

ClojureScript:foo> (r/read-string ":0")
"Error evaluating:" (r/read-string ":0") :as "cljs.reader.read_string.call(null,\":0\")"
org.mozilla.javascript.EcmaError: TypeError: Cannot read property "0.0" from null (file:/home/chas/dev/clojure/cljs/.repl/cljs/reader.js#451)

The topic of leading digits in keywords came up separately, as they've been supported in Clojure for some time, but can now be considered part of the spec, as it were. See CLJ-1286.

BTW, this is another simple-check win...

Activity

David Nolen made changes -
Field Original Value New Value
Priority Critical [ 2 ] Major [ 3 ]
Hide
Chas Emerick added a comment -

This is not a simple regex change, as I had hoped given the recent flurry in Clojure. The symbol pattern in cljs.reader is faithful to Clojure HEAD, but the processing of matches isn't. I think it may be a wash as to whether it'd be easier to fix what's there vs. porting clojure.tools.reader.impl.commons/parse-symbol (which incidentally doesn't use a regex)…either way, leaving it for another day (or someone else, if they're up for it).

Show
Chas Emerick added a comment - This is not a simple regex change, as I had hoped given the recent flurry in Clojure. The symbol pattern in cljs.reader is faithful to Clojure HEAD, but the processing of matches isn't. I think it may be a wash as to whether it'd be easier to fix what's there vs. porting clojure.tools.reader.impl.commons/parse-symbol (which incidentally doesn't use a regex)…either way, leaving it for another day (or someone else, if they're up for it).
David Nolen made changes -
Priority Major [ 3 ] Minor [ 4 ]
Hide
Francis Avila added a comment -

I think I fixed the match processing issue you're talking about (CLJS-775 CLJS-776)? However I'm still confused by this and CLJ-1286. The clojure reader docs and edn spec still say they should reject `:0`, but 1.6.0 doesn't. What's the expected behavior? Is the spec going to be fixed, or clojure reader fixed once downstream packages are fixed?

Show
Francis Avila added a comment - I think I fixed the match processing issue you're talking about (CLJS-775 CLJS-776)? However I'm still confused by this and CLJ-1286. The clojure reader docs and edn spec still say they should reject `:0`, but 1.6.0 doesn't. What's the expected behavior? Is the spec going to be fixed, or clojure reader fixed once downstream packages are fixed?
Hide
Jozef Wagner added a comment -

AFAIK EDN specs do not reject :0 (no rule that the second character cannot be a digit). See https://github.com/wagjo/serialization-formats for my interpretation of existing specs.

Show
Jozef Wagner added a comment - AFAIK EDN specs do not reject :0 (no rule that the second character cannot be a digit). See https://github.com/wagjo/serialization-formats for my interpretation of existing specs.
Hide
Francis Avila added a comment - - edited

Ah, I think I see the source of the confusion. Both EDN and the clojure reader spec both say something like "keywords are like symbols, except beginning with a colon." The confusion lies in whether we interpret that as meaning

  1. First character is a colon, then the second character and after are matched against the symbol definition.
  2. The first character is a colon, and the whole form is matched against the symbol definition.

CLJ-1003 CLJ-1252 and CLJ-1286 and myself all seem to understand the first meaning. This might be because when we say "the first character of a keyword" we typically mean the first character after the colon, as if the colon is "special" and not part of the keyword (e.g. like a reader macro character).

However clojure 1.6 seems to be following the second meaning (and explains why `:0/a` is ok but not `:0/0`), and I'm not sure from the cited tickets and google group discussions whether this is because of downstream breakage or if this is the intended interpretation and the patch from CLJ-1252 was accepted by Alex Miller erroneously.

Note if we accept the second interpretation, then the restriction "A symbol can contain one or more non-repeating ':'s." from the clojure reader docs is incorrect for keywords. (EDN doesn't allow namespace-expanded keywords, it seems, so it's not an issue there.)

Also EDN allows contiguous colons in symbols, whereas clojure 1.6 and the reader spec do not.

Show
Francis Avila added a comment - - edited Ah, I think I see the source of the confusion. Both EDN and the clojure reader spec both say something like "keywords are like symbols, except beginning with a colon." The confusion lies in whether we interpret that as meaning
  1. First character is a colon, then the second character and after are matched against the symbol definition.
  2. The first character is a colon, and the whole form is matched against the symbol definition.
CLJ-1003 CLJ-1252 and CLJ-1286 and myself all seem to understand the first meaning. This might be because when we say "the first character of a keyword" we typically mean the first character after the colon, as if the colon is "special" and not part of the keyword (e.g. like a reader macro character). However clojure 1.6 seems to be following the second meaning (and explains why `:0/a` is ok but not `:0/0`), and I'm not sure from the cited tickets and google group discussions whether this is because of downstream breakage or if this is the intended interpretation and the patch from CLJ-1252 was accepted by Alex Miller erroneously. Note if we accept the second interpretation, then the restriction "A symbol can contain one or more non-repeating ':'s." from the clojure reader docs is incorrect for keywords. (EDN doesn't allow namespace-expanded keywords, it seems, so it's not an issue there.) Also EDN allows contiguous colons in symbols, whereas clojure 1.6 and the reader spec do not.
Hide
Francis Avila added a comment - - edited

Also clojure 1.6 allows a/:a and :a/:a (where name part violates first-character rule for symbols), even though the specs do not. (This is something your table doesn't mention. Very thorough work BTW! I wish the reader spec was more formalized and unambiguous...)

Show
Francis Avila added a comment - - edited Also clojure 1.6 allows a/:a and :a/:a (where name part violates first-character rule for symbols), even though the specs do not. (This is something your table doesn't mention. Very thorough work BTW! I wish the reader spec was more formalized and unambiguous...)
Hide
Francis Avila added a comment - - edited

I think this pattern follows the specs:

#"(?x)
(?!///) # edge case: / only allowed in name part.
# name or namespace part of symbol or keyword
(?:
 #division symbol
 (/
 # normal symbol
 |[a-zA-Z*!_?$%&=<>][0-9a-zA-Z*!_?$%&=<>\#:+.-]*
 # symbol starting with [-+.]
 |[-+.](?:[a-zA-Z*!_?$%&=<>\#:+.-][0-9a-zA-Z*!_?$%&=<>\#:+.-]*)?)
 # keyword
 |(::?)([0-9a-zA-Z*!_?$%&=<>\#:+.-]+))
# name part when namespace is present
(?:/(/ # division symbol
    |[a-zA-Z*!_?$%&=<>][0-9a-zA-Z*!_?$%&=<>\#:+.-]*
    |[-+.](?:[a-zA-Z*!_?$%&=<>\#:+.-][0-9a-zA-Z*!_?$%&=<>\#:+.-]*)?))?
# groups:
# 1: symbol name or namespace 2: keyword colon(s) 3: keyword name or namespace
# 4: keyword or symbol name (and groups 1 and 3 are namespaces)"

Problems:

  1. Does not enforce no-repeating-colon rule (but it is easy to validate after matching).
  2. Rejects violations of first-character-rule in symbols which clojure accepts.
  3. Accepts a trailing colon on namespace (unlike clojure).
  4. Accepts foo// or :foo//, which are not clearly addressed by the specs. (Jozef's table has more background). These are both allowed in Clojure 1.6, but not 1.5 or (arguably) edn.
Show
Francis Avila added a comment - - edited I think this pattern follows the specs:
#"(?x)
(?!///) # edge case: / only allowed in name part.
# name or namespace part of symbol or keyword
(?:
 #division symbol
 (/
 # normal symbol
 |[a-zA-Z*!_?$%&=<>][0-9a-zA-Z*!_?$%&=<>\#:+.-]*
 # symbol starting with [-+.]
 |[-+.](?:[a-zA-Z*!_?$%&=<>\#:+.-][0-9a-zA-Z*!_?$%&=<>\#:+.-]*)?)
 # keyword
 |(::?)([0-9a-zA-Z*!_?$%&=<>\#:+.-]+))
# name part when namespace is present
(?:/(/ # division symbol
    |[a-zA-Z*!_?$%&=<>][0-9a-zA-Z*!_?$%&=<>\#:+.-]*
    |[-+.](?:[a-zA-Z*!_?$%&=<>\#:+.-][0-9a-zA-Z*!_?$%&=<>\#:+.-]*)?))?
# groups:
# 1: symbol name or namespace 2: keyword colon(s) 3: keyword name or namespace
# 4: keyword or symbol name (and groups 1 and 3 are namespaces)"
Problems:
  1. Does not enforce no-repeating-colon rule (but it is easy to validate after matching).
  2. Rejects violations of first-character-rule in symbols which clojure accepts.
  3. Accepts a trailing colon on namespace (unlike clojure).
  4. Accepts foo// or :foo//, which are not clearly addressed by the specs. (Jozef's table has more background). These are both allowed in Clojure 1.6, but not 1.5 or (arguably) edn.
Hide
Francis Avila added a comment - - edited

Another problem: Accepts :::a/b, which I think is ok per the specs but is not read by 1.6. Crazy example:

user=> (require ['clojure.core :as (symbol ":a")])
nil
user=> :::a/map

RuntimeException Invalid token: :::a/map  clojure.lang.Util.runtimeException (Util.java:221)
user=> (resolve (symbol ":a" "map"))
#'clojure.core/map

Theoretically I might expect :::a/map to be read as :clojure.core/map?

Show
Francis Avila added a comment - - edited Another problem: Accepts :::a/b, which I think is ok per the specs but is not read by 1.6. Crazy example:
user=> (require ['clojure.core :as (symbol ":a")])
nil
user=> :::a/map

RuntimeException Invalid token: :::a/map  clojure.lang.Util.runtimeException (Util.java:221)
user=> (resolve (symbol ":a" "map"))
#'clojure.core/map
Theoretically I might expect :::a/map to be read as :clojure.core/map?

People

Vote (1)
Watch (3)

Dates

  • Created:
    Updated: