Clojure

Clojure reader (incorrectly) accepts keywords starting with a number

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Declined
  • Affects Version/s: Release 1.5
  • Fix Version/s: Release 1.6
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Screened

Description

The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

user> :5
:5
user> (class *1)
clojure.lang.Keyword

Cause: The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

Solution: Prevent back-tracking for the first : match by using the possessive quantifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

Patch: numkeyword.patch

Screened by: Alex Miller

Activity

Alex Miller made changes -
Field Original Value New Value
Approval Triaged [ 10120 ]
Attachment numkeyword.patch [ 12234 ]
Alex Miller made changes -
Description The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: {{"[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)"}}. The intention of the {{[\\D&&[^/]]}} is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive qualifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive qualifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

Alex Miller made changes -
Description The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive qualifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive quantifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

Rich Hickey made changes -
Approval Triaged [ 10120 ] Vetted [ 10003 ]
Fix Version/s Release 1.6 [ 10157 ]
Alex Miller made changes -
Approval Vetted [ 10003 ] Screened [ 10004 ]
Description The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive quantifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive quantifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

*Screened by:* Alex Miller
Hide
Andy Fingerhut added a comment -

Given how far this one is along now, perhaps CLJ-1003 should be marked as a duplicate of this one?

Show
Andy Fingerhut added a comment - Given how far this one is along now, perhaps CLJ-1003 should be marked as a duplicate of this one?
Alex Miller made changes -
Priority Major [ 3 ] Minor [ 4 ]
Rich Hickey made changes -
Approval Screened [ 10004 ] Ok [ 10007 ]
Stuart Halloway made changes -
Resolution Completed [ 1 ]
Status Open [ 1 ] Closed [ 6 ]
Alex Miller made changes -
Approval Ok [ 10007 ] Screened [ 10004 ]
Status Closed [ 6 ] Reopened [ 4 ]
Resolution Completed [ 1 ]
Hide
Alex Miller added a comment -

Because this broke existing code (several lib projects like java.jdbc), we have rolled this back. Will follow up with an alternate ticket to address this in some other way.

Show
Alex Miller added a comment - Because this broke existing code (several lib projects like java.jdbc), we have rolled this back. Will follow up with an alternate ticket to address this in some other way.
Alex Miller made changes -
Resolution Declined [ 2 ]
Status Reopened [ 4 ] Closed [ 6 ]

People

Vote (0)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: