Clojure

Clojure reader (incorrectly) accepts keywords starting with a number

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Declined
  • Affects Version/s: Release 1.5
  • Fix Version/s: Release 1.6
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Screened

Description

The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

user> :5
:5
user> (class *1)
clojure.lang.Keyword

Cause: The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

Solution: Prevent back-tracking for the first : match by using the possessive quantifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

Patch: numkeyword.patch

Screened by: Alex Miller

Activity

Alex Miller made changes -
Field Original Value New Value
Approval Triaged [ 10120 ]
Attachment numkeyword.patch [ 12234 ]
Alex Miller made changes -
Description The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: {{"[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)"}}. The intention of the {{[\\D&&[^/]]}} is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive qualifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive qualifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

Alex Miller made changes -
Description The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive qualifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive quantifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

Rich Hickey made changes -
Approval Triaged [ 10120 ] Vetted [ 10003 ]
Fix Version/s Release 1.6 [ 10157 ]
Alex Miller made changes -
Approval Vetted [ 10003 ] Screened [ 10004 ]
Description The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive quantifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

The reader page at http://clojure.org/reader states that symbols (and keywords) cannot start with a number and the regex used in LispReader (and EdnReader) also has this intention. But:

{code}
user> :5
:5
user> (class *1)
clojure.lang.Keyword
{code}

*Cause:* The pattern for keywords is: "[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)". The intention of the [\D&&[^/]] is to accept all non-digits except /. However, if the : matches and 5 does not, the regex will backtrack and unmatch the :, instead matching it in the non-digit charset. The whole match is sent on in the code and the : is stripped later.

*Solution:* Prevent back-tracking for the first : match by using the possessive quantifier ?+ instead of ?. Once the first : is matched, it will not backtrack to it. (This is also faster.) The patch makes this change in LispReader and EdnReader, updates a couple tests that were using number keywords, and adds a new negative test to check that :5 isn't read.

*Patch:* numkeyword.patch

*Screened by:* Alex Miller
Alex Miller made changes -
Priority Major [ 3 ] Minor [ 4 ]
Rich Hickey made changes -
Approval Screened [ 10004 ] Ok [ 10007 ]
Stuart Halloway made changes -
Resolution Completed [ 1 ]
Status Open [ 1 ] Closed [ 6 ]
Alex Miller made changes -
Approval Ok [ 10007 ] Screened [ 10004 ]
Status Closed [ 6 ] Reopened [ 4 ]
Resolution Completed [ 1 ]
Alex Miller made changes -
Resolution Declined [ 2 ]
Status Reopened [ 4 ] Closed [ 6 ]

People

Vote (0)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: