tools.reader

Reader supports poorly defined regexes that break code

Details

  • Type: Defect
  • Status: Closed
  • Priority: Minor
  • Resolution: Declined
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None

Description

I ran into a strange case where CLJS emitted invalid code based on a poorly formatted regex that escaped / incorrectly.

Consider such a regex, along with two similar but well-formed regexes, passing through tools.reader:

(str #"/")   => "/"
(str #"\/")  => "\\/"
(str #"\\/") => "\\\\/"

But what does "\\/" mean here?

Looking at Clojure execution of these regexes:

(re-find #"/"   "\\/") => "/"
(re-find #"\/"  "\\/") => "/"
(re-find #"\\/" "\\/") => "\\/"

i.e. #"\/" behaves exactly like #"/".
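
A minimal sketch of that observation, assuming stock Clojure on the JVM, where a regex literal is a java.util.regex.Pattern. The var names are only for illustration. It shows that the two literals carry different pattern sources yet match the same input:

```clojure
;; Illustrative names; not part of tools.reader.
(def plain   #"/")   ; pattern source is the single character /
(def escaped #"\/")  ; pattern source is two characters: \ followed by /

;; The string values of the two literals differ...
(def sources-differ? (not= (str plain) (str escaped)))

;; ...but both find the same match in the same input.
(def same-matches? (= (re-find plain "a/b") (re-find escaped "a/b")))

(println sources-differ? same-matches? (count (str escaped)))
```

Anything that consumes the string value of the literal, rather than the compiled pattern, therefore sees a difference that matching alone hides.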

Things get more unfortunate once CLJS gets involved: it does not expect the "heisen" regex, and the "dangling escape" ends up capturing the forward slash's escape, i.e. a prematurely terminating regex is emitted.
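
One way this failure can arise, sketched with a hypothetical emitter. `naive-js-regex` is an assumption made for illustration and is not the actual CLJS compiler code: it escapes every forward slash in the pattern source and wraps the result in /.../ to form a JavaScript regex literal.

```clojure
(require '[clojure.string :as string])

;; Hypothetical emitter (assumption, not CLJS source): escape each / and
;; wrap the pattern source in slashes to build JavaScript regex text.
(defn naive-js-regex [pattern-source]
  (str "/" (string/replace pattern-source "/" "\\/") "/"))

;; Source / becomes the JS text /\// which is a regex matching one slash.
(println (naive-js-regex (str #"/")))

;; Source \/ becomes the JS text /\\// where the original backslash now
;; escapes the added backslash, so the regex ends early and a stray / follows.
(println (naive-js-regex (str #"\/")))
```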

Despite Clojure's existing "fortuitous" behaviour, perhaps the correct behaviour is to throw a reader exception for such regexes, as it does for the string literal "\/".

Alternatively, if #"\/" remains supported (for the benefit of users familiar with /.../ syntax), then the reader should emit "/", not "\\/", as the string value of the literal, i.e. this "tolerance" should be part of the reader semantics rather than a concern for emitters.
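
The second option could look roughly like the sketch below. `normalize-slash-escapes` is a hypothetical helper, not an existing tools.reader function, and it operates on the pattern source as a string rather than inside the reader itself. It drops the backslash from \/ while leaving every other escape, including an escaped backslash, untouched:

```clojure
(require '[clojure.string :as string])

;; Hypothetical normalisation step (assumption, not tools.reader code).
;; Each backslash is consumed together with the character it escapes, so
;; an escaped backslash followed by / is never mistaken for an escaped /.
(defn normalize-slash-escapes [pattern-source]
  (string/replace pattern-source
                  #"\\(.)"
                  (fn [[_ escaped-char]]
                    (if (= escaped-char "/")
                      "/"
                      (str "\\" escaped-char)))))

(println (normalize-slash-escapes (str #"\/")))   ; source \/ becomes /
(println (normalize-slash-escapes (str #"\\/")))  ; source \\/ is unchanged
```

With such a step in the reader, emitters would only ever see an unescaped forward slash and could apply their own escaping uniformly.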

See http://dev.clojure.org/jira/browse/CLJS-1399

Activity

Chris Truter made changes -
Field: Description (first code sample wrapped in {code} markup; text otherwise unchanged)
Nicola Mometto made changes -
Field: Description (inline snippets wrapped in {noformat} markup; text otherwise unchanged)
Nicola Mometto made changes -
Field: Description (trailing period removed after one {noformat} snippet; text otherwise unchanged)
Nicola Mometto made changes -
Status Open [ 1 ] Closed [ 6 ]
Resolution Completed [ 1 ]
Nicola Mometto made changes -
Resolution Completed [ 1 ]
Status Closed [ 6 ] Reopened [ 4 ]
Nicola Mometto made changes -
Status Reopened [ 4 ] Closed [ 6 ]
Resolution Declined [ 2 ]
