<< Back to previous view

[CLJS-776] re-matches is incorrect Created: 28/Feb/14  Updated: 24/Jun/14

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Francis Avila Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


The re-matches function does not have the correct semantics: it performs a search (not match) against the string and returns nil if the string and matched-string are unequal. This is not the same as true matching, which is like inserting "^" and "$" at the beginning and end of the pattern.

Example in Clojure:

user=> (re-find #"0|[1-9][0-9]+|0[xX][0-9a-zA-Z]+" "0x1")
user=> (re-matches #"0|[1-9][0-9]+|0[xX][0-9a-zA-Z]+" "0x1")

Compare Clojurescript:

ClojureScript:cljs.user> (re-find  #"0|[1-9][0-9]+|0[xX][0-9a-zA-Z]+" "0x1")
ClojureScript:cljs.user> (re-matches #"0|[1-9][0-9]+|0[xX][0-9a-zA-Z]+" "0x1")

This bug is (one of the) reasons why CLJS-775.

I'm not completely sure what to do here. My first thought is to have re-matches inspect the -source property of its regex input, wrap the string with "^$", then carefully copy all flags over to a new regexp.


  1. Are there any valid patterns where this is not safe? E.g., where we could not put ^ first? Is "^^abc$$" ok?
  2. Can we avoid cloning if ^ and $ are already the first and last chars of the pattern?
  3. How does multiline mode play in to this, if at all?
  4. regexinstance.lastIndex is a piece of mutability on regex instances (or the RegExp global on older browsers) which is used as a string offset for multiple invocations of exec() on the same string. I have no idea what to do if re-* gets a regex with the global flag set. (BTW, this is a very good reason to reject CLJS-150: allowing clojure to accept the global flag makes regular expression objects stateful, and would completely screw up re-seq for example.)

Comment by Francis Avila [ 24/Jun/14 7:37 AM ]

I would like to propose a somewhat radical suggestion that would: fix this issue and CLJS-810, put us in a better position to resolve CLJS-485 CLJS-746 CLJS-794 (clojure.string/replace woes), allow us to add some regex-as-a-value niceties to patterns in js (CLJS-67 and CLJS-68), and bring clojurescript's regular expression handling closer to clojure's by implementing more of the re-* functions.

Example implementation (not a patch) at this cljsfiddle: http://cljsfiddle.net/fiddle/favila.regexp

Essential points:

  1. Create a Pattern object, created by re-pattern, which provides methods to create regexps for search (re-find) or exact match (re-matches) or repeated searches (re-seq, re-matcher + re-find). Each of these must be a different RegExp object in javascript even though they are similar regular expression strings. The re-find and re-matches patterns can be cached. All can generate RegExps lazily.
  2. regular expression literals emit these Pattern objects instead of RegExp objects.
  3. Create a Matcher object to correspond to the currently-unimplemented re-matcher. It combines a global-flagged RegExp object, a search string, and a done flag. If it keeps the last match (similar to java), cljs can also implement re-groups.
  4. Make re-seq use the Matcher object and thus the .lastIndex that native RegExps provide for global matches. (Its implementation no longer requires string slicing after every match.)
  5. If re-find is given a native RegExp object instead of a pattern, it will use it as-is. This matches current behavior.
  6. If re-matches is given a native RegExp object and it isn't suitable for exact-matching, a new RegExp is cloned from the input RegExp with ^ and $ prepended and appended and the global flag added. (This technique is used in clojure.string/replace, but incorrectly.)


Generated at Tue Sep 23 05:32:46 CDT 2014 using JIRA 4.4#649-r158309.