I would like to propose a somewhat radical suggestion that would: fix this issue and
CLJS-810, put us in a better position to resolve CLJS-485 CLJS-746 CLJS-794 (clojure.string/replace woes), allow us to add some regex-as-a-value niceties to patterns in js ( CLJS-67 and CLJS-68), and bring clojurescript's regular expression handling closer to clojure's by implementing more of the re-* functions.
Example implementation (not a patch) at this cljsfiddle: http://cljsfiddle.net/fiddle/favila.regexp
- regular expression literals emit these Pattern objects instead of RegExp objects.
- Create a Matcher object to correspond to the currently-unimplemented re-matcher. It combines a global-flagged RegExp object, a search string, and a done flag. If it keeps the last match (similar to java), cljs can also implement re-groups.
- Make re-seq use the Matcher object and thus the .lastIndex that native RegExps provide for global matches. (Its implementation no longer requires string slicing after every match.)
- If re-find is given a native RegExp object instead of a pattern, it will use it as-is. This matches current behavior.
- If re-matches is given a native RegExp object and it isn't suitable for exact-matching, a new RegExp is cloned from the input RegExp with ^ and $ prepended and appended and the global flag added. (This technique is used in clojure.string/replace, but incorrectly.)
I apologize if some of my thoughts are vague; I haven't thought about this in a while.
First note that a narrow class of RegExps are effectively "pure": If they do a full-string match (e.g. start with ^ and end with $) and have the global flag set to false then their lastIndex will always seem to be 0.
- Patterns and RegExps can be created from one another, so coercion is always an option. E.g. re-pattern can accept a RegExp and some other (cljs-specific) function can coerce from Pattern or Matcher to RegExp. (Or maybe re-matcher can return a RegExp-compatible object--see below.)
- RegExp given to cljs re-*: "Pure" regexes can be used directly, otherwise we create a Pattern and/or Matcher. (I don't remember the details, but the fiddle should cover them.)
- Pattern used as a RegExp: Patterns can expose all the properties of RegExp instances. If the pattern is pure, it can implement .test and .exec. .lastIndex will always be 0. Not sure what to do about impure patterns: throw exception, act pure anyway, return a new object?
- Matcher used as a RegExp: A Matcher can exactly replicate a RegExp instance, maybe even use the same prototype. Using it like a RegExp will mutate the object and disturb its internal state, but as long as it's either used consistently as a RegExp or consistently as a Matcher this won't matter. Notes:
- Matcher must be a wrapper and not a normal RegExp because of these three extra bits of state.
- The return value of re-matcher is only consumed by the 1-arg form of re-find and re-groups.
- re-seq can use a matcher internally, but since it is private it doesn't have to.
- Should other Java methods of Matcher be implemented?
- Pattern given to String.prototype.match, .replace, .search, and .split: I haven't thought about this. Considerations:
- Problem code is any cljs code using an object created via pattern literals or re-pattern and using it as an argument to these String methods. If they use clojure.string methods instead they would be fine.
- Such code is also impossible in java clojure: only (.split s "pattern-str") is the same in java/clj and js/cljs and it will continue to work (without flags) on both platforms. Possibly we could just make people fix such code since it is platform-specific, but I need to see how widespread this is.
- The fix for such code is to either:
- Use a pattern->regexp coercion function we will provide.
- Construct those regexps directly with js/RegExp.
- Use clojure.string functions instead of String methods. This also has the advantage of being portable between clj and cljs.
- Possibly we can patch the RegExp constructor or mess with the String prototype chain to do pattern->regexp coercion automatically, but this is a violent solution.
Correct me if I'm wrong, but in Clojure (java) code it is extremely uncommon to use Pattern and Match methods or to use them with String methods directly. For the most part they are treated as opaque objects used via re-* and clojure.string/*. Code written in the same style in cljs would be unaffected, and we can declare any other use as platform-specific and require explicit creation of RegExps (and don't bother to make Matcher or Pattern act like RegExps). This is my preferred approach for interop if there isn't too much use of RegExp.prototype.test, .exec, and String.prototype.match, .replace, .search, and .split.