[CLJS-1320] clojure.string/split adds separator matches & failed matches (nil) when the separator is a regex with alternation Created: 26/Jun/15  Updated: 10/Apr/17

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 0.0-3308
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: lvh Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

I want to split a string on "; ", and optionally discard a final ";". So, I tried:

(clojure.string/split "ab; ab;" #"(; )|(;$)")

In Clojure, this does what I want:

["ab" "ab"]

In ClojureScript, I get:

["ab" "; " nil "ab" nil ";"]

I'm not sure to what extent this is a platform difference and to what extent it's a bug. Including nils and separators in clojure.string/split's output seems to go against string.split's contract?



 Comments   
Comment by Erik Assum [ 10/Apr/17 11:12 AM ]

Might not be the answer you want, but ClojureScript uses JavaScript's split implementation.
Testing this in the browser, you get:

> "ab; ab;".split(/(; )|(;$)/)
< ["ab", "; ", undefined, "ab", undefined, ";", ""] (7)
>

from https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split

If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
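As a small sketch of the behavior quoted above (plain JavaScript; the variable names are only illustrative), each match adds one entry per capturing group to the result, and a group that did not take part in the match shows up as undefined:

```javascript
const input = "ab; ab;";

// Two capturing groups: every match splices two entries into the result,
// one per group; the group that did not participate is undefined.
const twoGroups = input.split(/(; )|(;$)/);
console.log(twoGroups);
// [ 'ab', '; ', undefined, 'ab', undefined, ';', '' ]

// A single capturing group around the whole alternation: every match
// splices exactly one entry (the matched separator) into the result.
const oneGroup = input.split(/(; |;$)/);
console.log(oneGroup);
// [ 'ab', '; ', 'ab', ';', '' ]
```

In both cases the separators still end up in the output; only the number of extra entries per match changes.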

Which means that to avoid this, you should use non-capturing groups:

(clojure.string/split "ab; ab;" #"(?:; )|(?:;$)")

Which incidentally can be simplified to

(clojure.string/split "ab; ab;" #";(?: |$)")

Which produces the result you're after in both Clojure and ClojureScript.
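To check the two non-capturing patterns at the JavaScript level, here is a minimal sketch. Raw JavaScript split keeps a trailing empty string, which the ClojureScript output in the report does not contain, so the sketch assumes trailing empty strings get dropped afterwards. dropTrailingEmpty is a hypothetical helper written for this example, not a library function.

```javascript
// Hypothetical helper (an assumption for this sketch): returns a copy of
// the array with any trailing empty strings removed.
function dropTrailingEmpty(parts) {
  const out = parts.slice();
  while (out.length > 0 && out[out.length - 1] === "") {
    out.pop();
  }
  return out;
}

const text = "ab; ab;";

// Non-capturing groups: nothing is spliced into the result.
const rawSplit = text.split(/(?:; )|(?:;$)/);
console.log(rawSplit); // [ 'ab', 'ab', '' ]

// Both forms of the pattern give the same pieces once the trailing
// empty string is dropped.
const verbose = dropTrailingEmpty(rawSplit);
const simplified = dropTrailingEmpty(text.split(/;(?: |$)/));
console.log(verbose);    // [ 'ab', 'ab' ]
console.log(simplified); // [ 'ab', 'ab' ]
```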
