ClojureScript

clojure.string/split adds separator matches & failed matches (nil) when the separator is a regex with alternation

Details

  • Type: Defect
  • Status: Open
  • Priority: Minor
  • Resolution: Unresolved
  • Affects Version/s: 0.0-3308
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None

Description

I want to split a string on "; ", and optionally discard a final ";". So, I tried:

(clojure.string/split "ab; ab;" #"(; )|(;$)")

In Clojure, this does what I want:

["ab" "ab"]

In ClojureScript, I get:

["ab" "; " nil "ab" nil ";"]

I'm not sure to what extent this is a platform distinction and to what extent it's a bug. Returning nils and separators in the output of clojure.string/split seems to go against its contract, doesn't it?

Activity

David Nolen made changes - Priority changed from Major to Minor.
Erik Assum added a comment -

This might not be the answer you want, but ClojureScript uses JavaScript's split implementation.
Testing this in the browser console gives:

> "ab; ab;".split(/(; )|(;$)/)
< ["ab", "; ", undefined, "ab", undefined, ";", ""] (7)

From the MDN page for String.prototype.split (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split):

If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
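The following small JavaScript sketch makes the MDN description concrete. It assumes Node.js 12+ or a modern browser, where `String.prototype.matchAll` is available. For each separator match, it records what each capturing group holds. It then compares split with capturing groups against split with non-capturing groups.

```javascript
const input = "ab; ab;";

// matchAll requires the g flag. For every separator match, record where
// it occurred and what each capturing group captured. The group that did
// not take part in a given match is undefined.
const matches = [...input.matchAll(/(; )|(;$)/g)].map((m) => ({
  index: m.index,
  group1: m[1],
  group2: m[2],
}));
// matches: [{ index: 2, group1: "; ", group2: undefined },
//           { index: 6, group1: undefined, group2: ";" }]

// split splices exactly those group values between the pieces of text:
// ["ab", "; ", undefined, "ab", undefined, ";", ""]
const parts = input.split(/(; )|(;$)/);

// With non-capturing groups there is nothing to splice in:
// ["ab", "ab", ""]
const partsNonCapturing = input.split(/(?:; )|(?:;$)/);
```

The undefined entries in the split result are simply the non-participating group of each alternation branch. ClojureScript then shows them as nil.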

So, to keep the separators and undefined values out of the result, you should use non-capturing groups:

(clojure.string/split "ab; ab;" #"(?:; )|(?:;$)")

Incidentally, that regex can be simplified to:

(clojure.string/split "ab; ab;" #";(?: |$)")

Either version produces the result you're after, ["ab" "ab"], in both Clojure and ClojureScript.
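Plain JavaScript split still returns a trailing "" for this input, yet ClojureScript does not. When no limit is given, clojure.string/split drops trailing empty strings, matching Clojure, whose split relies on Java's Pattern.split with a limit of zero. The helper below, `splitLikeClojure`, is a hypothetical JavaScript approximation of that two-argument behaviour, written for illustration only. It is not ClojureScript's actual implementation.

```javascript
// Hypothetical helper: approximates the two-argument form of
// clojure.string/split. It splits with String.prototype.split and then
// drops trailing empty strings, but only while more than one piece
// remains, so splitting "" still yields [""].
function splitLikeClojure(s, re) {
  const parts = s.split(re);
  while (parts.length > 1 && parts[parts.length - 1] === "") {
    parts.pop();
  }
  return parts;
}

// Capturing groups still leak separators and undefined values:
// ["ab", "; ", undefined, "ab", undefined, ";"]
const withCapturing = splitLikeClojure("ab; ab;", /(; )|(;$)/);

// The simplified non-capturing separator gives the desired result:
// ["ab", "ab"]
const withNonCapturing = splitLikeClojure("ab; ab;", /;(?: |$)/);
```

The first result has the same shape as the ClojureScript output reported in the description, with undefined in place of nil. Trimming trailing empty strings does not help there, because the last element is the captured ";".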


People

  • Assignee: Unassigned
  • Reporter: lvh

Dates

  • Created:
    Updated: