Clojure

clojure.string/split strips trailing delimiters

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None

Description

clojure.string/split and clojure.string/split-lines inherit the bizarre default behavior of java.lang.String#split(String,int) in stripping trailing consecutive delimiters:

(clojure.string/split "banana" #"an")
⇒ ["b" "" "a"]
(clojure.string/split "banana" #"na")
⇒ ["ba"]
(clojure.string/split "nanabanana" #"na")
⇒ ["" "" "ba"]

In the case of split-lines, processing a file line by line and rejoining results in truncation of trailing newlines in the file. In both cases, the behavior is surprising and cannot be inferred from the docstrings.

This behavior should either be fixed or documented.

Activity

Hide
Andy Fingerhut added a comment -

Probably documenting would be safer than changing the behavior at this point, given that some people may actually rely on the current behavior after testing, deploying, etc.

I don't currently have a suggestion for a modified doc string, but note that there are examples of this behavior and how one can use an extra "-1" limit argument at the end to get all split strings: http://clojuredocs.org/clojure_core/clojure.string/split

Show
Andy Fingerhut added a comment - Probably documenting would be safer than changing the behavior at this point, given that some people may actually rely on the current behavior after testing, deploying, etc. I don't currently have a suggestion for a modified doc string, but note that there are examples of this behavior and how one can use an extra "-1" limit argument at the end to get all split strings: http://clojuredocs.org/clojure_core/clojure.string/split
Hide
Crispin Wellington added a comment - - edited

This bug just bit me. +1 to be fixed. If we just document and leave the behavior as is, then we have a surprising and inconsistent behaving split (why are inner empty values kept, but outer ones stripped?) that is different to every other split you've ever used. The optional -1 limit argument looks hacky but a fix could keep this -1 argument working.

EDIT: this looks to be java's string class behavior: http://stackoverflow.com/questions/2170557/split-method-of-string-class-does-not-include-trailing-empty-strings
Would be nice if limit defaulted to -1 on that type of clojure.string/split call.

Show
Crispin Wellington added a comment - - edited This bug just bit me. +1 to be fixed. If we just document and leave the behavior as is, then we have a surprising and inconsistent behaving split (why are inner empty values kept, but outer ones stripped?) that is different to every other split you've ever used. The optional -1 limit argument looks hacky but a fix could keep this -1 argument working. EDIT: this looks to be java's string class behavior: http://stackoverflow.com/questions/2170557/split-method-of-string-class-does-not-include-trailing-empty-strings Would be nice if limit defaulted to -1 on that type of clojure.string/split call.

People

Vote (2)
Watch (2)

Dates

  • Created:
    Updated: