<< Back to previous view

[CLJ-1360] clojure.string/split strips trailing delimiters Created: 18/Feb/14  Updated: 24/Mar/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Tim McCormack Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: docstring, string

Attachments: Text File clj-1360.patch    
Patch: Code
Approval: Prescreened


clojure.string/split and clojure.string/split-lines inherit the bizarre default behavior of java.lang.String#split(String,int) in stripping trailing consecutive delimiters:

(clojure.string/split "banana" #"an")
⇒ ["b" "" "a"]
(clojure.string/split "banana" #"na")
⇒ ["ba"]
(clojure.string/split "nanabanana" #"na")
⇒ ["" "" "ba"]

In the case of split-lines, processing a file line by line and rejoining results in truncation of trailing newlines in the file. In both cases, the behavior is surprising and cannot be inferred from the docstrings. A workaround for split is to pass a limit of -1.

Proposed: As current users may be relying on the current behavior, the attached merely updates the docstring to warn of this behavior and suggest use of -1 as a limit to workaround.

Patch: clj-1360.patch

Comment by Andy Fingerhut [ 18/Feb/14 10:51 AM ]

Probably documenting would be safer than changing the behavior at this point, given that some people may actually rely on the current behavior after testing, deploying, etc.

I don't currently have a suggestion for a modified doc string, but note that there are examples of this behavior and how one can use an extra "-1" limit argument at the end to get all split strings: http://clojuredocs.org/clojure_core/clojure.string/split

Comment by Crispin Wellington [ 21/May/15 10:46 PM ]

This bug just bit me. +1 to be fixed. If we just document and leave the behavior as is, then we have a surprising and inconsistent behaving split (why are inner empty values kept, but outer ones stripped?) that is different to every other split you've ever used. The optional -1 limit argument looks hacky but a fix could keep this -1 argument working.

EDIT: this looks to be java's string class behavior: http://stackoverflow.com/questions/2170557/split-method-of-string-class-does-not-include-trailing-empty-strings
Would be nice if limit defaulted to -1 on that type of clojure.string/split call.

Comment by Stuart Halloway [ 19/Jul/15 8:03 AM ]

This is really gross, and the original developer has been punched in the neck. (Ow.)

I hate the Java leakage, but given that this is already out there, and that people are likely already relying on both the default and the negative-arg behavior, I think the least bad bet is to document precisely the semantics we have.

Generated at Fri Apr 29 22:23:05 CDT 2016 using JIRA 4.4#649-r158309.