[CLJ-1360] clojure.string/split strips trailing delimiters Created: 18/Feb/14 Updated: 24/Mar/16
clojure.string/split and clojure.string/split-lines inherit the bizarre default behavior of java.lang.String#split(String,int) in stripping trailing consecutive delimiters:
In the case of split-lines, processing a file line by line and rejoining results in truncation of trailing newlines in the file. In both cases, the behavior is surprising and cannot be inferred from the docstrings. A workaround for split is to pass a limit of -1.
Proposed: As current users may be relying on the current behavior, the attached merely updates the docstring to warn of this behavior and suggest use of -1 as a limit to workaround.
|Comment by Andy Fingerhut [ 18/Feb/14 10:51 AM ]|
Probably documenting would be safer than changing the behavior at this point, given that some people may actually rely on the current behavior after testing, deploying, etc.
I don't currently have a suggestion for a modified doc string, but note that there are examples of this behavior and how one can use an extra "-1" limit argument at the end to get all split strings: http://clojuredocs.org/clojure_core/clojure.string/split
|Comment by Crispin Wellington [ 21/May/15 10:46 PM ]|
This bug just bit me. +1 to be fixed. If we just document and leave the behavior as is, then we have a surprising and inconsistent behaving split (why are inner empty values kept, but outer ones stripped?) that is different to every other split you've ever used. The optional -1 limit argument looks hacky but a fix could keep this -1 argument working.
EDIT: this looks to be java's string class behavior: http://stackoverflow.com/questions/2170557/split-method-of-string-class-does-not-include-trailing-empty-strings
|Comment by Stuart Halloway [ 19/Jul/15 8:03 AM ]|
This is really gross, and the original developer has been punched in the neck. (Ow.)
I hate the Java leakage, but given that this is already out there, and that people are likely already relying on both the default and the negative-arg behavior, I think the least bad bet is to document precisely the semantics we have.