[CLJ-1312] clojure.string/split on empty string includes empty string in results Created: 21/Dec/13 Updated: 07/Sep/14 Resolved: 21/Dec/13
Affects Version/s: Release 1.5
Splitting a string using clojure.string/split with an empty regex includes an empty string as the first element of the results. Is this expected behaviour?
Comment by Alex Miller [ 21/Dec/13 8:05 AM ]
Yes, I think so. This is a case where Clojure defers to the host (Java) for behavior. I think the way to interpret this is that the empty pattern matches every string, at every position. Split scans left to right for the next chunk of the string that matches the pattern, and the empty pattern already matches a zero-length string at the very beginning, so the text before that match (an empty string) becomes the first element. Something like that.
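To make the zero-length matches concrete, here is a minimal sketch, assuming Clojure on the JVM. It walks a `java.util.regex.Matcher` over "abc" and records where each match of the empty pattern starts. `empty-match-positions` is a hypothetical helper for illustration, not part of clojure.core or clojure.string. The match at offset 0 is the one that yields the leading empty string on older JVMs.

```clojure
;; Hypothetical helper (not in clojure.core): collect the start offset
;; of every match of `re` in `s`, using the host's Matcher directly.
(defn empty-match-positions
  [re s]
  (let [^java.util.regex.Matcher m (re-matcher re s)]
    (loop [acc []]
      (if (.find m)
        ;; Matcher.find advances past a zero-length match on its own,
        ;; so this loop terminates even for the empty pattern.
        (recur (conj acc (.start m)))
        acc))))

;; The empty pattern matches before each character and at the end,
;; including a zero-length match at offset 0.
(empty-match-positions #"" "abc")
;; => [0 1 2 3]
```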
Comment by Mark Engelberg [ 07/Sep/14 12:27 PM ]
This bug is a real problem because clojure.string/split behaves differently on Windows than on Linux. On Windows, it behaves exactly as you'd expect:
user=> (clojure.string/split "abc" #"")
["a" "b" "c"]
Only on Linux do you get the strange behavior where an empty string shows up at the beginning of the result: ["" "a" "b" "c"].
I recently had a student who got burned by this in some web server code that relied on splitting with the empty regex. It worked flawlessly on her local Windows machine but mysteriously broke when she uploaded the uberwar to the cloud. The bug was very difficult to track down.
If this were a bug on both Windows and Linux, at least you could plan around it. But right now, it's an obstacle to Clojure's ability to run consistently across platforms.
Comment by Mark Engelberg [ 07/Sep/14 12:40 PM ]
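Upon further research, I've found that this is not a Windows/Linux issue; rather, it's a difference between Java 7 and Java 8. On Java 8, splitting with the empty regex no longer produces a sequence that begins with an empty string. (The Java 8 documentation for String.split states that a zero-width match at the beginning never produces an empty leading substring.)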
As you said before, this is just a gotcha relating to Java, not a Clojure issue.
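If the goal is simply to break a string into one-character strings, one way to sidestep the Java 7 vs Java 8 difference is to avoid regex splitting entirely. The sketch below assumes Clojure on the JVM; `split-chars` is a hypothetical helper name, not an existing library function. It maps `str` over the string's characters, so its result does not depend on how the host's `Pattern.split` treats a zero-width match at the start.

```clojure
;; Hypothetical helper: split s into one-character strings without
;; going through regex splitting, so the result is the same on any JVM.
;; Caveat: it iterates UTF-16 chars, so a supplementary character
;; (for example an emoji) comes out as two separate surrogate strings.
(defn split-chars
  [s]
  (mapv str s))

(split-chars "abc")
;; => ["a" "b" "c"]
```

Note that `(split-chars "")` returns `[]`, so it is not a drop-in replacement for `clojure.string/split` in every edge case; check the empty-input behavior your code relies on.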