Clojure

clojure.string/split on empty string includes empty string in results

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Declined
  • Affects Version/s: Release 1.5
  • Fix Version/s: None
  • Component/s: None
  • Labels:

Description

Splitting a string using clojure.string/split with an empty regex includes the empty string in the results - is this expected behaviour?

Example:

Unable to find source-code formatter for language: clojure. Available languages are: javascript, sql, xhtml, actionscript, none, html, xml, java
user=> (clojure.string/split "abc" #"")
["" "a" "b" "c"]

Activity

Alex Miller made changes -
Field Original Value New Value
Labels bug string
Hide
Alex Miller added a comment -

Yes, I think so. This is a case where Clojure defers to the host (Java) for behavior. I think the way to interpret this is that the empty pattern matches all strings. Split checks left to right whether there is a next chunk of string that matches the pattern. The empty pattern matches at the beginning to a string of length 0. Something like that.

Show
Alex Miller added a comment - Yes, I think so. This is a case where Clojure defers to the host (Java) for behavior. I think the way to interpret this is that the empty pattern matches all strings. Split checks left to right whether there is a next chunk of string that matches the pattern. The empty pattern matches at the beginning to a string of length 0. Something like that.
Alex Miller made changes -
Resolution Declined [ 2 ]
Status Open [ 1 ] Closed [ 6 ]
Hide
Mark Engelberg added a comment -

This bug is a real problem, because it works differently on Windows than on Linux. On Windows, clojure.string/split behaves exactly as you'd expect:

user=> (clojure.string/split "abc" #"")
["a" "b" "c"]

Only on Linux do you get the strange behavior where the empty string shows up at the beginning of the list.

I recently had a student that got burned by this in some webserver code that relied on splitting using the empty regex. It performed flawlessly on her local Windows machine, but mysteriously broke when she uploaded the uberwar to the cloud. The bug was very difficult to track down.

If this were a bug on both Windows and Linux, at least you could plan around it. But right now, it's an obstacle to Clojure's capability of running consistently across platforms.

Show
Mark Engelberg added a comment - This bug is a real problem, because it works differently on Windows than on Linux. On Windows, clojure.string/split behaves exactly as you'd expect: user=> (clojure.string/split "abc" #"") ["a" "b" "c"] Only on Linux do you get the strange behavior where the empty string shows up at the beginning of the list. I recently had a student that got burned by this in some webserver code that relied on splitting using the empty regex. It performed flawlessly on her local Windows machine, but mysteriously broke when she uploaded the uberwar to the cloud. The bug was very difficult to track down. If this were a bug on both Windows and Linux, at least you could plan around it. But right now, it's an obstacle to Clojure's capability of running consistently across platforms.
Hide
Mark Engelberg added a comment -

Upon further research, I've found that this is not a Windows/Linux issue, rather it's a difference between Java 7 and Java 8. On Java 8, splitting with the empty string no longer produces a sequence that begins with an empty string.

As you said before, this is just a gotcha relating to Java, not a Clojure issue.

Show
Mark Engelberg added a comment - Upon further research, I've found that this is not a Windows/Linux issue, rather it's a difference between Java 7 and Java 8. On Java 8, splitting with the empty string no longer produces a sequence that begins with an empty string. As you said before, this is just a gotcha relating to Java, not a Clojure issue.

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: