Clojure

clojure.string/split strips trailing delimiters

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Prescreened

Description

clojure.string/split and clojure.string/split-lines inherit the bizarre default behavior of java.lang.String#split(String,int) in stripping trailing consecutive delimiters:

(clojure.string/split "banana" #"an")
⇒ ["b" "" "a"]
(clojure.string/split "banana" #"na")
⇒ ["ba"]
(clojure.string/split "nanabanana" #"na")
⇒ ["" "" "ba"]

In the case of split-lines, processing a file line by line and rejoining results in truncation of trailing newlines in the file. In both cases, the behavior is surprising and cannot be inferred from the docstrings. A workaround for split is to pass a limit of -1.

Proposed: As current users may be relying on the current behavior, the attached merely updates the docstring to warn of this behavior and suggest use of -1 as a limit to workaround.

Patch: clj-1360.patch

Activity

Stuart Halloway made changes -
Field Original Value New Value
Approval Triaged [ 10120 ]
Stuart Halloway made changes -
Priority Major [ 3 ] Minor [ 4 ]
Alex Miller made changes -
Patch Code [ 10001 ]
Approval Triaged [ 10120 ] Prescreened [ 10220 ]
Description clojure.string/split and clojure.string/split-lines inherit the bizarre default behavior of java.lang.String#split(String,int) in stripping trailing consecutive delimiters:

{code}
(clojure.string/split "banana" #"an")
⇒ ["b" "" "a"]
(clojure.string/split "banana" #"na")
⇒ ["ba"]
(clojure.string/split "nanabanana" #"na")
⇒ ["" "" "ba"]
{code}

In the case of split-lines, processing a file line by line and rejoining results in truncation of trailing newlines in the file. In both cases, the behavior is surprising and cannot be inferred from the docstrings.

This behavior should either be fixed or documented.
clojure.string/split and clojure.string/split-lines inherit the bizarre default behavior of java.lang.String#split(String,int) in stripping trailing consecutive delimiters:

{code}
(clojure.string/split "banana" #"an")
⇒ ["b" "" "a"]
(clojure.string/split "banana" #"na")
⇒ ["ba"]
(clojure.string/split "nanabanana" #"na")
⇒ ["" "" "ba"]
{code}

In the case of split-lines, processing a file line by line and rejoining results in truncation of trailing newlines in the file. In both cases, the behavior is surprising and cannot be inferred from the docstrings. A workaround for split is to pass a limit of -1.

*Proposed:* As current users may be relying on the current behavior, the attached merely updates the docstring to warn of this behavior and suggest use of -1 as a limit to workaround.

*Patch:* clj-1360.patch
Attachment clj-1360.patch [ 15556 ]
Labels docstring string

People

Vote (2)
Watch (3)

Dates

  • Created:
    Updated: