Clojure

clojure.string/trim uses a different definition of whitespace than triml and trimr

Details

  • Type: Defect
  • Status: Closed
  • Priority: Minor
  • Resolution: Completed
  • Affects Version/s: Release 1.6
  • Fix Version/s: Release 1.6
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

clojure.string/triml and trimr use Character/isWhitespace to determine whether a character is whitespace, but trim uses a different definition (that of String/trim). For example:

user=> (use 'clojure.string)
nil
user=> (def s "  \u2002  foo")
#'user/s
user=> (trim s)
"?  foo"
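user=> (triml s)
"foo"

(The ? in the trim result is the U+2002 EN SPACE, which the terminal could not display: trim left it in place, while triml removed it.)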

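Cause: triml and trimr use Character/isWhitespace, while trim delegates to String/trim, which treats any character whose code is less than or equal to '\u0020' as whitespace. The two definitions differ in both directions: isWhitespace includes Unicode space separators such as U+2002 (EN SPACE), which String/trim keeps, but excludes most control characters below '\u0020' (for example U+0000), which String/trim removes.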
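A quick way to see that the two definitions disagree in both directions is to evaluate each rule on a few characters using plain Java interop, so the result does not depend on which Clojure version's trim is loaded. This is a minimal sketch: string-trim-ws? and compare-defs are hypothetical helper names, and string-trim-ws? simply restates the "code <= '\u0020'" rule that String/trim is documented to use.

```clojure
;; Hypothetical helper restating the String/trim rule:
;; any char whose code is <= \u0020 counts as whitespace.
(defn string-trim-ws? [c]
  (<= (int c) 0x20))

;; Hypothetical helper: report how each definition classifies c.
(defn compare-defs [c]
  {:code          (format "U+%04X" (int c))
   :string-trim   (string-trim-ws? c)
   :is-whitespace (Character/isWhitespace (char c))})

;; SPACE, NUL, EN SPACE, NO-BREAK SPACE
(doseq [c [\space \u0000 \u2002 \u00A0]]
  (println (compare-defs c)))

;; String/trim strips the leading SPACE but stops at the EN SPACE.
(println (pr-str (.trim " \u2002 foo")))
```

In this sketch SPACE is whitespace under both rules, NUL is stripped by String/trim but is not whitespace to isWhitespace, EN SPACE is the reverse, and NO-BREAK SPACE is whitespace to neither.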

Approach: The attached patch changes trim to use Character/isWhitespace, so that (trim s) equals (triml (trimr s)) for all strings. The isWhitespace definition seems to be the newer and more Unicode-aware of the two, so this was chosen over changing triml and trimr to match trim.
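The approach can be sketched as a two-ended scan driven by the same predicate triml and trimr use. This is an illustration under that assumption, not the code in clj935-3.patch; trim-ws and consistent? are hypothetical names. The second function checks the property that trimming both ends at once agrees with (triml (trimr s)).

```clojure
(require '[clojure.string :as str])

;; Sketch only, not the code in clj935-3.patch: trim both ends using
;; Character/isWhitespace, the predicate triml and trimr already use.
(defn trim-ws
  "Removes leading and trailing chars for which Character/isWhitespace is true."
  ^String [^CharSequence s]
  (let [len   (.length s)
        ;; index of the first non-whitespace char (len if there is none)
        start (loop [i 0]
                (if (and (< i len) (Character/isWhitespace (.charAt s i)))
                  (recur (inc i))
                  i))
        ;; one past the last non-whitespace char; never moves below start
        end   (loop [j len]
                (if (and (> j start) (Character/isWhitespace (.charAt s (dec j))))
                  (recur (dec j))
                  j))]
    (.toString (.subSequence s start end))))

;; Hypothetical check of the property (trim s) = (triml (trimr s)),
;; using clojure.string's triml/trimr, which rely on Character/isWhitespace.
(defn consistent? [s]
  (= (trim-ws s) (str/triml (str/trimr s))))

(println (pr-str (trim-ws "  \u2002  foo")))
(println (every? consistent? ["" "   " "  \u2002  foo" " \u2002a b\u2002 " "\u0000x\u0000"]))
```

Because one predicate drives both scans, the property holds by construction. Note that under this definition a leading or trailing NUL is kept, whereas String/trim would strip it.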

Several alternative implementations were considered (differing in their use of longs, ints, etc.). The patch opts for the simplest possible code rather than aggressive performance optimizations; see the comments for details.

The patch also changes triml to call .length on s only once.
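The .length change amounts to binding the length once before the scan instead of re-evaluating it on every iteration. A minimal sketch under that reading follows; triml-once is a hypothetical name and the body is not copied from the patch.

```clojure
;; Sketch only (triml-once is a hypothetical name, not the patch code):
;; remove leading chars for which Character/isWhitespace is true.
(defn triml-once
  ^String [^CharSequence s]
  (let [len (.length s)]          ; .length is evaluated once, before the scan
    (loop [index 0]
      (cond
        (= index len) ""          ; every char was whitespace
        (Character/isWhitespace (.charAt s index)) (recur (inc index))
        :else (.toString (.subSequence s index len))))))

(println (pr-str (triml-once "  \u2002foo  ")))
```

Checking the bound before calling .charAt keeps the scan from running past the end of an all-whitespace string.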

Patch: clj935-3.patch

Screened by: Stuart Sierra

  1. clj935-2.patch
    30/Aug/13 3:47 PM
    3 kB
    Alex Miller
  2. clj935-3.patch
    02/Dec/13 11:07 PM
    3 kB
    Alex Miller
  3. fix-trim-fns-different-whitespace-patch.txt
    21/Feb/12 1:29 PM
    3 kB
    Andy Fingerhut

Activity

Rich Hickey made changes -
Field Original Value New Value
Fix Version/s Release 1.5 [ 10150 ]
Stuart Halloway made changes -
Fix Version/s Release 1.5 [ 10150 ]
Fix Version/s Release 1.6 [ 10157 ]
Aaron Bedra made changes -
Assignee Aaron Bedra [ aaron ]
Aaron Bedra made changes -
Affects Version/s Release 1.3 [ 10038 ]
Affects Version/s Release 1.2 [ 10037 ]
Affects Version/s Release 1.6 [ 10157 ]
Alex Miller made changes -
Approval Vetted [ 10003 ]
Alex Miller made changes -
Description clojure.string/triml and trimr use Character/isWhitespace to determine whether a character is whitespace, but trim uses some other definition of white space character. For example:

user=> (use 'clojure.string)
nil
user=> (def s " \u2002 foo")
#'user/s
user=> (trim s)
"? foo"
user=> (triml s)
"foo"

The attached patch changes trim to use Character/isWhitespace. I suppose other possibilities are to change triml and trimr to use trim's notion of whitespace, whatever that is, or to just leave these functions inconsistent with each other. It does seem that it would be a nice property that (trim s) is equal to (triml (trimr s)) for all strings.

The patch also changes triml to only call .length on s once.
clojure.string/triml and trimr use Character/isWhitespace to determine whether a character is whitespace, but trim uses some other definition of white space character. For example:

{code}
user=> (use 'clojure.string)
nil
user=> (def s " \u2002 foo")
#'user/s
user=> (trim s)
"? foo"
user=> (triml s)
"foo"
{code}

*Cause:* {{triml}} and {{trimr}} use [Character/isWhitespace|http://docs.oracle.com/javase/6/docs/api/java/lang/Character.html#isWhitespace(char)]. {{trim}} uses [String/trim|http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#trim()] which seems to define whitespace as any character less than or equal '\u0020'. The {{isWhitespace()}} definition is slightly different and includes other Unicode space characters.

*Approach:* The attached patch changes trim to use Character/isWhitespace. Other possibilities are to change triml and trimr to use trim's notion of whitespace, or to just leave these functions inconsistent with each other. It does seem that it would be a nice property that (trim s) is equal to (triml (trimr s)) for all strings.

The patch also changes triml to only call .length on s once.

*Patch:* fix-trim-fns-different-whitespace-patch.txt
Alex Miller made changes -
Approval Vetted [ 10003 ] Screened [ 10004 ]
Description
clojure.string/triml and trimr use Character/isWhitespace to determine whether a character is whitespace, but trim uses some other definition of white space character. For example:

{code}
user=> (use 'clojure.string)
nil
user=> (def s " \u2002 foo")
#'user/s
user=> (trim s)
"? foo"
user=> (triml s)
"foo"
{code}

*Cause:* {{triml}} and {{trimr}} use [Character/isWhitespace|http://docs.oracle.com/javase/6/docs/api/java/lang/Character.html#isWhitespace(char)]. {{trim}} uses [String/trim|http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#trim()] which seems to define whitespace as any character less than or equal '\u0020'. The {{isWhitespace()}} definition is slightly different and includes other Unicode space characters.

*Approach:* The attached patch changes trim to use Character/isWhitespace. The isWhitespace version seems generally newer and more Unicode considerate so this was chosen over changing triml and trimr to match trim.

A few alternative implementations were considered with respect to longs, ints, etc. The patch opts to use the simplest possible code, eschewing any extreme performance measures. See the comments for more info if desired.

The patch also changes triml to only call .length on s once.

*Patch:* clj935-2.patch

*Screened by:* Alex Miller
Attachment clj935-2.patch [ 12215 ]
Assignee Aaron Bedra [ aaron ]
Alex Miller made changes -
Labels string
Rich Hickey made changes -
Approval Screened [ 10004 ] Incomplete [ 10006 ]
Alex Miller made changes -
Attachment clj935-3.patch [ 12504 ]
Alex Miller made changes -
Approval Incomplete [ 10006 ] Screened [ 10004 ]
Description
clojure.string/triml and trimr use Character/isWhitespace to determine whether a character is whitespace, but trim uses some other definition of white space character. For example:

{code}
user=> (use 'clojure.string)
nil
user=> (def s " \u2002 foo")
#'user/s
user=> (trim s)
"? foo"
user=> (triml s)
"foo"
{code}

*Cause:* {{triml}} and {{trimr}} use [Character/isWhitespace|http://docs.oracle.com/javase/6/docs/api/java/lang/Character.html#isWhitespace(char)]. {{trim}} uses [String/trim|http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#trim()] which seems to define whitespace as any character less than or equal '\u0020'. The {{isWhitespace()}} definition is slightly different and includes other Unicode space characters.

*Approach:* The attached patch changes trim to use Character/isWhitespace. The isWhitespace version seems generally newer and more Unicode considerate so this was chosen over changing triml and trimr to match trim.

A few alternative implementations were considered with respect to longs, ints, etc. The patch opts to use the simplest possible code, eschewing any extreme performance measures. See the comments for more info if desired.

The patch also changes triml to only call .length on s once.

*Patch:* clj935-3.patch

*Screened by:* Alex Miller
Alex Miller made changes -
Approval Screened [ 10004 ] Vetted [ 10003 ]
Alex Miller made changes -
Description
clojure.string/triml and trimr use Character/isWhitespace to determine whether a character is whitespace, but trim uses some other definition of white space character. For example:

{code}
user=> (use 'clojure.string)
nil
user=> (def s " \u2002 foo")
#'user/s
user=> (trim s)
"? foo"
user=> (triml s)
"foo"
{code}

*Cause:* {{triml}} and {{trimr}} use [Character/isWhitespace|http://docs.oracle.com/javase/6/docs/api/java/lang/Character.html#isWhitespace(char)]. {{trim}} uses [String/trim|http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#trim()] which seems to define whitespace as any character less than or equal '\u0020'. The {{isWhitespace()}} definition is slightly different and includes other Unicode space characters.

*Approach:* The attached patch changes trim to use Character/isWhitespace. The isWhitespace version seems generally newer and more Unicode considerate so this was chosen over changing triml and trimr to match trim.

A few alternative implementations were considered with respect to longs, ints, etc. The patch opts to use the simplest possible code, eschewing any extreme performance measures. See the comments for more info if desired.

The patch also changes triml to only call .length on s once.

*Patch:* clj935-3.patch

*Screened by:*
Stuart Sierra made changes -
Approval Vetted [ 10003 ] Screened [ 10004 ]
Description
clojure.string/triml and trimr use Character/isWhitespace to determine whether a character is whitespace, but trim uses some other definition of white space character. For example:

{code}
user=> (use 'clojure.string)
nil
user=> (def s " \u2002 foo")
#'user/s
user=> (trim s)
"? foo"
user=> (triml s)
"foo"
{code}

*Cause:* {{triml}} and {{trimr}} use [Character/isWhitespace|http://docs.oracle.com/javase/6/docs/api/java/lang/Character.html#isWhitespace(char)]. {{trim}} uses [String/trim|http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#trim()] which seems to define whitespace as any character less than or equal '\u0020'. The {{isWhitespace()}} definition is slightly different and includes other Unicode space characters.

*Approach:* The attached patch changes trim to use Character/isWhitespace. The isWhitespace version seems generally newer and more Unicode considerate so this was chosen over changing triml and trimr to match trim.

A few alternative implementations were considered with respect to longs, ints, etc. The patch opts to use the simplest possible code, eschewing any extreme performance measures. See the comments for more info if desired.

The patch also changes triml to only call .length on s once.

*Patch:* clj935-3.patch

*Screened by:* Stuart Sierra
Rich Hickey made changes -
Approval Screened [ 10004 ] Ok [ 10007 ]
Stuart Halloway made changes -
Resolution Completed [ 1 ]
Status Open [ 1 ] Closed [ 6 ]
