Clojure

clojure.string/capitalize can give wrong result if first char is supplementary

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Declined
  • Affects Version/s: Release 1.2, Release 1.3, Release 1.4
  • Fix Version/s: Release 1.5
  • Component/s: None
  • Labels:
    None
  • Environment:
    all
  • Patch:
    Code and Test

Description

When the first unicode code point of a string is supplementary (i.e. requires two 16-bit Java chars to represent in UTF-16), and that first code point is changed by converting it to upper case, clojure.string/capitalize gives the wrong answer.

Activity

Rich Hickey made changes -
Field Original Value New Value
Fix Version/s Release 1.5 [ 10150 ]
Stuart Halloway made changes -
Approval Screened [ 10004 ]
Hide
Rich Hickey added a comment -

Isn't this a Java bug?

Show
Rich Hickey added a comment - Isn't this a Java bug?
Rich Hickey made changes -
Approval Screened [ 10004 ] None [ 10008 ]
Hide
Andy Fingerhut added a comment -

If using UTF-16 to encode Unicode strings, and making every UTF-16 code unit (i.e. Java char) individually indexable as a separate entity in strings, is such a bad design choice that you consider it a bug, then yes, this is a Java bug (and a bug in all the other systems that use UTF-16 in this way).

clojure.string/capitalize isn't using some Java capitalization method that has a bug, though. By calling (.toUpperCase (subs s 0 1)) it is not giving enough information to .toUpperCase for any implementation, Java or otherwise, to do the job correctly. It is analogous to calling toupper on the least significant 4 bits of the ASCII encoding of a letter and expecting it to return the correct answer.

Show
Andy Fingerhut added a comment - If using UTF-16 to encode Unicode strings, and making every UTF-16 code unit (i.e. Java char) individually indexable as a separate entity in strings, is such a bad design choice that you consider it a bug, then yes, this is a Java bug (and a bug in all the other systems that use UTF-16 in this way). clojure.string/capitalize isn't using some Java capitalization method that has a bug, though. By calling (.toUpperCase (subs s 0 1)) it is not giving enough information to .toUpperCase for any implementation, Java or otherwise, to do the job correctly. It is analogous to calling toupper on the least significant 4 bits of the ASCII encoding of a letter and expecting it to return the correct answer.
Stuart Halloway made changes -
Resolution Declined [ 2 ]
Status Open [ 1 ] Closed [ 6 ]

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: