Details
-
Type:
Defect
-
Status:
Closed
-
Priority:
Minor
-
Resolution: Declined
-
Affects Version/s: Release 1.2, Release 1.3, Release 1.4
-
Fix Version/s: Release 1.5
-
Component/s: None
-
Labels:None
-
Environment:all
-
Patch:Code and Test
-
Approval:Not Approved
Description
When the first unicode code point of a string is supplementary (i.e. requires two 16-bit Java chars to represent in UTF-16), and that first code point is changed by converting it to upper case, clojure.string/capitalize gives the wrong answer.
If using UTF-16 to encode Unicode strings, and making every UTF-16 code unit (i.e. Java char) individually indexable as a separate entity in strings, is such a bad design choice that you consider it a bug, then yes, this is a Java bug (and a bug in all the other systems that use UTF-16 in this way).
clojure.string/capitalize isn't using some Java capitalization method that has a bug, though. By calling (.toUpperCase (subs s 0 1)) it is not giving enough information to .toUpperCase for any implementation, Java or otherwise, to do the job correctly. It is analogous to calling toupper on the least significant 4 bits of the ASCII encoding of a letter and expecting it to return the correct answer.