
[CLJ-945] clojure.string/capitalize can give wrong result if first char is supplementary
Created: 05/Mar/12  Updated: 01/Mar/13  Resolved: 01/Mar/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.2, Release 1.3, Release 1.4
Fix Version/s: Release 1.5

Type: Defect
Priority: Minor
Reporter: Andy Fingerhut
Assignee: Unassigned
Resolution: Declined
Votes: 0
Labels: None
Environment: all


Attachments: capitalize-for-supplementary-chars-patch.txt (text file)
Patch: Code and Test

 Description   

When the first Unicode code point of a string is supplementary (i.e. it requires two 16-bit Java chars to represent in UTF-16), and upper-casing changes that code point, clojure.string/capitalize returns the wrong result.
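
A minimal Java sketch of the failure, assuming the capitalization logic amounts to upper-casing the first Java char and lower-casing the rest. The class and method names are hypothetical, and Locale.ROOT is used only to keep the output deterministic. The input starts with U+10428 (DESERET SMALL LETTER LONG I), whose upper-case form is U+10400; both need a surrogate pair in UTF-16.

```java
import java.util.Locale;

final class NaiveCapitalizeDemo {
    // Hypothetical Java rendering of "upper-case the first char, lower-case the rest".
    static String naiveCapitalize(String s) {
        if (s.length() < 2) {
            return s.toUpperCase(Locale.ROOT);
        }
        // substring(0, 1) is a single UTF-16 code unit, which may be
        // only the high half of a surrogate pair.
        return s.substring(0, 1).toUpperCase(Locale.ROOT)
                + s.substring(1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String input = "\uD801\uDC28abc"; // U+10428 followed by "abc"
        String result = naiveCapitalize(input);
        // The first code point is left in lower case.
        System.out.println(Integer.toHexString(result.codePointAt(0))); // prints 10428
        // Upper-casing the whole string shows the expected mapping.
        System.out.println(Integer.toHexString(
                input.toUpperCase(Locale.ROOT).codePointAt(0))); // prints 10400
    }
}
```

For strings whose first code point fits in a single char, such as "hELLO", the same method returns the expected "Hello".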



 Comments   
Comment by Rich Hickey [ 20/Jul/12 7:43 AM ]

Isn't this a Java bug?

Comment by Andy Fingerhut [ 20/Jul/12 12:36 PM ]

If using UTF-16 to encode Unicode strings, and making every UTF-16 code unit (i.e. Java char) individually indexable as a separate entity in strings, is such a bad design choice that you consider it a bug, then yes, this is a Java bug (and a bug in all the other systems that use UTF-16 in this way).

clojure.string/capitalize isn't using some Java capitalization method that has a bug, though. By calling (.toUpperCase (subs s 0 1)) it is not giving enough information to .toUpperCase for any implementation, Java or otherwise, to do the job correctly. It is analogous to calling toupper on the least significant 4 bits of the ASCII encoding of a letter and expecting it to return the correct answer.
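
One way to hand .toUpperCase the whole first code point is to split the string at a code-point boundary instead of at index 1. The Java sketch below illustrates that idea only; it is not the attached patch, the class and method names are hypothetical, and Locale.ROOT is an assumption made for deterministic output.

```java
import java.util.Locale;

final class CodePointCapitalize {
    // Upper-case the first code point (one or two chars), lower-case the rest.
    static String capitalize(String s) {
        if (s.isEmpty()) {
            return s;
        }
        // charCount is 2 for a supplementary code point, 1 otherwise.
        int firstLen = Character.charCount(s.codePointAt(0));
        return s.substring(0, firstLen).toUpperCase(Locale.ROOT)
                + s.substring(firstLen).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String input = "\uD801\uDC28ABC"; // U+10428 followed by "ABC"
        String result = capitalize(input);
        System.out.println(Integer.toHexString(result.codePointAt(0))); // prints 10400
        System.out.println(result.substring(2)); // prints abc
    }
}
```

Because the split point comes from Character.charCount, strings that start with an ordinary single-char letter take the same path and behave as before.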
