Clojure

LispReader uses Character.isWhitespace rather than Character.isSpaceChar

Details

  • Type: Defect
  • Status: Closed
  • Resolution: Declined
  • Affects Version/s: None
  • Fix Version/s: Backlog
  • Component/s: None
  • Labels:
    None

Description

Character.isWhitespace doesn't handle non-breaking space correctly. Apparently it's pretty ancient from The Olden Days Before People Knew How To Do Character Encodings.

In Java 1.5 Character.isSpaceChar was added, which handles supplementary characters the right way: http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html#isWhitespace(char)
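
The difference between the two predicates can be checked directly. The following is a minimal sketch, not part of the original report; the class name `WhitespaceCheck` and the helper `classify` are hypothetical names chosen for illustration. It only calls the standard `java.lang.Character` methods discussed above.

```java
public class WhitespaceCheck {
    // Hypothetical helper: returns "<isWhitespace>/<isSpaceChar>" for one char.
    static String classify(char c) {
        return Character.isWhitespace(c) + "/" + Character.isSpaceChar(c);
    }

    public static void main(String[] args) {
        // Space, tab, newline, non-breaking space, and the BOM character.
        char[] samples = {' ', '\t', '\n', '\u00A0', '\uFEFF'};
        for (char c : samples) {
            System.out.println(String.format(
                "U+%04X isWhitespace=%b isSpaceChar=%b",
                (int) c, Character.isWhitespace(c), Character.isSpaceChar(c)));
        }
    }
}
```

On a standard JDK, the non-breaking space (U+00A0) is rejected by isWhitespace and accepted by isSpaceChar, while tab and newline behave the other way around, so neither method is a drop-in replacement for the other.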

Activity

Assembla Importer added a comment - Converted from http://www.assembla.com/spaces/clojure/tickets/419

Assembla Importer added a comment -

stu said: Phil,

Please add an example and nag me to bump the priority if this is causing real and present pain.

Assembla Importer added a comment -

technomancy said: Eh; it's not causing real pain.

I would be OK with a WONTFIX if that's what is decided. Just thought it wouldn't hurt to have a record of it somewhere (even as a closed-as-invalid ticket), and I was in a particularly pedantic mood last night for some reason.

I ran across it because of an escaping bug in Wine where I wanted to treat "(use 'foo)(-main)" as a single token in bash but still have it be valid Clojure code. But luckily I found a better workaround. I am OK with leaving it at lowest priority.

Assembla Importer added a comment -

djpowell said: Hmm, I'm not sure isSpaceChar is right - it doesn't seem to allow things like tabs and newlines. If you really wanted to support the non-breaking space, then it would probably be best to keep using isWhitespace and add it as a special case.

Actually... I would quite like to see \ufeff treated as whitespace. It is the Unicode BOM. Some editors including Windows Notepad include the BOM at the start of UTF-8 files. The latest Unicode docs seem to recognise the UTF-8 BOM. By treating it as whitespace we can avoid any problems with it.
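
A minimal sketch of the special-casing approach suggested here, assuming a standalone predicate rather than the actual LispReader code. The class `ReaderWhitespace` and the method `isReaderWhitespace` are hypothetical names, and this is not how Clojure's reader is implemented.

```java
public class ReaderWhitespace {
    static final int NBSP = 0x00A0; // non-breaking space
    static final int BOM  = 0xFEFF; // byte order mark (zero width no-break space)

    // Hypothetical predicate: Character.isWhitespace plus two special cases.
    static boolean isReaderWhitespace(int ch) {
        return Character.isWhitespace(ch) || ch == NBSP || ch == BOM;
    }

    public static void main(String[] args) {
        // A BOM-prefixed form, as an editor such as Notepad might save it.
        String source = "\uFEFF(-main)";
        int i = 0;
        // Skip leading characters that the predicate counts as whitespace.
        while (i < source.length() && isReaderWhitespace(source.charAt(i))) {
            i++;
        }
        System.out.println(source.substring(i));
    }
}
```

Keeping isWhitespace as the base preserves tabs and newlines, and the two extra comparisons cover the characters discussed in this ticket.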

Assembla Importer added a comment -

technomancy said: Sounds like I had this not quite right; probably not worth worrying about.


People

  • Assignee:
    Unassigned
  • Reporter:
    Anonymous