Clojure URL to File coercion and encoding of non-ASCII characters


  • Type: Defect
  • Status: Closed
  • Priority: Minor
  • Resolution: Completed
  • Affects Version/s: Release 1.5
  • Fix Version/s: Release 1.6
  • Component/s: None
  • Labels:
  • Patch: Code and Test
  • Approval:

Description

Coercing a URL to a File with clojure.java.io corrupts a path containing UTF-8 characters without issuing a warning. (The behavior in the example below is not specific to JDK 8 or Clojure 1.5.0. It is seen with the latest Clojure master as of Sep 15, 2013, and with JDK 6 and JDK 7.)

user=> (System/getProperty "java.runtime.version")
user=> (clojure-version)
user=> (System/getProperty "user.dir")
user=> (clojure.java.io/resource "myfile.txt")
#<URL file:/dir/d%c3%a9f/resources/myfile.txt>
user=> (slurp (clojure.java.io/resource "myfile.txt") :encoding "UTF-8")
FileNotFoundException /dir/déf/resources/myfile.txt (No such file or directory) (
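The coercion can also be exercised directly, without a classpath resource. The following is a minimal sketch that reuses the URL string from the transcript above; the var names `url` and `f` are only illustrative, and the resulting path depends on the Clojure version in use.

```clojure
(require '[clojure.java.io :as io])

;; URL string copied from the transcript above; no file needs to exist,
;; because as-file only builds a java.io.File from the URL's path.
(def url (java.net.URL. "file:/dir/d%c3%a9f/resources/myfile.txt"))

(def f (io/as-file url))

;; On an affected version the path holds two characters where é should be;
;; with the fix it holds the single character é.
(.getPath f)
```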


The implementation of method as-file of protocol Coercions for class java.net.URL transforms each occurrence of '%xy', where x and y are ASCII hex digits, into a separate character in the result. The correct behavior is to treat each run of consecutive '%xy' escapes as a byte sequence encoded in UTF-8, in which a single Unicode code point (i.e. a 'Unicode character') is encoded in 1 to 4 bytes.
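To make the difference concrete, here is a minimal sketch. It is not the code from clojure.java.io; the helper names `naive-unescape` and `utf8-unescape` are hypothetical and exist only for illustration. The first maps every '%xy' to its own character, the second collects each run of escapes into bytes and decodes the run as UTF-8.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: turns every %xy into one character whose code is the
;; byte value, which is the faulty behavior described above.
(defn naive-unescape [s]
  (str/replace s #"%([0-9a-fA-F]{2})"
               (fn [[_ hex]] (str (char (Integer/parseInt hex 16))))))

;; Hypothetical helper: gathers each run of consecutive %xy escapes into a
;; byte array and decodes the whole run as UTF-8.
(defn utf8-unescape [s]
  (str/replace s #"(?:%[0-9a-fA-F]{2})+"
               (fn [run]
                 (let [bs (map #(unchecked-byte (Integer/parseInt (subs % 1) 16))
                               (re-seq #"%[0-9a-fA-F]{2}" run))]
                   (String. (byte-array bs) "UTF-8")))))

(naive-unescape "/dir/d%c3%a9f")  ; yields U+00C3 and U+00A9 in place of the escape
(utf8-unescape "/dir/d%c3%a9f")   ; yields the single character é (U+00E9)
```

The escape '%c3%a9' is the two-byte UTF-8 encoding of é, so only the second helper reproduces the directory name from the example.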

Patch: clj-1177-patch-v2.diff


Change method as-file for class java.net.URL to use method java.net.URLDecoder/decode to decode the contents of a URL string.

The only issue with URLDecoder/decode's behavior is that it changes plus-sign characters to spaces, which according to at least one of the existing unit tests should not happen in as-file. To work around this, first explicitly encode any plus-sign characters in the given URL string, using method java.net.URLEncoder/encode. After that, pass the result to method decode.
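The two steps above can be sketched as follows. This is an illustration of the approach under the assumption that UTF-8 is the intended encoding, not a copy of the patch; the function name `decode-url-path` is hypothetical.

```clojure
(require '[clojure.string :as str])
(import '(java.net URLDecoder URLEncoder))

;; Hypothetical helper name; a sketch of the two-step approach described above.
(defn decode-url-path [s]
  (-> s
      ;; Step 1: protect literal plus signs by percent-encoding them ("%2B"),
      ;; so the decoder does not turn them into spaces.
      (str/replace "+" (URLEncoder/encode "+" "UTF-8"))
      ;; Step 2: decode all percent escapes as UTF-8 byte sequences.
      (URLDecoder/decode "UTF-8")))

(decode-url-path "/dir/d%c3%a9f/a+b.txt")
```

Encoding the plus signs first keeps them intact through the decode step, while '%20' and multi-byte escapes such as '%c3%a9' are still decoded normally.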

Other approaches:

Patch clj-1177-patch-v1.txt represents an alternate approach that does its own 'unescaping' of UTF-8 encoded URL strings, without relying on class java.net.URLDecoder. As a result, it is longer and more detailed.

Screened by: Alex Miller

  1. clj-1177-patch-v1.txt
    08/Mar/13 1:30 PM
    3 kB
    Andy Fingerhut
  2. clj-1177-patch-v2.diff
    22/Oct/13 9:10 AM
    2 kB
    Alex Miller
  3. clj-1177-patch-v2.txt
    01/Sep/13 10:51 AM
    2 kB
    Andy Fingerhut


