Details
-
Type:
Enhancement
-
Status:
Open
-
Priority:
Minor
-
Resolution: Unresolved
-
Affects Version/s: Release 1.5
-
Fix Version/s: None
-
Component/s: None
-
Labels:
-
Patch:Code
Description
clojure.java.io/resource corrupts path containing UTF-8 characters without issuing warning.
user=> (System/getProperty "java.runtime.version") "1.8.0-ea-b79" user=> (clojure-version) "1.5.0" user=> (System/getProperty "user.dir") "/dir/déf" user=> (clojure.java.io/resource "myfile.txt") #<URL file:/dir/d%c3%a9f/resources/myfile.txt> user=> (slurp (clojure.java.io/resource "myfile.txt") :encoding "UTF-8") FileNotFoundException /dir/déf/resources/myfile.txt (No such file or directory) java.io.FileInputStream.open (FileInputStream.java:-2)
Patch clj-1177-patch-v1.txt dated Mar 8 2013 is an attempt to solve this issue, in what I think may be a correct way. As specified in RFC 3986, when taking a Unicode string and making a URL of it, it should be encoded in UTF-8 and then each individual byte is subject to the %HH hex encoding. This patch reverses that to turn URLs into file names.
Tested on Mac OS X 10.6 with a command line like this (it doesn't work without the -Dfile.encoding=UTF-8 option on my Mac, probably because the default encoding is MacRoman):
% java -cp clojure.jar:path/to/resource -Dfile.encoding=UTF-8 clojure.main
user=> (require '[clojure.java.io :as io])
nil
user=> (io/resource "abcíd/foo.txt")
#<URL file:/Users/jafinger/clj/clj-ns-browser/resource/abc%c3%add/foo.txt>
user=> (slurp (io/resource "abcíd/foo.txt"))
"The quick brown fox jumped over the lázy dög!\n"