clojure.java.io/resource corrupts path containing UTF-8 characters without issuing warning. (The behavior in the example below is not specific to JDK 8 or Clojure 1.5.0. It is seen with the latest Clojure master as of Sep 15, 2013, and with JDK 6 and JDK 7.)
The implementation of method as-file of protocol Coercions for class java.net.URL transforms each occurrence of '%xy', where x and y are hex digits in ASCII, to a separate character in the result. The correct behavior is to treat sequences of more than one '%xy' as a byte sequence encoded in UTF-8, where single Unicode code points (i.e. 'Unicode characters') are encoded with anywhere from 1 to 4 bytes.
Change method as-file for class java.net.URL to use method java.net.URLDecoder.decode to decode the contents of a URL string.
The only issue with java.net.URLDecoder.decode's behavior is that it changes plus-sign characters to spaces, which according to at least one of the existing unit tests should not happen in as-file. To work around this, first explicitly encode any plus-sign characters in the given URL string, using method java.net.URLEncoder.encode. After that, pass the result to method decode.
Patch clj-1177-patch-v1.txt represents an alternate approach that does its own 'unescaping' of UTF-8 encoded URL strings, without relying on class java.net.URLDecoder. As a result, it is longer and more detailed.
Screened by: Alex Miller