Details
- Type: Defect
- Status: Closed
- Resolution: Declined
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
In clojure.contrib.duck-streams, append-spit writes out the encoding
marker (for UnicodeLittle, for example, this is the byte order mark
U+FEFF) each time it appends to a file. This should happen only when
the file is initially created.
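The underlying JVM behaviour can be seen without touching the file system. The following is a minimal sketch, not the duck-streams implementation: the helper names (append-encoded!, two-appends, decode) are hypothetical, and it uses the standard "UTF-16" charset, which also writes a marker when encoding, in place of UnicodeLittle. It assumes the marker is emitted once per writer, so opening a fresh writer for each append repeats it.

```clojure
(import '(java.io ByteArrayOutputStream OutputStreamWriter))

(defn append-encoded!
  "Hypothetical helper: opens a fresh writer on `out` for every call,
  the way a per-call append would, and writes `s` in `encoding`."
  [^ByteArrayOutputStream out ^String s ^String encoding]
  (with-open [w (OutputStreamWriter. out encoding)]
    (.write w s)))

(defn two-appends
  "Appends two lines through two separate writers and returns the bytes."
  [encoding]
  (let [out (ByteArrayOutputStream.)]
    (append-encoded! out "Line 1\n" encoding)
    (append-encoded! out "Line 2\n" encoding)
    (.toByteArray out)))

(defn decode
  "Decodes `bs` as text in `encoding`."
  [^bytes bs ^String encoding]
  (String. bs encoding))

;; "UTF-16" writes a marker per writer, so a second marker ends up mid-stream.
(decode (two-appends "UTF-16") "UTF-16")     ; "Line 1\n\uFEFFLine 2\n"
;; "UTF-16LE" never writes a marker, so the text round-trips unchanged.
(decode (two-appends "UTF-16LE") "UTF-16LE") ; "Line 1\nLine 2\n"
```

The stray U+FEFF in the first result corresponds to the "?" reported in the slurp output.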
Test case for reproducing this behaviour:
  (use 'clojure.contrib.duck-streams)
  (binding [*default-encoding* "UnicodeLittle"]
    (append-spit "/foo.txt" "Line 1\n"))
  (binding [*default-encoding* "UnicodeLittle"]
    (append-spit "/foo.txt" "Line 2\n"))
  (slurp "c:/foo.txt" "UnicodeLittle")
The slurp outputs:
"Line 1\n?Line 2\n"
The expected output is:
"Line 1\nLine 2\n"
stu said: I am not sure there is a good answer here. The code above chooses an encoding with an explicit marker, and gets what it asks for.
One proposed solution (http://github.com/sergey-miryanov/clojure-contrib/commits/bug-30) tries to detect this scenario, and recover via a hard-coded mapping between encodings-with-markers and similar-encodings-without. But I don't think this can work in general, because the set of possible encodings is open and the Charset API doesn't provide a mapping between the with-markers and without-markers versions.
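For illustration only, the general shape of such a hard-coded mapping might look like the sketch below. It is not the code from the linked branch: the table markerless-encoding and the functions append-encoding and append-spit-once-marked are hypothetical names, and the table is deliberately incomplete, which is exactly the limitation described above. Any encoding missing from the table falls through unchanged and would still repeat its marker.

```clojure
(import '(java.io File FileOutputStream OutputStreamWriter))

;; Hypothetical, hand-maintained table: encodings that write a marker,
;; mapped to a marker-free encoding with the same byte layout.
;; The Charset API offers no way to derive these pairs.
(def markerless-encoding
  {"UnicodeLittle" "UTF-16LE"
   "UTF-16"        "UTF-16BE"})

(defn append-encoding
  "Returns the encoding to use when appending to `f`: the requested one
  for a new or empty file, otherwise its marker-free counterpart if the
  table has one, else the requested encoding unchanged."
  [^File f encoding]
  (if (and (.exists f) (pos? (.length f)))
    (get markerless-encoding encoding encoding)
    encoding))

(defn append-spit-once-marked
  "Appends `s` to the file at `path`, writing the marker only when the
  file is new or empty (for encodings listed in the table)."
  [path s encoding]
  (let [f (File. (str path))]
    (with-open [w (OutputStreamWriter. (FileOutputStream. f true)
                                       ^String (append-encoding f encoding))]
      (.write w (str s)))))
```

The sketch also assumes that nothing else writes to the file between the emptiness check and the append.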
Sorry, and please feel free to reopen this if I am missing an obvious approach.