Details
Type: Defect
Status: Closed
Priority: Minor
Resolution: Completed
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Description
With the latest master version of tools.reader (1.3.0), a file containing a carriage return that is not followed by a formfeed or newline character can overflow the default pushback buffer of size 1, which causes an exception to be thrown during reading. The root cause is that, for some types of reader objects, peek-char is implemented as a read-char followed by an unread, and read-char itself ends with a call to peek-char when it finds such an isolated carriage return.
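The buffer interaction can be reproduced with a plain java.io.PushbackReader, independent of tools.reader. The sketch below is a simplified model of the sequence described above, not the library's actual code: `simulate-isolated-cr` is a hypothetical name, and the second unread is an assumption standing in for an enclosing peek that pushes back the normalized newline.

```clojure
(import '(java.io PushbackReader StringReader IOException))

;; Hypothetical helper (not part of tools.reader): models two characters
;; being pushed back at once after an isolated carriage return.
(defn simulate-isolated-cr
  "Returns :ok if both unreads fit in a pushback buffer of buf-size,
  otherwise the IOException message."
  [buf-size]
  (let [rdr (PushbackReader. (StringReader. "\rb") buf-size)]
    (try
      (.read rdr)                       ; consume the carriage return
      (let [next-ch (.read rdr)]        ; inner peek, step 1: read the next char
        (.unread rdr (int next-ch)))    ; inner peek, step 2: push it back
      (.unread rdr (int \newline))      ; assumed outer peek: push back the normalized newline
      :ok
      (catch IOException e
        (.getMessage e)))))
```

Under this model, a buffer of size 1 has no room for the second unread, while a buffer of size 2 holds both characters.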
One example to reproduce:
user=> (require '[clojure.java.io :as io])
nil
user=> (require '[clojure.tools.reader.edn :as tre])
nil
user=> (require '[clojure.tools.reader.reader-types :as rt])
nil
user=> (spit "test.edn" "[a\rb]")
nil
user=> (def rdr (rt/indexing-push-back-reader (io/reader "test.edn")))
#'user/rdr
user=> (tre/read rdr)
IOException Pushback buffer overflow  java.io.PushbackReader.unread (PushbackReader.java:155)
user=> (def rdr (clojure.lang.LineNumberingPushbackReader. (io/reader "test.edn")))
#'user/rdr
user=> (clojure.edn/read rdr)
[a b]
If anyone is experiencing this problem and wants a workaround: replacing all carriage returns with newlines in the input file/string/reader should avoid it. Note that if you are using a line-numbering/indexing reader, this may change the line numbers of lines in the data.
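A minimal sketch of that workaround for string input follows. `normalize-carriage-returns` is a hypothetical helper name, and the choice to collapse a carriage return plus newline pair into a single newline (rather than replacing every carriage return one for one) is an assumption intended to limit how much the line numbers shift.

```clojure
(require '[clojure.string :as str]
         '[clojure.edn :as edn])

;; Hypothetical helper: removes every carriage return from the input.
;; Assumption: "\r\n" is collapsed to one "\n", and a lone "\r" becomes "\n".
(defn normalize-carriage-returns [s]
  (str/replace s #"\r\n?" "\n"))

;; The normalized string contains no carriage returns, so a reader never
;; reaches the isolated-carriage-return case.
(def cleaned (normalize-carriage-returns "[a\rb]"))

(edn/read-string cleaned)
```

The same cleaned string could be wrapped in a reader and handed to tools.reader instead of clojure.edn. For file input, one option is to slurp the file, normalize it, and read from the resulting string, at the cost of holding the whole file in memory.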
Thinking a little bit about possible changes to the code that might be made to avoid this issue:
(a) A quick and dirty fix might be to add 1 to whatever pushback buffer size is given when creating a pushback reader, making the minimum 2 instead of 1.
(b) Maybe there is a way to change the definition of read-token so that its first operation on the reader is not unread, but I haven't carefully checked all of the places read-token is called from to see if that could be made to work.
(c) Implement the desired behavior of normalize-newline in a different way, by putting the state that the last character read was a carriage-return into the places where normalize-newline is called. That might get a little messy looking.
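To make option (c) more concrete, here is a rough sketch of newline normalization that remembers whether the previous character was a carriage return instead of peeking ahead. It is written against a plain java.io.Reader rather than tools.reader's reader types, and `make-normalizing-read` is a hypothetical name; fitting this state into the actual call sites of normalize-newline would likely look different.

```clojure
;; Hypothetical sketch: returns a zero-argument function that reads one
;; character at a time from rdr, mapping a carriage return (optionally
;; followed by a newline or formfeed) to a single \newline. It never calls
;; unread, so it needs no pushback buffer for this purpose.
(defn make-normalizing-read
  [^java.io.Reader rdr]
  (let [last-cr? (volatile! false)]   ; was the previous raw char a carriage return?
    (fn read-ch []
      (let [c (.read rdr)]
        (cond
          ;; end of input
          (neg? c) nil
          ;; carriage return: emit the newline now and remember the state
          (= c 13) (do (vreset! last-cr? true) \newline)
          ;; newline or formfeed right after a carriage return: already
          ;; accounted for, so skip it and read the next character
          (and @last-cr? (or (= c 10) (= c 12)))
          (do (vreset! last-cr? false) (recur))
          ;; any other character passes through unchanged
          :else (do (vreset! last-cr? false) (char c)))))))
```

For example, calling the returned function repeatedly on a reader over "a\rb\r\nc" yields \a, \newline, \b, \newline, \c and then nil at end of input. The trade-off is the one noted above: the state has to live somewhere, which may make the real code messier.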