
[DCSV-6] read-csv can not handle white-space at end of line Created: 24/May/13  Updated: 29/May/14

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Cees van Kemenade Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: None


 Description   

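When whitespace is present after the closing quote, read-csv throws an exception with a confusing error message.
It took me some time to notice that whitespace was the cause, because whitespace is... not visible. Even the "unexpected character" in the error message is the invisible space itself.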

See an example of the error below.

=> (read-csv (java.io.StringReader. "\"a\" " ))
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)
=> (read-csv (java.io.StringReader. "\"a\"" ))
(["a"])



 Comments   
Comment by Cees van Kemenade [ 24/May/13 4:35 PM ]

Taking the issue a little further, the same happens with whitespace in the middle of a line, between the closing quote and the separator:
=> (read-csv (java.io.StringReader. "\"a\" , 5\n \"b,b\",\"6\"" ))
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)

This raises the question of what happens if you put a space between the separator and the opening quote. First, the baseline case without such a space:
=> (read-csv (java.io.StringReader. "\"a\", 5\n\"b\",\"6\"" ))
(["a" " 5"] ["b" "6"])

Now add one space before the opening quote of "b":
=> (read-csv (java.io.StringReader. "\"a\", 5\n \"b\",\"6\"" ))
(["a" " 5"] [" \"b\"" "6"])

Interestingly, the whitespace is treated as the start of an unquoted value, and the quote that follows becomes part of the text value that is read.
The main reason for using quotes is to allow separators inside text, so let us see what happens if we put a separator inside the quoted string.
=> (read-csv (java.io.StringReader. "\"a\", 5\n \"b,b\",\"6\"" ))
(["a" " 5"] [" \"b" "b\"" "6"])

Now the separator is no longer protected by the quotes and, as expected, the line is parsed as three values instead of two.

When CSV files are produced by standard libraries, the issues mentioned above usually do not appear. However, custom code that emits CSV files, or a small manual fix to a CSV file, can easily introduce such whitespace, and it is then quite tough to diagnose the resulting error correctly.
Therefore I would propose a mode of operation in which whitespace before an opening quote or after a closing quote is ignored (this concerns only real opening and closing quotes, not an escaped quote such as "").
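The proposed lenient mode could look roughly like the following minimal sketch. It assumes one record per line (no line breaks inside quoted cells), a comma separator and `"` as the quote character; `parse-line-lenient` and its helpers are hypothetical names, not part of clojure.data.csv. Spaces and tabs before an opening quote and after a closing quote are skipped, a doubled quote `""` inside a quoted cell stands for one literal quote, and unquoted cells keep their whitespace verbatim, matching the " 5" values shown above.

```clojure
(defn- blank-char? [c]
  (or (= c \space) (= c \tab)))

(defn- skip-blanks
  "Returns the first index at or after i that is not a space or tab."
  [^String s i]
  (loop [i i]
    (if (and (< i (count s)) (blank-char? (.charAt s i)))
      (recur (inc i))
      i)))

(defn- read-quoted
  "s has an opening quote at index i. Returns [cell-text index-after-closing-quote].
   A doubled quote \"\" inside the cell stands for one literal quote."
  [^String s i]
  (let [sb (StringBuilder.)]
    (loop [j (inc i)]
      (cond
        (>= j (count s))
        (throw (ex-info "unterminated quoted cell" {:index i}))

        (not= (.charAt s j) \")
        (do (.append sb (.charAt s j)) (recur (inc j)))

        ;; "" inside a quoted cell is an escaped quote
        (and (< (inc j) (count s)) (= (.charAt s (inc j)) \"))
        (do (.append sb \") (recur (+ j 2)))

        ;; a single quote closes the cell
        :else [(str sb) (inc j)]))))

(defn parse-line-lenient
  "Hypothetical lenient parser for ONE CSV record (no embedded newlines).
   Blanks before an opening quote and after a closing quote are ignored;
   whitespace in unquoted cells is kept verbatim."
  [^String s]
  (let [n (count s)]
    (loop [i 0, cells []]
      (let [k (skip-blanks s i)]
        (if (and (< k n) (= (.charAt s k) \"))
          ;; quoted cell: the blanks between i and k are dropped
          (let [[cell after] (read-quoted s k)
                m (skip-blanks s after)] ; drop blanks after the closing quote
            (cond
              (= m n) (conj cells cell)
              (= (.charAt s m) \,) (recur (inc m) (conj cells cell))
              :else (throw (ex-info (str "unexpected character after quoted cell: "
                                         (.charAt s m))
                                    {:index m}))))
          ;; unquoted cell: everything from i up to the next separator
          (let [sep (.indexOf s "," (int i))]
            (if (neg? sep)
              (conj cells (subs s i))
              (recur (inc sep) (conj cells (subs s i sep))))))))))
```

Applied line by line, for example with `(mapv parse-line-lenient (clojure.string/split-lines text))`, the inputs from this report give `["a"]` for `"\"a\" "`, and `["a" " 5"]` and `["b,b" "6"]` for the two lines of `"\"a\" , 5\n \"b,b\",\"6\""`. A cell with other characters after its closing quote still raises an error instead of being silently accepted; the error message at least names the offending character.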

Comment by Paul Schulz [ 29/May/14 9:29 AM ]

This is related to DCSV-8.

A cell that begins with a quote but whose closing quote falls in the middle of the cell (e.g. additional characters appear after the second quote) causes the same problem.

Generated at Wed Nov 26 01:44:19 CST 2014 using JIRA 4.4#649-r158309.