data.csv

read-csv can not handle white-space at end of line

Details

  • Type: Defect
  • Status: Open
  • Priority: Major
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels: None

Description

When whitespace follows the closing quote (") of a quoted cell, read-csv fails with a confusing exception.
It took me some time to notice that whitespace was the cause, since whitespace is not visible.

See an example of the error below.

=> (read-csv (java.io.StringReader. "\"a\" " ))
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)
=> (read-csv (java.io.StringReader. "\"a\"" ))
(["a"])

Activity

Cees van Kemenade added a comment -

To take the issue a little further, the same holds for whitespace in the middle of a line, between the closing quote and the separator:
=> (read-csv (java.io.StringReader. "\"a\" , 5\n \"b,b\",\"6\"" ))
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)

This raises the question of what happens if you put a space between the separator and the opening quote. First the default case, with no space before an opening quote:
=> (read-csv (java.io.StringReader. "\"a\", 5\n\"b\",\"6\"" ))
(["a" " 5"] ["b" "6"])

Now adding one space before the opening quote at the start of the second line:
=> (read-csv (java.io.StringReader. "\"a\", 5\n \"b\",\"6\"" ))
(["a" " 5"] [" \"b\"" "6"])

Interestingly, the whitespace is treated as the start of the cell, and the quote that follows is treated as part of the text value that is read.
The main reason for using quotes is to allow separators in text, so let us see what happens if we put a separator inside the quoted text.
=> (read-csv (java.io.StringReader. "\"a\", 5\n \"b,b\",\"6\"" ))
(["a" " 5"] [" \"b" "b\"" "6"])

Now we see that the separator is no longer protected by the quotes and, as expected, the line is interpreted as containing three values instead of two.

When using standard libraries the issues mentioned above usually do not appear. However, in custom code that emits CSV files, or when making small manual fixes in a CSV file, it is easy to introduce such an error, and it is then quite tough to diagnose correctly.
Therefore I would opt for a mode of operation where whitespace before an opening quote or after a closing quote is ignored (unless the quote is an escaped quote like "").
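
Until such a mode exists, one possible workaround is to normalize the text before handing it to read-csv. The following is only a minimal sketch, not part of clojure.data.csv: the function name strip-quote-padding and its helpers are hypothetical, and it assumes the default separator (,) and quote character ("), and treats only spaces and tabs as padding. It drops spaces and tabs directly before an opening quote and directly after a closing quote, and leaves everything else untouched, including any other stray characters after a closing quote.

```clojure
;; Hypothetical helpers, not part of clojure.data.csv.
(defn- blank? [c] (or (= c \space) (= c \tab)))
(defn- boundary? [c] (or (= c \,) (= c \newline) (= c \return)))

(defn strip-quote-padding
  "Returns s with spaces/tabs removed directly before an opening quote
   and directly after a closing quote. Assumes , as separator and \" as
   quote character. Whitespace inside cells is preserved."
  [s]
  (loop [cs (seq s), state :start, pending [], out []]
    (if (empty? cs)
      ;; Leading blanks of a final unquoted (possibly empty) cell are kept.
      (apply str (into out pending))
      (let [c (first cs), more (rest cs)]
        (case state
          ;; At the start of a cell: hold blanks back until we know
          ;; whether the cell turns out to be quoted.
          :start
          (cond
            (= c \")      (recur more :quoted [] (conj out c))
            (blank? c)    (recur more :start (conj pending c) out)
            (boundary? c) (recur more :start [] (conj (into out pending) c))
            :else         (recur more :unquoted [] (conj (into out pending) c)))

          ;; Inside an unquoted cell: copy everything through.
          :unquoted
          (recur more (if (boundary? c) :start :unquoted) [] (conj out c))

          ;; Inside a quoted cell: copy everything, watch for a quote.
          :quoted
          (recur more (if (= c \") :quote-seen :quoted) [] (conj out c))

          ;; Just saw a quote inside a quoted cell: a second quote means
          ;; an escaped quote, anything else means the cell was closed.
          :quote-seen
          (cond
            (= c \")      (recur more :quoted [] (conj out c))
            (blank? c)    (recur more :closed [] out)
            (boundary? c) (recur more :start [] (conj out c))
            :else         (recur more :closed [] (conj out c)))

          ;; After a closing quote: drop blanks up to the separator.
          :closed
          (cond
            (blank? c)    (recur more :closed [] out)
            (boundary? c) (recur more :start [] (conj out c))
            :else         (recur more :closed [] (conj out c))))))))

;; Possible usage, assuming clojure.data.csv is on the classpath:
;; (read-csv (java.io.StringReader. (strip-quote-padding "\"a\" , 5\n \"b,b\",\"6\"")))
```

Because this works on the whole string, it is only suitable for input that fits in memory; a real fix inside the reader would handle the padding while streaming.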

Paul Schulz added a comment - - edited

This is related to DSCV-8

A quote that opens at the beginning of the string and closes in the middle of it (e.g. where additional characters appear after the second quote) causes the same problem.

