data.csv

\return as record separator with unquoted fields is read as part of the field

Details

  • Type: Defect Defect
  • Status: Resolved Resolved
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None

Description

This regards the gray area of being "more forgiving." If I understand RFC 4180 correctly, I want to suggest substituting one bit of forgiveness for another: rather than supporting unquoted, multi-line cell values, I suggest supporting CSVs with just \return as the record-separator. Would you accept a patch for that?

A file with \return as record-separator is interpreted by read-csv as a single row like (["Header1" "Header2\rval1" "val2"]). I believe the RFC only allows fields to contain CR and LF when they're escaped (i.e., surrounded in double quotes). See the ABNF at the end of section 2.

As far as implementation, I believe this would require wrapping any Reader w/o markSupported in one that does, so that the LF following a CR can be consumed when present.

[I've classified this as a major defect because I ran into a \return-delimited file as soon as I passed a CSV from a Linux machine to a Windows machine, so I'm guessing these files are common. Feel free to reclassify.]

Activity

Hide
Jonas Enlund added a comment -

> rather than supporting unquoted, multi-line cell values, I suggest supporting CSVs with just \return as the record-separator. Would you accept a patch for that?

Sounds good to me.

> As far as implementation, I believe this would require wrapping any Reader w/o markSupported in one that does

I think that's ok, since BufferedReader supports it.

Show
Jonas Enlund added a comment - > rather than supporting unquoted, multi-line cell values, I suggest supporting CSVs with just \return as the record-separator. Would you accept a patch for that? Sounds good to me. > As far as implementation, I believe this would require wrapping any Reader w/o markSupported in one that does I think that's ok, since BufferedReader supports it.
Hide
Paul Stadig added a comment -

A patch by myself (Paul Stadig) and Nate Young. We both have CAs on file.

This patch will wrap any reader in a PushbackReader.

When parsing a CSV file, a single return character (ASCII 13) will be treated as a record separator.

We ran into this issue in production. Apparently on OSX if you export an Excel spreadsheet as CSV it will use return as a record separator. However, if you export it as a "Windows CSV" it will use CRLF. This is a bit too subtle for some users, and it would be preferable to be more flexible parsing record separators.

Show
Paul Stadig added a comment - A patch by myself (Paul Stadig) and Nate Young. We both have CAs on file. This patch will wrap any reader in a PushbackReader. When parsing a CSV file, a single return character (ASCII 13) will be treated as a record separator. We ran into this issue in production. Apparently on OSX if you export an Excel spreadsheet as CSV it will use return as a record separator. However, if you export it as a "Windows CSV" it will use CRLF. This is a bit too subtle for some users, and it would be preferable to be more flexible parsing record separators.
Paul Stadig made changes -
Field Original Value New Value
Attachment return-record-separator.patch [ 14954 ]
Hide
Jonas Enlund added a comment -

I released 0.1.3 with this fix. Thanks for the patch!

Show
Jonas Enlund added a comment - I released 0.1.3 with this fix. Thanks for the patch!
Jonas Enlund made changes -
Resolution Completed [ 1 ]
Status Open [ 1 ] Resolved [ 5 ]

People

Vote (1)
Watch (0)

Dates

  • Created:
    Updated:
    Resolved: