
[DCSV-2] \return characters do not trigger value quoting Created: 10/Feb/12  Updated: 14/Feb/12  Resolved: 13/Feb/12

Status: Resolved
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Giorgio Valoti Assignee: Jonas Enlund
Resolution: Completed Votes: 0
Labels: None
Environment:

Apache Maven 3.0.3 (r1075438; 2011-02-28 18:31:09+0100)
Maven home: /usr/share/maven
Java version: 1.6.0_29, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: it_IT, platform encoding: MacRoman
OS name: "mac os x", version: "10.7.2", arch: "x86_64", family: "mac"


Attachments: File csv.clj     File csv_test.clj    
Patch: Code and Test

 Description   

If a value contains \return characters, it is not quoted when the csv is written. A possible patch is attached.
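
A minimal sketch of the kind of check at issue, for illustration only. The names needs-quoting? and quote-cell are hypothetical and this is not the data.csv source; it assumes a cell must be quoted when it contains the separator, the quote character, \return, or \newline.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper, not the data.csv implementation: true when the
;; cell contains a character that forces quoting.
(defn needs-quoting?
  [s sep quote-char]
  (boolean (some (hash-set sep quote-char \return \newline) s)))

;; Wraps the cell in quote-char when needed, doubling embedded quotes.
(defn quote-cell
  [s sep quote-char]
  (if (needs-quoting? s sep quote-char)
    (str quote-char
         (string/replace s (str quote-char) (str quote-char quote-char))
         quote-char)
    s))

(quote-cell "plain" \, \")   ; => "plain"
(quote-cell "a\rb" \, \")    ; => "\"a\rb\""
```

Leaving \return out of that character set would reproduce the behavior described in this ticket.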



 Comments   
Comment by Jonas Enlund [ 13/Feb/12 11:16 PM ]

This is fixed in version 0.1.1. I couldn't accept your patch though, as I didn't find you on the contributor list at http://clojure.org/contributing

Comment by Giorgio Valoti [ 14/Feb/12 12:36 AM ]

Oh, sorry about that. I completely forgot about it because of the problems with Jira. Glad to hear it was useful, anyway.

BTW
Why didn’t I receive notifications from Jira when the tickets were closed? Should I “watch” them?





[DCSV-4] \return as record separator with unquoted fields is read as part of the field Created: 24/Oct/12  Updated: 10/Aug/15  Resolved: 10/Aug/15

Status: Resolved
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: John Hume Assignee: Jonas Enlund
Resolution: Completed Votes: 1
Labels: None

Attachments: Text File return-record-separator.patch    

 Description   

This regards the gray area of being "more forgiving." If I understand RFC 4180 correctly, I want to suggest substituting one bit of forgiveness for another: rather than supporting unquoted, multi-line cell values, I suggest supporting CSVs with just \return as the record-separator. Would you accept a patch for that?

A file with \return as record-separator is interpreted by read-csv as a single row like (["Header1" "Header2\rval1" "val2"]). I believe the RFC only allows fields to contain CR and LF when they're escaped (i.e., surrounded in double quotes). See the ABNF at the end of section 2.

As far as implementation, I believe this would require wrapping any Reader w/o markSupported in one that does, so that the LF following a CR can be consumed when present.
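
The mark/reset idea can be sketched as follows. This is an illustration under assumptions, not the patch itself: ensure-mark-supported and skip-lf-after-cr! are hypothetical names, and BufferedReader is used as the wrapper because it supports mark.

```clojure
(import '(java.io Reader BufferedReader))

;; Wraps the reader only when it cannot mark/reset on its own.
(defn ensure-mark-supported
  ^Reader [^Reader r]
  (if (.markSupported r) r (BufferedReader. r)))

;; To be called right after a CR has been read: consumes the next char
;; only when it is LF, otherwise rewinds to the mark.
(defn skip-lf-after-cr!
  [^Reader r]
  (.mark r 1)
  (let [c (.read r)]
    (when-not (= c (int \newline))
      (.reset r))))
```

With this in place a CR ends the record whether or not an LF follows it.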

[I've classified this as a major defect because I ran into a \return-delimited file as soon as I passed a CSV from a Linux machine to a Windows machine, so I'm guessing these files are common. Feel free to reclassify.]



 Comments   
Comment by Jonas Enlund [ 24/Oct/12 3:00 PM ]

> rather than supporting unquoted, multi-line cell values, I suggest supporting CSVs with just \return as the record-separator. Would you accept a patch for that?

Sounds good to me.

> As far as implementation, I believe this would require wrapping any Reader w/o markSupported in one that does

I think that's ok, since BufferedReader supports it.

Comment by Paul Stadig [ 10/Aug/15 7:52 AM ]

A patch by myself (Paul Stadig) and Nate Young. We both have CAs on file.

This patch will wrap any reader in a PushbackReader.

When parsing a CSV file, a single return character (ASCII 13) will be treated as a record separator.

We ran into this issue in production. Apparently on OSX if you export an Excel spreadsheet as CSV it will use return as a record separator. However, if you export it as a "Windows CSV" it will use CRLF. This is a bit too subtle for some users, and it would be preferable to be more flexible parsing record separators.
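
For illustration, a sketch of how a PushbackReader lets LF, CRLF, and a lone CR all end a record. It is not the attached patch: read-record-line and read-record-lines are hypothetical names, and the sketch ignores quoting and field separators entirely.

```clojure
(import '(java.io PushbackReader StringReader))

;; Reads chars up to the next record separator (LF, CRLF, or lone CR).
;; Returns the line as a string, or nil at end of input.
(defn read-record-line
  [^PushbackReader r]
  (let [sb (StringBuilder.)]
    (loop [c (.read r) seen? false]
      (cond
        (= c -1) (when seen? (str sb))
        (= c (int \newline)) (str sb)
        (= c (int \return))
        (let [nxt (.read r)]
          ;; A CR may be followed by LF (CRLF); any other char is pushed back.
          (when-not (or (= nxt (int \newline)) (= nxt -1))
            (.unread r (int nxt)))
          (str sb))
        :else (do (.append sb (char c))
                  (recur (.read r) true))))))

;; Collects every record line of the string s into a vector.
(defn read-record-lines
  [^String s]
  (let [r (PushbackReader. (StringReader. s))]
    (loop [acc []]
      (if-let [line (read-record-line r)]
        (recur (conj acc line))
        acc))))

(read-record-lines "Header1,Header2\rval1,val2")
;; => ["Header1,Header2" "val1,val2"]
```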

Comment by Jonas Enlund [ 10/Aug/15 12:12 PM ]

I released 0.1.3 with this fix. Thanks for the patch!





[DCSV-5] No option for parsing into maps Created: 21/May/13  Updated: 25/May/17  Resolved: 25/May/17

Status: Resolved
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Gary Fredericks Assignee: Jonas Enlund
Resolution: Declined Votes: 0
Labels: None


 Description   

I imagine a very common use case for parsing CSVs is to get the output as a sequence of maps. I'm happy to provide a patch for this but wanted to make sure I had the right design.

My initial idea is to add another option to read-csv with the name :headers which can be a sequence of values, or a flag such as :first-row. Presumably though we ought to also support using the first row as keywords rather than strings, so I'm not sure whether that ought to be another option or a different flag (e.g., :first-row-keywords).



 Comments   
Comment by Jonas Enlund [ 21/May/13 1:28 PM ]

I've seen this feature request before so I think that something like this should be added. One approach would be to provide a helper function:

(defn csv-data->maps [vecs]
  (map zipmap (repeat (first vecs)) (rest vecs)))

(csv-data->maps (read-csv reader))

Comment by Cees van Kemenade [ 24/May/13 12:41 PM ]

I ran into the same question and prepared a small library to do my csv processing.
It uses data.csv as a workhorse, but puts some additional functionality on top of it, such as:
1. csv-to-map: which does the same as the code above, but also maps strings in the first line to keywords. Furthermore, you can choose to translate the keys to lowercase, which is often needed when submitting the csv-data to a database.
2. csv-columnMap: which selects a subset of columns and renames them (i.e., rewrites the first line of the csv-data).
3. read-csv: my primary entry point using data.csv + csv-to-map + csv-columnMap
4. read-csv-lazy: A lazy variant which takes a processing function to be used in the inner loop (to allow large csv-datasets)
5. read-csv-to-db: pumping a csv into a database
6. map-seq-to-csv: mapping a uniform sequence of hashmaps to a dataset that can be written to a csv (first line contains the keys)

Feel free to reuse parts of the code. You can find the code here:

https://github.com/cvkem/vinzi.tools/blob/master/vinzi.tools/src/main/clojure/vinzi/tools/vCsv.clj
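
As an illustration of the reverse direction described in item 6, a minimal sketch. maps->csv-data is a hypothetical name and this is not the vinzi.tools code; it assumes every map has the same keys and takes the column order from the first map.

```clojure
;; Turns a uniform sequence of maps into rows; the first row holds the keys.
(defn maps->csv-data
  [maps]
  (let [ks (keys (first maps))]
    (cons (mapv name ks)
          (map (fn [m] (mapv #(get m %) ks)) maps))))

(maps->csv-data [{:a 1 :b 2} {:a 3 :b 4}])
;; => (["a" "b"] [1 2] [3 4])
```

The resulting rows have the shape that write-csv expects, a sequence of sequences.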

Comment by Jonas Enlund [ 25/May/17 1:25 PM ]

Instead of adding a helper function to achieve this, I added some more docs to the README on how to do it.
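
A minimal sketch of the keyword-header variant discussed in this ticket, written against plain row data so it runs without data.csv. The name rows->keyword-maps is hypothetical and this is not quoted from the README.

```clojure
;; Uses the first row as the header, converted to keywords, and zips it
;; with every following row.
(defn rows->keyword-maps
  [rows]
  (let [header (map keyword (first rows))]
    (map #(zipmap header %) (rest rows))))

(rows->keyword-maps [["name" "age"] ["Ann" "31"] ["Bob" "27"]])
;; => ({:name "Ann", :age "31"} {:name "Bob", :age "27"})
```

Because read-csv returns a lazy sequence, the result should be realized (for example with doall) before the underlying reader is closed.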





[DCSV-7] data.csv does not handle BOMs Created: 12/Aug/13  Updated: 25/May/17  Resolved: 25/May/17

Status: Resolved
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: John Walker Assignee: Jonas Enlund
Resolution: Declined Votes: 0
Labels: None
Environment:

Usually Windows (but also Linux)



 Description   

Sometimes BOMs are prepended to files in Microsoft Land. data.csv does not handle this edge case, which causes the first field in the header of a csv file to be incorrect. This can be hard to detect, since \ufeff is usually invisible.

http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
http://www.fileformat.info/info/unicode/char/feff/index.htm



 Comments   
Comment by Jonas Enlund [ 12/Aug/13 11:46 PM ]

This isn't really a csv specific problem. I've encountered files with a byte order mark and then I have simply executed (.skip reader 1) before handing the reader over to read-csv. Is this not a good enough solution?
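
A conditional variant of that workaround, as a sketch. skip-bom is a hypothetical name; the sketch assumes the reader already decodes the input to characters (for example as UTF-8), so that the BOM shows up as the single char \ufeff.

```clojure
(import '(java.io PushbackReader Reader StringReader))

;; Returns a reader positioned after a leading \ufeff, if one is present.
;; Any other first char is pushed back, so no data is lost.
(defn skip-bom
  ^PushbackReader [^Reader r]
  (let [pr (PushbackReader. r)
        c (.read pr)]
    (when-not (or (= c 0xFEFF) (= c -1))
      (.unread pr (int c)))
    pr))

(slurp (skip-bom (StringReader. "\ufeffa,b")))   ; => "a,b"
(slurp (skip-bom (StringReader. "a,b")))         ; => "a,b"
```

The returned reader can then be handed to read-csv in place of the original one.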

Comment by Jonas Enlund [ 25/May/17 1:26 PM ]

Instead of adding support for this, I added some docs on how to achieve it without changing data.csv.





[DCSV-16] incorrect type hint on read-cell Created: 02/May/17  Updated: 03/May/17  Resolved: 03/May/17

Status: Resolved
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Steve Miner Assignee: Jonas Enlund
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File DCSV-16-type-hint.patch    
Patch: Code

 Description   

The type hints are incorrect on read-cell and read-quoted-cell. The Clojure compiler can't resolve the call to method unread. That's because the hint was ^Reader where it should be ^PushbackReader.

Compiling namespace clojure.data.csv
Reflection warning, clojure/data/csv.clj:34:25 - call to method unread on java.io.Reader can't be resolved (no such method).
Reflection warning, clojure/data/csv.clj:52:18 - call to method unread on java.io.Reader can't be resolved (no such method).
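
For illustration, a small function with the corrected kind of hint. peek-char is a hypothetical name and this is not the code from csv.clj; the point is only that unread is declared on PushbackReader, not on Reader.

```clojure
(import '(java.io PushbackReader StringReader))

;; With (set! *warn-on-reflection* true) in effect, hinting the argument
;; as ^Reader here would produce a warning like the ones above, because
;; java.io.Reader has no unread method.
(defn peek-char
  "Reads one char and pushes it back, leaving the reader position unchanged."
  [^PushbackReader reader]
  (let [c (.read reader)]
    (when-not (= c -1)
      (.unread reader (int c)))
    c))

(peek-char (PushbackReader. (StringReader. "x")))   ; => 120
```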



 Comments   
Comment by Steve Miner [ 02/May/17 12:46 PM ]

patch attached

Comment by Jonas Enlund [ 03/May/17 10:54 PM ]

Thanks. I applied the patch and released data.csv version 0.1.4.





[DCSV-3] Some minor documentation typos Created: 14/Jun/12  Updated: 15/Jun/12  Resolved: 15/Jun/12

Status: Resolved
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Trivial
Reporter: Trent Ogren Assignee: Jonas Enlund
Resolution: Completed Votes: 0
Labels: docs, documentation, typo

Attachments: Text File 0001-Documentation-typo-fixes.patch    
Patch: Code

 Description   

I found a couple minor typos: one in the README, one in a docstring. I've included a patch.






Generated at Mon May 29 22:48:46 CDT 2017 using JIRA 4.4#649-r158309.