
[DCSV-10] Specify RFC4180 compatibility in README Created: 18/Mar/15  Updated: 19/Mar/15

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Leon Grapenthin Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: documentation


 Description   

In the README it says: "Follows the RFC4180 specification but is more relaxed."
This is an oxymoron and confusing in other regards. E.g.:

  • What does "relaxed" mean?
  • If it is more "relaxed" than the specification, how can it follow it?
  • Does it follow the specification, or only parts of it?

Problem: If I use this lib to generate CSV for a third party, can I say "This is RFC4180-conformant CSV" and feel safe with it? Or should I add "but it is more relaxed"?

The task could be to add more specific explanation or a comparison table if necessary.



 Comments   
Comment by Jonas Enlund [ 18/Mar/15 10:54 AM ]

"relaxed" means it will read some files that do not adhere to the RFC4180 spec. Files written with write-csv will follow the spec. If this is not the case it should be considered a bug.

Comment by Leon Grapenthin [ 19/Mar/15 5:13 AM ]

Thanks for the explanation.
Then the README should point out in which respects CSV input does not need to adhere to the spec, whether a strict mode exists or is planned, and whether such a mode is or would be more or less performant.

P.S.: Out of curiosity - Is this definition of relaxed some kind of standard in IT? I googled for it, but couldn't find anything related.

Comment by Jonas Enlund [ 19/Mar/15 5:33 AM ]

According to the RFC4180 spec:

  • lines should end with CRLF; this library also accepts LF alone
  • cells should be separated with commas; this library also supports other separators

I don't think "relaxed" is a standard term. I would certainly accept a patch that enhances the documentation.
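
To make the two relaxations above concrete, here is a minimal sketch, assuming clojure.data.csv is on the classpath. The :separator and :newline options are the ones listed in the library's docstrings; the helper name write-crlf is made up for illustration. One caveat worth documenting: as far as I can tell, write-csv defaults to :newline :lf, so CRLF line endings have to be requested explicitly if strict RFC4180 output is the goal.

```clojure
(require '[clojure.data.csv :as csv])
(import '(java.io StringReader StringWriter))

;; Reading: LF-only line endings are accepted ...
(def lf-rows
  (doall (csv/read-csv (StringReader. "a,b\n1,2\n"))))

;; ... and so is a separator other than the comma.
(def semi-rows
  (doall (csv/read-csv (StringReader. "a;b\r\n1;2\r\n") :separator \;)))

;; Writing: ask for CRLF line endings explicitly.
;; write-crlf is a hypothetical helper, not part of data.csv.
(defn write-crlf [rows]
  (let [w (StringWriter.)]
    (csv/write-csv w rows :newline :cr+lf)
    (str w)))
```

Both reads should produce the same two rows; the writer helper returns the CSV text so the line endings can be inspected.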





[DCSV-7] data.csv does not handle BOMs Created: 12/Aug/13  Updated: 12/Aug/13

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: John Walker Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Usually Windows (but also Linux)



 Description   

Sometimes BOMs are prepended to files in Microsoft Land. Data.csv does not handle this edge case, which causes the first field in the header of a csv file to be incorrect. This can be hard to detect, since \ufeff is usually invisible.

http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
http://www.fileformat.info/info/unicode/char/feff/index.htm



 Comments   
Comment by Jonas Enlund [ 12/Aug/13 11:46 PM ]

This isn't really a csv specific problem. I've encountered files with a byte order mark and then I have simply executed (.skip reader 1) before handing the reader over to read-csv. Is this not a good enough solution?
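
Note that (.skip reader 1) drops the first character unconditionally, so it would also eat the first character of a file that has no BOM. A minimal sketch of a conditional variant, using only java.io; the function name skip-bom is hypothetical, and it assumes the reader was opened with the right charset so that the BOM arrives as the single character U+FEFF:

```clojure
(import '(java.io PushbackReader StringReader))

(defn skip-bom
  "Hypothetical helper: wraps reader in a PushbackReader and consumes a
  leading U+FEFF if there is one. Any other first character is pushed back."
  [reader]
  (let [pr (PushbackReader. reader 1)
        c  (.read pr)]
    ;; -1 means end of stream; 0xFEFF is the byte order mark.
    (when-not (or (= c -1) (= c 0xFEFF))
      (.unread pr (int c)))
    pr))
```

The returned reader can then be handed to read-csv in place of the original one.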





[DCSV-6] read-csv can not handle white-space at end of line Created: 24/May/13  Updated: 29/May/14

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Cees van Kemenade Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: None


 Description   

When whitespace is present after the closing \", read-csv fails with a weird error.
It took me some time to notice it was a whitespace issue, as whitespace is ... not visible.

See an example of the error below.

=> (read-csv (java.io.StringReader. "\"a\" " ))
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)
=> (read-csv (java.io.StringReader. "\"a\"" ))
(["a"])



 Comments   
Comment by Cees van Kemenade [ 24/May/13 4:35 PM ]

To take the issue a little further, the same holds for whitespace in the middle of a line between the closing-quote and the separator, see:
=> (read-csv (java.io.StringReader. "\"a\" , 5\n \"b,b\",\"6\"" ))
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)

This raises the question of what happens if you put a space between the separator and the opening quote (first the default case):
=> (read-csv (java.io.StringReader. "\"a\", 5\n\"b\",\"6\"" ))
(["a" " 5"] ["b" "6"])

Now adding one additional space:
=> (read-csv (java.io.StringReader. "\"a\", 5\n \"b\",\"6\"" ))
(["a" " 5"] [" \"b\"" "6"])

Interesting, the whitespace is considered to be the start of the string, and the quote that follows is considered to be part of the text value that is read.
The main reason for using quotes is to allow separators in text, so let us see what happens if we extend the string by putting a separator in it.
=> (read-csv (java.io.StringReader. "\"a\", 5\n \"b,b\",\"6\"" ))
(["a" " 5"] [" \"b" "b\"" "6"])

Now we see that the separator is not quoted anymore and, as expected, the line is interpreted to contain three values instead of two.

When using standard libraries the issues mentioned above usually do not appear. However, in custom code that emits csv-files, or when doing small manual fixes in a csv, it is easy to introduce such an error, and subsequently it is quite tough to analyse the issue correctly.
Therefore I would opt for a mode of operation where whitespace before an opening quote or after a closing quote is ignored (unless it is an escaped quote like "").

Comment by Paul Schulz [ 29/May/14 9:29 AM ]

This is related to DCSV-8

A quote at the beginning of the string, and ending in the middle of the string (eg. where additional characters appear after second quote) will cause the same problem.





[DCSV-5] No option for parsing into maps Created: 21/May/13  Updated: 24/May/13

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Gary Fredericks Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: None


 Description   

I imagine a very common use case for parsing CSVs is to get the output as a sequence of maps. I'm happy to provide a patch for this but wanted to make sure I had the right design.

My initial idea is to add another option to read-csv with the name :headers which can be a sequence of values, or a flag such as :first-row. Presumably though we ought to also support using the first row as keywords rather than strings, so I'm not sure whether that ought to be another option or a different flag (e.g., :first-row-keywords).
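
The keyword-header variant can also be sketched as a plain function over the rows that read-csv returns, without touching read-csv's options. This is only an illustration; the name csv-data->keyword-maps is hypothetical and not part of data.csv:

```clojure
(defn csv-data->keyword-maps
  "Hypothetical helper: treats the first row as the header, turns each
  header string into a keyword, and returns one map per remaining row."
  [rows]
  (let [header (map keyword (first rows))]
    (map #(zipmap header %) (rest rows))))
```

The result is lazy, so when the rows come from a reader opened with with-open it needs to be realized (for example with doall) before the reader is closed.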



 Comments   
Comment by Jonas Enlund [ 21/May/13 1:28 PM ]

I've seen this feature request before so I think that something like this should be added. One approach would be to provide a helper function:

(defn csv-data->maps [vecs]
  (map zipmap (repeat (first vecs)) (rest vecs)))

(csv-data->maps (read-csv reader))

Comment by Cees van Kemenade [ 24/May/13 12:41 PM ]

I've run into the same question and prepared a small library to do my csv processing.
It uses data.csv as a workhorse, but puts some additional functionality on top of it, such as:
1. csv-to-map: which does the same as the code above, but also maps strings in the first line to keywords. Furthermore, you can choose to translate the keys to lowercase, which is often needed when submitting the csv-data to a database
2. csv-columnMap: which does a selection of a subset of columns and a renaming of these columns (aka renaming the first line of csv-data).
3. read-csv: my primary entry point using data.csv + csv-to-map + csv-columnMap
4. read-csv-lazy: A lazy variant which takes a processing function to be used in the inner loop (to allow large csv-datasets)
5. read-csv-to-db: pumping a csv into a database
6. map-seq-to-csv: mapping a uniform sequence of hashmaps to a dataset that can be written to a csv (first line contains the keys)

Feel free to reuse parts of the code. You can find the code here:

https://github.com/cvkem/vinzi.tools/blob/master/vinzi.tools/src/main/clojure/vinzi/tools/vCsv.clj





[DCSV-9] write-csv and quote? predicate Created: 21/Oct/14  Updated: 21/Oct/14

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Frank Stebich Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: None


 Description   

In version 0.1.2 the quote? predicate is called after the object to be written into a cell is converted into a string (see line 99). If the predicate quote? were applied to the object instead, function write-csv could be called as follows:

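;; write-csv takes a writer (not a file name) and the option key is :quote?
(with-open [writer (clojure.java.io/writer "test.csv")]
  (write-csv writer
             [[1 "text"]
              [2 "text"]]
             :quote? string?))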

In the current version every cell value is a string.
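
Until the predicate sees the original objects, a string-based predicate can approximate the intent. The following is a minimal sketch, assuming clojure.data.csv is on the classpath; numeric-string? and write-quoting-non-numbers are hypothetical names. It quotes every cell whose string form does not look like an integer, which means a string cell such as "42" is indistinguishable from the number 42 and stays unquoted:

```clojure
(require '[clojure.data.csv :as csv])
(import '(java.io StringWriter))

(defn numeric-string?
  "Hypothetical helper: true if s looks like an integer literal."
  [s]
  (boolean (re-matches #"-?\d+" s)))

(defn write-quoting-non-numbers
  "Writes rows to a string, quoting every cell that is not integer-like.
  The :quote? predicate receives the cell after conversion with str."
  [rows]
  (let [w (StringWriter.)]
    (csv/write-csv w rows :quote? (complement numeric-string?))
    (str w)))
```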






[DCSV-8] Allow read-csv to read files without quoting. Created: 29/May/14  Updated: 29/May/14

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Paul Schulz Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: None


 Description   

I would like to be able to read files with the following format:

  • '|' separated
  • Unquoted, e.g. \" can appear in the strings, in particular
    at the beginning, and not at the end.

I need to set a null quote character, but this doesn't currently work.
The following is a workaround, where a '.' is unlikely to appear as the first
character of the string.

(csv/read-csv in-file :separator \| :quote \.)

I would like to be able to be explicit:

(csv/read-csv in-file :separator \| :quote nil)
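
For input that really has no quoting at all, another stopgap is to bypass read-csv and split each line on the separator. This is only a sketch under that assumption (no quoted cells, no embedded newlines); parse-unquoted-line and read-unquoted are hypothetical names and not part of data.csv:

```clojure
(require '[clojure.string :as str])

(defn parse-unquoted-line
  "Splits one line on '|'. The -1 limit keeps trailing empty cells.
  Quote characters are treated as ordinary text."
  [line]
  (str/split line #"\|" -1))

(defn read-unquoted
  "Lazily parses every line of a java.io.BufferedReader."
  [rdr]
  (map parse-unquoted-line (line-seq rdr)))
```

As with read-csv, the result is lazy and has to be consumed while the reader is still open.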






[DCSV-1] pom.xml directives Created: 10/Feb/12  Updated: 26/Jul/13

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Giorgio Valoti Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Apache Maven 3.0.3 (r1075438; 2011-02-28 18:31:09+0100)
Maven home: /usr/share/maven
Java version: 1.6.0_29, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: it_IT, platform encoding: MacRoman
OS name: "mac os x", version: "10.7.2", arch: "x86_64", family: "mac"


Attachments: XML File pom.xml    
Patch: Code and Test

 Description   

If you build data.csv alone with the current pom.xml you get a couple of warnings and tests are not executed. With recent versions of Maven, these warnings can break the build.

A fixed (I hope!) version is attached.






Generated at Thu Sep 03 10:27:06 CDT 2015 using JIRA 4.4#649-r158309.