[DCSV-5] No option for parsing into maps Created: 21/May/13  Updated: 24/May/13

Status: Open
Project: data.csv
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Gary Fredericks Assignee: Jonas Enlund
Resolution: Unresolved Votes: 0
Labels: None


I imagine a very common use case for parsing CSVs is to get the output as a sequence of maps. I'm happy to provide a patch for this but wanted to make sure I had the right design.

My initial idea is to add another option to read-csv with the name :headers which can be a sequence of values, or a flag such as :first-row. Presumably though we ought to also support using the first row as keywords rather than strings, so I'm not sure whether that ought to be another option or a different flag (e.g., :first-row-keywords).
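
As a rough sketch of the design in question (not a patch), the three header modes could be prototyped on top of already-parsed rows. The function name `rows->maps` is hypothetical, and the `:first-row` and `:first-row-keywords` flags are only the proposals from the paragraph above; none of them exist in data.csv.

```clojure
;; Hypothetical helper: `rows->maps` and its header modes are illustrative
;; only and are not part of clojure.data.csv.
(defn rows->maps
  "rows is a sequence of vectors of strings, as returned by read-csv.
  headers is :first-row, :first-row-keywords, or an explicit sequence of keys."
  [rows headers]
  (cond
    ;; Use the first row as string keys; the remaining rows are data.
    (= headers :first-row)
    (map zipmap (repeat (first rows)) (rest rows))

    ;; Same, but convert the first row to keywords.
    (= headers :first-row-keywords)
    (map zipmap (repeat (map keyword (first rows))) (rest rows))

    ;; Otherwise treat headers as the keys; every row is data.
    :else
    (map zipmap (repeat headers) rows)))
```

For example, `(rows->maps (read-csv reader) :first-row-keywords)` would yield one keyword-keyed map per data row. Whether keywordizing should be its own flag or a separate option remains the open question.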

Comment by Jonas Enlund [ 21/May/13 1:28 PM ]

I've seen this feature request before so I think that something like this should be added. One approach would be to provide a helper function:

(defn csv-data->maps [vecs]
  (map zipmap (repeat (first vecs)) (rest vecs)))

(csv-data->maps (read-csv reader))

Comment by Cees van Kemenade [ 24/May/13 12:41 PM ]

I've run into the same question and prepared a small library to do my CSV processing.
It uses data.csv as a workhorse, but puts some additional functionality on top of it, such as:
1. csv-to-map: which does the same as the code above, but also maps the strings in the first line to keywords. Furthermore, you can choose to translate the keys to lowercase, which is often needed when submitting the csv-data to a database
2. csv-columnMap: which selects a subset of columns and renames them (that is, rewrites the first line of the csv-data)
3. read-csv: my primary entry point using data.csv + csv-to-map + csv-columnMap
4. read-csv-lazy: A lazy variant which takes a processing function to be used in the inner loop (to allow large csv-datasets)
5. read-csv-to-db: pumping a csv into a database
6. map-seq-to-csv: mapping a uniform sequence of hashmaps to a dataset that can be written to a csv (first line contains the keys)
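
The library code itself is not reproduced in this comment. As a minimal sketch of what items 1 and 6 describe, assuming rows in the shape returned by read-csv, the two conversions might look like the following. The names `rows->keyword-maps` and `maps->rows` are hypothetical and are not the library's functions.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketches of items 1 and 6; not the library's actual code.
(defn rows->keyword-maps
  "Turns parsed CSV rows into maps keyed by keywords taken from the first row.
  When lower-case? is true, the header strings are lower-cased first."
  [rows lower-case?]
  (let [norm (if lower-case? str/lower-case identity)
        ks   (map (comp keyword norm) (first rows))]
    (map #(zipmap ks %) (rest rows))))

(defn maps->rows
  "Turns a uniform sequence of maps into rows suitable for writing as CSV.
  The first row holds the key names, taken from the first map."
  [maps]
  (let [ks (keys (first maps))]
    (cons (map name ks)
          (map (fn [m] (map #(get m %) ks)) maps))))
```

The result of `maps->rows` can be passed to data.csv's `write-csv`. Column order follows the key order of the first map, so a caller that needs a fixed order would have to supply the keys explicitly.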

Feel free to reuse parts of the code. You can find the code here:


Generated at Wed Jan 18 10:31:21 CST 2017 using JIRA 4.4#649-r158309.