data.csv does not handle BOMs

Description

Byte order marks (BOMs) are sometimes prepended to files, most often by Microsoft tools. data.csv does not handle this edge case, so the first field in the header of a CSV file comes back with a leading \ufeff and no longer matches the expected column name. This can be hard to detect, since \ufeff is usually invisible when printed.

http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
http://www.fileformat.info/info/unicode/char/feff/index.htm
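
A minimal sketch of the symptom, assuming a UTF-8 input with a BOM. The sample data and the var names (`bom-csv-bytes`, `first-line`) are made up for illustration, and only the JDK reader is used, so this shows what data.csv would receive rather than data.csv itself:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical sample: a small CSV payload preceded by U+FEFF.
;; Encoding "\ufeff" as UTF-8 produces the BOM bytes EF BB BF.
(def bom-csv-bytes
  (.getBytes "\ufeffid,name\n1,Ada\n" "UTF-8"))

;; The JDK's UTF-8 decoder does not strip the BOM, so it stays
;; in the character stream as the first char of the first line.
(def first-line
  (with-open [r (io/reader (java.io.ByteArrayInputStream. bom-csv-bytes)
                           :encoding "UTF-8")]
    (first (line-seq r))))

(= first-line "id,name")        ; false, although both print alike
(= first-line "\ufeffid,name")  ; true
(count first-line)              ; 8, one more than (count "id,name")
```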

Environment

Usually Windows (but also Linux)

Activity

Dimitrios Jim Piliouras, June 28, 2019 at 10:44 PM

There is a Clojure-native library that does this (https://github.com/jimpil/clj-bom). Since you’re mentioning org.apache.commons.io.input.BOMInputStream, you may want to mention that too.

Jonas Enlund, May 25, 2017 at 7:26 PM

Instead of adding support for this, I added some docs on how to achieve it without changing data.csv.

Jonas Enlund, August 13, 2013 at 5:46 AM

This isn't really a CSV-specific problem. I've encountered files with a byte order mark, and I have simply executed (.skip reader 1) before handing the reader over to read-csv. Is this not a good enough solution?
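
One caveat with an unconditional (.skip reader 1) is that it also drops the first character of files that have no BOM. A possible variation, sketched here under the assumption that the reader was opened as UTF-8, peeks at the first character and only discards it when it is \ufeff. `skip-bom` is a hypothetical helper name, not part of data.csv:

```clojure
(import '(java.io PushbackReader Reader StringReader))

(defn skip-bom
  "Hypothetical helper: wraps rdr in a PushbackReader and consumes
  the first char only if it is the BOM (U+FEFF)."
  ^PushbackReader [^Reader rdr]
  (let [pb (PushbackReader. rdr 1)
        c  (.read pb)]
    ;; -1 means end of stream; any other non-BOM char is pushed back.
    (when-not (or (= c -1) (= c 0xFEFF))
      (.unread pb (int c)))
    pb))

;; The BOM is dropped when present ...
(slurp (skip-bom (StringReader. "\ufeffid,name\n")))  ; "id,name\n"
;; ... and input without a BOM passes through unchanged.
(slurp (skip-bom (StringReader. "id,name\n")))        ; "id,name\n"
```

Assuming read-csv accepts any java.io.Reader, as it does for the plain reader above, the result of `skip-bom` could be handed to it in the same way as the reader after (.skip reader 1).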

Declined

Details

Created August 13, 2013 at 12:07 AM
Updated June 28, 2019 at 10:44 PM
Resolved May 25, 2017 at 7:26 PM