CSV (Comma-Separated Values). A file of
ASCII (American Standard Code for Information Interchange)
fields separated by commas. Microsoft Word, Microsoft Excel, and
SQL (Structured Query Language) can often import some variant of this format.
"orange, Valencia", lemon, lime
"""extra virgin"" olive", palm, date
Usually, fields containing embedded spaces or commas are enclosed in " marks, but there are other conventions. Quotes (") inside quoted fields are doubled. Europeans often use
; and Perl aficionados often use tab to separate fields instead
of commas. Sybase SQL import uses ' instead of
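The quoting rules above can be sketched in a few lines of Java. This is only an illustration of the doubled-quote convention, not the CSVReader API: the class name CsvLineSketch and the method parseLine are hypothetical, and trimming the spaces around unquoted fields is an assumption made for this sketch. It handles a single line, so quoted fields that span several lines are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSketch {
    /** Splits one line into fields. Hypothetical helper, not part of CSVReader. */
    public static List<String> parseLine(String line, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;  // currently inside a "..." field
        boolean wasQuoted = false; // this field used quotes, so keep its inner spaces
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // a doubled quote stands for one literal quote
                    i++;
                } else {
                    inQuotes = false;  // a lone quote closes the field
                }
            } else if (c == '"' && field.toString().trim().isEmpty()) {
                field.setLength(0);    // drop padding before the opening quote
                inQuotes = true;
                wasQuoted = true;
            } else if (c == separator) {
                fields.add(wasQuoted ? field.toString() : field.toString().trim());
                field.setLength(0);
                wasQuoted = false;
            } else if (wasQuoted && c == ' ') {
                // ignore padding between a closing quote and the next separator
            } else {
                field.append(c);
            }
        }
        fields.add(wasQuoted ? field.toString() : field.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        // The two sample lines shown earlier in this entry.
        for (String f : parseLine("\"orange, Valencia\", lemon, lime", ',')) {
            System.out.println("[" + f + "]");
        }
        for (String f : parseLine("\"\"\"extra virgin\"\" olive\", palm, date", ',')) {
            System.out.println("[" + f + "]");
        }
    }
}
```

Passing ';' or '\t' as the separator covers the semicolon and tab variants. A different quote character would need one more parameter.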
I wrote CSVReader and CSVWriter, which are available with complete Java source. They are full-featured and configurable. As
well as the read/write classes, there are 20 utilities to
let you do such things as sort, align, pack, etc., either with the library or with the sample
utilities. I find these utilities useful for massaging data, e.g. data obtained by
screen scraping, into tidied form without having to write any Java code.
For simple key=value pairs, you might use the built-in Properties mechanism instead. Unfortunately, it has a
complex system of encoding awkward characters incompatible with
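To give a feel for that encoding, here is a minimal sketch using the standard java.util.Properties class. The class name PropertiesSketch and its two helper methods are hypothetical. It stores a key containing a space and a value containing an accented letter and an equals sign, then loads the text back. When Properties.store is given an OutputStream, it writes backslash escapes for such characters, which a CSV reader would not understand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesSketch {
    /** Serialises the properties to text the way Properties.store(OutputStream) does. */
    public static String store(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null); // null means no header comment; a date line is still written
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses properties text back into a Properties object. */
    public static Properties load(String text) {
        try {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Key with a space; value with e-acute (written as an escape) and an equals sign.
        props.setProperty("fruit name", "caf\u00e9 = good");
        String text = store(props);
        System.out.println(text); // shows the backslash escapes Properties adds
        System.out.println(load(text).getProperty("fruit name"));
    }
}
```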
- This is the advantage I appreciate most. You can use a suite of standard
utilities to manipulate them. If you put your data in this format, it is amazing
how much you can do with the CMP (Canadian Mind Products) utilities
without writing a line of code: align in columns, change case, condense, dedup,
dump, convert to entities, convert entities to UTF-8, pack, replace, reshape,
sort, convert tabs to commas, create CSV files using flat files and a template,
convert to a search/replace script, convert a flat file to CSV, convert CSV to a flat
file, convert a CSV file to an HTML (Hypertext Markup Language) table, convert an HTML
table to CSV, and create a CSV file from groups of lines.
- They are human readable.
- They are compact, at least relative to XML (Extensible Markup Language).
- They are widely supported, albeit with a variety of separator characters.
- You can embed comments in them.
- You can safely view and edit them with an ordinary text editor.
- You can easily convert them to HTML tables and back.
- If they are damaged, you can manually repair them.
- They don’t have a way of encoding unprintable characters or binary.
- They can be encoded with many different charsets. There is nothing in the file
itself to tell you which one.
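The charset problem in the last point can be shown with a small sketch. The class name CharsetSketch and the method decode are hypothetical. The same bytes are decoded twice, once with the charset they were written in and once with a wrong guess. Because the file carries no label, the reader has to be told which charset to use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    /** Decodes raw CSV bytes using whatever charset the caller assumes the file is in. */
    public static String decode(byte[] raw, Charset assumed) {
        return new String(raw, assumed);
    }

    public static void main(String[] args) {
        // Two fields containing e-acute, written as escapes so the source file encoding does not matter.
        String line = "caf\u00e9,th\u00e9";
        byte[] raw = line.getBytes(StandardCharsets.ISO_8859_1);

        // Right guess: the text survives intact.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).equals(line));
        // Wrong guess: the accented letters are not valid UTF-8 and get replaced.
        System.out.println(decode(raw, StandardCharsets.UTF_8).equals(line));
    }
}
```

In practice this means passing an explicit Charset whenever you open a CSV file, rather than relying on the platform default.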