CSV

CSV


View the latest, best formatted and most complete version of this manual online at http://mindprod.com/application/csv.manual.html.

Introduction

This package lets you read write, transform, align, change case, dedup, dump, entify, deentify, pack, patch, reshape, sort, template … comma, tab and semicolon- separated variable files, commonly known as CSV (Comma-Separated Value) files.

The csv 7.3 package consists of 30 command line utilities you can also embed as classes in your own programs. The functions include: CSV (statistics about a CSV file), CSVAlign, CSVChangeCase, CSVCommaToTab, CSVCondense, CSVDedup, CSVDeEntify, CSVDump, CSVEntify, CSVISBN, CSVPack, CSVPatch, CSVReader, CSVReplaceStrings, CSVReplaceURLs, CSVReshape, CSVScale, CSVSort, CSVSortField, CSVStripTags, CSVTabToComma, CSVTemplate, CSVToSRS, CSVToSRX, CSVToTable, CSVToUnique, CSVTuple, CSVWriter, LinesToCSV and TableToCSV. It also handles tab-separated and semicolon-separated files. This is the format use by Microsoft Word and other Microsoft products. This version now supports # embedded comments.

Complete Java source is provided.

You can use them either as standalone utilities or as classes you can embed in your own Java programs.

The CSV utility suite are Java command line utilities, not GUIs (Graphic User Interfaces) with menus. You use them at the command prompt. Clicking them in menus will not work. Just typing their names on the command line will not work. You must use them as shown in the examples below. They will not work unless you have installed a Java JRE (Java Runtime Environment).

Note that CSV files are perhaps 10 times slower to process than binary files. CSV files are for data interchange with other languages or when human-readability or editability is important. If you want speed, use binary format files, e.g. DataInputStream or possibly the convenient but slower ObjectInputStream.

The CSV utilities work on text files. You need a text editor to create and view them, not a word processor. e.g. notepad, Visual Slick Edit or other suitable text editor. You must use a monospaced Fonts (aka fixed pitch, aka programmer font) to view your files, or they won’t look aligned.

I found that by converting data to CSV format, then I can often no longer need to write any programs to massage my data. I just use combinations of the CSV utility suite. In particular, I used it in hundreds of ways to extract the data from myriad web pages, Excel spreadsheets, pdfs etc to collect the information about state, county and city sales taxes used by the AmericanTax program.

Format of CSV files

You can configure the separator characters, but the usual format of a CSV file looks like this:
## favourite, alts, quotation
apple, "pear, peach", "he said ""ouch"", then stopped" # sample comment

, separates fields. Fields may contain spaces or 's.
"’s enclose fields that contain ,s or "s.
"s are doubled that occur incidentally inside "s.
# starts a comment on the end of a line.
## is a special comment that provides names for the various fields separated by commas, no quotes allowed.

I found that when composing CSV files by hand, it is best to put in the quotes even when they are not needed. This has two benefits: If I accidentally leave out a comma, the parser will detect the error. If I fail to notice a comma embedded in a field, it will still work.

Summary of Components

Click on the name to get more information about each component. The package consists of:

Summary of CSV Utilities and Classes
Name Use as Class? Use as Utility? Purpose
CSV displays statistics about a CSV file, length in bytes, length in lines, number of comment lines, shortest line, longest line.
CSVReader reads a CSV file. Only of use to Java programmers.
CSVWriter writes a CSV file. Only of use to Java programmers.
CSVAlign aligns a CSV file in columns.
CSVChangeCase converts selected columns in a CSV file to lower, upper or book title case.
CSVCommaToTab converts a comma-separated file to a tab-separated file.
CSVCondense removes blank lines and condenses multiple spaces in fields to one space. Unlike CSVPack it removes entire blank lines.
CSVDedup removes “duplicate’ records in a CSV file.
CSVDeDupField removes duplicate (repeated) letters within a field of a CSV file. It does not deDup entire records. It is designed to tidy fields that are composed of option letters such as the HTMLMacros embellishment.csv file.
CSVDeEntify convert HTML (Hypertext Markup Language) entities back to UTF-8 characters.
CSVDump dumps out a CSV file on the console.
CSVEntify convert awkward characters to HTML entities.
CSVISBN convert ISBN10 to ISBN13 or ISBN13 to ISBN10 in selected columns.
CSVPack removes all excess spaces and quotes from a file. Unlike CSVCondense it does not remove blank lines.
CSVPatch patches/updates columns in a CSV file using a CSV file of from:to pairs.
CSVReplaceStrings Patch strings in text files using a list of from/to plain or regex strings to change and which files to change them in. This is not a global search/replace. It currently only patches a short list of files, not an whole directory tree. What you can do is use CSVToSRS or CSVToSRX and then use either the old Funduc S/R or the new BE Search Replace with the generated script.
CSVReplaceURLs Like CSVReplaceStrings, but specialised for replacing URLs (Uniform Resource Locators), usually from and to redirects.
CSVReshape add, remove, rearrange the order of fields in a CSV file.
CSVScale scale columns in a CSV file using a multiplier.
CSVSort sort a CSV file alphabetically, case/case insensitively, or numerically on multiple columns.
CSVSortField sorts duplicate (repeated) letters within a field of a CSV file. It does not sort entire records. It is designed to tidy fields that are composed of option letters such as the HTMLMacros embellishment.csv file.
CSVStripTags Strip HTML tag markup from specified columns.
CSVTabToComma converts a tab-separated file to a comma-separated file.
CSVToSRS converts a CSV file to a Funduc Search and Replace script for the old Search/Replace utilily.
CSVToSRX converts a CSV file to a Funduc Search and Replace script for the new BE edition of Search/Replace.
CSVTemplate Expands boilerplate with variable CSV data.
CSVToTable Converts a CSV file to an HTML table.
CSVToUnique It can be looked at as a multi-file DIFF for CSV files. It extracts the records unique to each file and also common to all the files you are comparing.
CSVTuple Breaks lines into fixed length tuples.
LinesToCSV converts a text file of lines into a CSV file blocking multiple lines into one CSV record.
TableToCSV Converts an HTML table to a UTF-8 CSV file with entities converted to Unicode and tags stripped.

CSV

CSV displays statistics about a CSV file such as length in bytes and lines, count of comment line, shortest and longest line.

java.exe com.mindprod.csv.CSV somefile.csv

alternatively:

csv.jar somefile.csv

or even:

csv.jar somefile.csv

and if you have Jet:

csv.exe somefile.csv

For this last technique to work, you must have the *.jar extension associated to invoke java.exe.

CSVReader

CSVReader reads a CSV file. CSVReader is of interest only to Java programmers. You don’t have to be a programmer to use the utilities in the csv package. Here is how you typically use CSVReader:

For more information, read the enclosed Javadoc or look at the documentation in the CSVReader and CSVWriter source on the constructors. Also look at the Java source and Javadoc for the various utilities that use CSVReader and CSVWriter.

Tips

CSVWriter

CSVWriter writes a CSV file. CSVWriter is of interest only to Java programmers. You don’t have to be a programmer to use the utilities in the csv package. Here is how some code that shows off some of the things you can do with a CSVWriter:

CSVAlign

CSVAlign aligns the fields in a CSV file in columns for easier proofreading. CSVAlign is done purely in RAM (Random Access Memory), so is not suitable for giant files. It does its job very quickly without any console output. It left aligns alpha data and right aligns numeric data. It does not align on decimal points.

java.exe com.mindprod.csv.CSVAlign somefile.csv

alternatively:

csvalign.jar somefile.csv

or even:

csvalign.jar somefile.csv

and if you have Jet:

csvalign.exe somefile.csv

For this last technique to work, you must have the *.jar extension associated to invoke java.exe.

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

There is a companion program called Align for aligning text files that are not CSV files. It breaks columns on commas or spaces, (except quoted fields) and discards the commas in the output.

CSVChangeCase

Change the case of selected fields in a CSV file. List the name of the file to be modified on the command line followed by the 0-based columns to change. After each column, insert either the letter l for lower case, u for upper case or t for Book Title Case. Don’t use CSVChangeCase on columns containing entities. That will just change the case of the entity, not the case of the character the entity represents. CSVDeEntify them first, then CSVChangeCase, then CSVEntify them again.

or alternatively

CSVCommaToTab

CSVCommaToTab Converts a comma-separated file to a tab-separated file.

Here is how you use it:

java.exe com.mindprod.csv.CSVCommaToTab somefile.csv

alternatively:

csvcommatotab.jar somefile.csv

The output overwrites the input file, so if you have any doubts it will work, make a backup copy first.

CSVDedup

Removes “duplicate’ records in a CSV file. Currently there is no way just to be notified of dups without eliminating them. To get that effect make a backup copy and compare it with the deduped version. In the simplest case you use it like this:

java.exe com.mindprod.csv.CSVDedup somefile.csv
rem ignore 0-based cols 3 and 4 when comparing for duplicates.
java.exe com.mindproc.csv.CSVDedup somefile.csv -keepfirst  3 4

alternatively:

csvdedup.jar somefile.csv -keeplast

If adjacent records in the file are identical, all but the first will be eliminated from the output file. If CSV file contains identical records, but they are not adjacent, they will be left intact. To eliminate them, you must sort the file first with CSVSort to arrange that duplicates are adjacent and the one you want to keep comes first. The comparison is case-sensitive, i.e. CSVDedup will not eliminate records that differ only in case.

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

It is also possible to eliminate records that are almost identical. You specify a list of 0-based columns to ignore when comparing adjacent records. If the other columns match, then the record is eliminated. You could use this for example to collapse a list of countries, cities, populations to a list of just countries, by sorting by country, then doing a CSVDedup asking it to ignore columns 1 and 2 so that records with the same country but different cities or populations would be considered duplicated. You could then use CSVReshape to chop off the remaining city and population columns.

java.exe com.mindprod.csv.CSVDedup countrycitypop.csv -delete 1 2

alternatively:

csv\csvdeup.jar countrycitypop.csv 1 2

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

Not the inconsistent spelling CSVDedup but CSVDeDupField.

CSVDedup also supports a DeDupStrategy parameter -keepfirst -keeplast and -delete:

CSVDeDupField

CSVDeDupField sorts a letters with in single field of a CSV file in ascending order using a case-sensitive sort, then it removes duplicate letters. It does not reorder the records. It both removes spaces and deDups. It does not remove records that partially match the previous record on some fields. You can get that effect thi th the CSVDedup utility. It probably not of wide interest. Here is what it does:

Before

After

CSVCondense

Like CSVPack, CSVCondense removes all excess spaces and quotes from a CSV file to make it as compact as possible, but in addition it removes blank lines and collapses multiple spaces inside fields down to one. Unlike CSVPack, it removes entire blank lines.

Here is how you use it:

java.exe com.mindprod.csv.CSVCondense somefile.csv

alternatively:

csvcondense.jar somefile.csv

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

CSVDeEntify

CSVDeEntify converts HTML entities in a UTF-8 encoded CSV file to UTF-8 characters. You select which 0-based columns you want to be converted. The first column is column 0.

Here is how you use it:

alternatively:

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

CSVDump

CSVDump dumps out the contents of the CSV file to the console. You might just as easily dump it out with a word processor or text editor. This is primarily a debugging tool to check that all your punctuation is correctly placed, since it will complain if the CSV file is malformed.

Here is how you use it:

java.exe com.mindprod.csv.CSVDump somefile.csv

alternatively:

csvdump.jar somefile.csv

It leaves the input file unchanged.

CSVEntify

CSVEntify converts awkward characters in a UTF-8 encoded CSV file to HTML entities. You select which 0-based columns you want to be converted. The first column is column 0.

Here is how you use it:

rem entify columns 0 4 and 5
java.exe com.mindprod.csv.CSVEntify somefile.csv 0 4 5

alternatively:

rem entify columns 0 4 and 5
csventify.jar somefile.csv  0 4 5

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

CSVISBN

CSVISBN converts ISBN10 to ISBN13 or ISBN13 to ISBN10 in selected columns. You select which 0-based columns you want to be converted. The first column is column 0.

Here is how you use it:

rem convert ISBNs ISBN10->ISBN13 and ISBN13->ISBN10  in columns 0 4 and 5
java.exe com.mindprod.csv.CSVISBN somefile.csv 0 4 5

alternatively:

rem convert ISBNs ISBN10->ISBN13 and ISBN13->ISBN10  in columns 0 4 and 5
csvisbn.jar somefile.csv  0 4 5

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

CSVPack

CSVPack removes all excess spaces and quotes from a CSV file to make it as compact as possible. Unlike CSVCondense, it does not remove blank lines.

Here is how you use it:

java.exe com.mindprod.csv.CSVPack somefile.csv

alternatively:

csvpack.jar somefile.csv

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

CSVPatch

CSVPatch patches/updates 0-based columns in a CSV file using a CSV file of from:to pairs. It is like a column-limited, case-sensitive, multiple source-target pair, search/replace that operates on a single CSV file. You can think of it like a simple merge or update. Neither of the two files involved need be sorted.

Here is how you use it:

alternatively:

csvpatch.jar somefiletopatch.csv somefileoffromtopairs.csv 0 3 2

You specify the columns you want to patch. The column numbers are 0-based (the first column in column 0 ). Typically this would be used to update a column of URLs with new values for some of the URLS, The from-to pairs of old and new values live in the second file. They are applied to each field in the column in the first file to bring those values up to date. There must be a perfect case-insensitive match of the complete field for any replacement to take place. If the fields are URLs , then both must be in & form or plain & form to match properly. Fields are stripped of lead and trailing spaces before comparing/replacing. This utility is useful in conjunction with the program BrokenLinks to automatically refresh URLs in CSV files that have recently been redirected. The patch file must have at least two complete columns. If there are duplicates in column 0, the first duplicate will be the one used.

Both columns and the files must use a consistent &/& convention for & in URLs if you are replacing URLs in HTML text. CSVPatch just blindly replaces perfect matches. Similarly sometimes various utilities, browsers or servers interconvert spaces in URLs to/from %20. CSVPatch does not account for this. It insists on an exact match. If CVSPatch fails, look in the file and adjust either the patch file or the file to be patched so the strings match.

CSVPatch may be more useful than is at first apparent. For example, if you had a CSV file of information about countries, but the file did not yet include 2-letter country codes, but you had CSV file with pairs of countryname/country code, you could insert corresponding country codes this way:

  1. Use CSVReshape to duplicate the country column in your information file.
  2. If your file of country ⇒ country code is in the order country code then country, you will have to reshape it to reverse the order of the fields, and possibly prune off any extraneous fields.
  3. Use CSVPatch to translate one of the country columns into a country code using your pair file.

Any country not on your list, or countries with variant spellings or wordings, will be left as is for manual touchup.

CSVPress

This utility has not been written yet, but here is how it will work. It is like a winepress, to squeeze the data out of a file, to get just the juice and leave the junk behind.

You put three regexes on the command line start, collect and end. You follow that by a list of file or directory names or -s and then directory trees.

Everything up to and but not including the start regex match is ignored.

Everything after the last end regex match is ignored.

If there is no start regex, each file is processed right from the first byte.

If there is no end regex, the file is processed right to the last byte.

The middle part of the file is processed with the collect regex. It may match many times, even many times per line. Let us say the collect regex has three capturing groups (). Each of the fields captures will be written to a CSV file, one field per capture, one line per match.

If there are no matches for start or end in a file, it is ignored.

CSVReplaceStrings

CSVReplaceStrings patches/updates ordinary text files e.g. *.html, *.txt, *.java file using a CSV file of from:to pairs, followed by a list of filenames that need patching You can use plain or regex search/replace. CSVReplaceStrings patches text files; in contrast CSVPatch patches CSV files. CSVReplaceURLs is similar, but it is fine-tuned for replacing URL (Uniform Resource Locator) strings.

Here is how you use it:

java.exe com.mindprod.csv.CSVReplaceStrings commands.csv

alternatively:

csvreplacestrings.jar commands.csv

The only thing you specify on the command line is the name of the CSV file containing your search/replace commands. It contains commands like this:

##from, to, files... sorry, no wildcards
"flavor", "flavour", C:\essay\recipe.txt
"regex:(copper|gold)", metal, e:\essay\mining.html, e:\sales\buy.html

Note that you can specify on a line by line basis whether you want a plain or regex replace by preceding the search string with regex:.

Regexes are not for the faint of heart. Even experienced programmers will likely trip over the combination of CSV-style awkward character quoting (which is not the same as Java String conventions) and regex quoting with \. Unlike Java regexes, you do not double \ and you do double ". Matching is case-sensitive, unless you use a regex and embed (?i) to flip into case-insensitive mode.

CSVReplaceURLs

CSVReplaceURLs patches/updates ordinary text files, e.g. *.html, *.txt, *.java usually *.html file using a CSV file of from:to pairs, followed by a list of filenames that need patching. It is very similar to CSVReplaceStrings, except CSVReplaceURLs is specialised for replacing URLs . BrokenLinks provides such a file of permanent and temporary redirects it has discovered by probing all the links on your website. In contrast, CSVReplaceStrings patches text files; with ordinary or regex search/replace strings. in contrast CSVPatch patches CSV files, The CSVReplaceStrings specialisations for replacing URLs include:

Here is how you use it:

java.exe com.mindprod.csv.CSVReplaceurls commands.csv

alternatively:

csvreplaceurls.jar commands.csv

The only thing you specify on the command line is the name of the CSV file containing your search/replace commands. It contains commands like this:

Note that you can specify on a line by line basis whether you want a plain or regex replace by preceding the search string with regex:.

Regexes are not for the faint of heart. Even experienced programmers will likely trip over the combination of CSV-style awkward character quoting (which is not the same as Java String conventions) and regex quoting with \. Unlike Java regexes, you do not double \ and you do double ". Matching is case-sensitive, unless you use a regex and embed (?i) to flip into case-insensitive mode.

CSVReshape

CSVReshape adds, removes and rearranges the order of fields in a CSV file.

Here is how you use it:

java.exe com.mindprod.csv.CSVReshape somefile.csv 0 3 2

alternatively:

csvreshape.jar somefile.csv 0 3 2

You specify the columns you want retained in the order you want them to appear. The column numbers are 0-based (the first column in column 0). Any columns you do not mention will be dropped. You can duplicate columns. You can create new empty columns simply by using a column number larger than any column you used.

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

CSVScale

CSVScale scale columns in a CSV file with a multiplier.

Use it like this:

java.exe com.mindprod.csv.CSVScale somefile.csv 100 2

alternatively:

csvscale.jar somefile.csv 100 2

On the command line, you give the file you wish to scale, the scale factor, usually 100, 10, 0.1 or 0.01. Then you give a list of the columns you want scaled, 0-based. It leaves the rest of the columns as is. The columns you want scaled must have decimal numbers in them.

CSVScale is clever about adding or removing decimal places just as a human would.

CSVSort

CSVSort sorts a CSV file alphabetically, case/case insensitively or numerically on multiple columns. It works by reading all the data into memory, sorting it, then writing it out. The size of the file it can sort it thus limited by your virtual RAM capacity.

Use it like this:

java.exe com.mindprod.csv.CSVSort somefile.csv 0s+ 2i- 5n+ 3x+

alternatively:

csvsort.jar somefile.csv 0s+ 2i- 5n+ 3x+

You give a list of the columns you want sort, 0-based.

s case-sensitive (lower case are sorted after upper case)
i case-insensitive (lower and upper case are intermixed as equivalent)
l sort by the length of the field rather than the contents. Longer fields sort later.
$ dollar, usually with a decimal point, with a lead $.
d double or float, usually with a decimal point, possibly with an exponent.
n numeric without a decimal point (long or int). Empty fields are treated as 0.
x hex (e.g. 08f6b or \ue7ff or 0x123456789abcdef up to 64-bits) without a decimal point.
f family (e.g. Charlie Brown) sorts by family/surname then given name.
+ ascending
- descending
99 0-based column number. First column in column 0.

Empty columns are treated as 0 or "". The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first. CSVSort preserves the position of # comments at the top and bottom of the file. # comments just prior to a data record are moved to stay just ahead of the data record in its new sorted position. Files need not be aligned prior to sorting, however, you will usually need to align them after sorting.

The sort is done purely in RAM, son is not suitable for giant files. The sort produces packed, not aligned files. If you want aligned, you will need to use CSVAlign on the sorted result.

CSVTabToComma

CSVTabToComma Converts a tab-separated file to a comma-separated file.

Here is how you use it:

java.exe com.mindprod.csv.CSVTabToComma somefile.csv

alternatively:

csvtabtocomma.jar somefile.csv

The output overwrites the input file, so if you have any doubts it will work, make a backup copy first.

CSVSortField

CSVSortField sorts a letters with in single field of a CSV file in ascending order using a case-sensitive sort. It does not reorder the records. It removes spaces, but does not deDup. It probably not of wide interest. You could think of it as like putting the Scrabble tiles for word in alphabetical order. Here is what it does:

Before

After

CSVToSRS

Converts a CSV file to a Funduc Search and Replace script. It converts a CSV file of string pairs, source/target to produce a corresponding *.srs file that you can tweak to create Funduc Search Replace Script to create a script to do bulk search/replaces over many files. It is of no use if you don’t own Funduc Search/Replace, other than as an example for your own utilities to export CSV data to other formats.

java.exe com.mindprod.csv.CSVToSRS somepairs.csv

alternatively:

csvtosrs.jar somepairs.csv

In this example, the output will be somepairs.srs in the same directory as somepairs.csv. You don’t explicitly specify the name of the output file. The input file is left unchanged.

CSVToSRX

Converts a CSV file to a Funduc Search and Replace script for the new BE edition. It converts a CSV file of string pairs, source/target to produce a corresponding *.srx file that you can tweak to create Funduc Search Replace Script to create a script to do bulk search/replaces over many files. It is of no use if you don’t own Funduc Search/Replace, other than as an example for your own utilities to export CSV data to other formats.

java.exe com.mindprod.csv.CSVToSRS somepairs.csv

alternatively:

csvtosrs.jar somepairs.csv

In this example, the output will be somepairs.srs in the same directory as somepairs.csv. You don’t explicitly specify the name of the output file. The input file is left unchanged.

CSVStripTags

CSVStripTags strips html markup tags, e.g. <br> . You select which 0-based columns you want to be converted. The first column is column 0.

Here is how you use it:

rem Strip HTML tags from columns 0 4 and 5
java.exe com.mindprod.csv.CSVStripTags somefile.csv 0 4 5

alternatively:

rem String HTML tages from columns 0 4 and 5
csvstriptags.jar somefile.csv 0 4 5

The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.

CSVTemplate

Expands boilerplate with variable CSV data. The output appears in a file with the same name as the input file and the same extension as the template file. Be careful. If you called files script.csv and script.txt the output would go into script.txt, overwriting your template. The template file is just a text file with strings of the form %0 %1… embedded in it. Each line expands the template replacing the %x string with the value of the corresponding column. The template may span several lines. The template is expanded once for each line in the CSV file, with all the expansions glued together in the output file.

or alternatively 

Here is a sample template that displays three ISBN (International Standard Book Number) numbers as links to amazon.com.

CSVToTable

Converts a CSV file to an HTML table. If you want fields stripped, entified or deentified, to it beforehand. It takes a *.csv file and produces a correspondingly named *.htmlfrag file.

or alternatively 
csvtotable.jar somefile.csv  alignedmiddle date  number ""
You can optionally provide css classes for the tr and each td column.

CSVToUnique

You could look at this as simple diff utility for CSV files.

rem compare 2 or more files
java.exe com.mindprod.CSVToUnique first.csv second.csv
rem generates three files  unique_first.csv unique_second.csv common_all.csv

You feed it a list of files, and it generates files containing the records that are unique to each file and also common to all the files. The original files are unchanged. Even if inputs have duplicates, the outputs will not. The records will be in the same order as the original. If used with only one file, will remove duplicates without sorting. The records in the common_all.csv file appear is all of the input files not just any two of them. If you renamed formatted *.java files as *.csv files mostly with lines with with a single field, (plus fiddling to make then conforming csv files) it will compare two programs, showing you which lines they have in common and which are unique even if the lines are in a completely different order. Ditto for a word processing essay or text files, where you want to merge two similar documents.

CSVTuple

Converts a CSV file into fixed length tuples. If your file looked like this, and you asked for tuples of length 4 you would see this transformation:

a,b,c,d,e,f
x,y

becomes:
a,b,c,d
a,b,c,e
a,b,c,f
x,y,,

java.exe com.mindprod.csv.CSVTuple somefile.csv 4
or alternatively 
csvtuple.jar somefile.csv 4

You can specify how many fields you want included in each tuple. Short lines are padded with empty fields.

LinesToCSV

converts a text file of lines into a CSV file blocking multiple lines into one CSV record.

java.exe com.mindprod.csv.LinesToCSV somefile.txt 3

alternatively:

linestocsv.jar somefile.txt 3

You specify the number of lines that should be grouped together as fields on a single CSV file line. The text file should contain an even multiple of this number of lines.

Note that the input file is not a CSV file, but an ordinary *.txt text file. The output is a CSV file with a matching name, e.g. somefile.csv in this example.

Let us say the input file somefile.txt looked like this: The first line is the name of a species, the second is the number of legs, and the third, the colour. This pattern of three lines repeats for each animal.

lion
4
brown
ostrich
2
black and white
zebra
4
striped black and white

The output somefile.csv would look like this:

lion,4,brown
ostrich,2,black and white
zebra,4,striped black and white

TableToCSV

Converts a HTML file containing tables to a UTF-8 CSV with entities converted to their equivalent Unicode and tags stripped. It takes a *.html file and produces a correspondingly named *.csv file.

java.exe com.mindprod.csv.TableToCSV somefile.html
or alternatively 
tabletocsv.jar somefile.html

Futures

Implement CSVPress.

##n comments to label each column, that get renumbered on reshaping.

Command line options to let you override the comment character (#), quote character (") and separator (,).

Eventually I would like to figure out a way to control all the utilities using field names rather than field numbers. It takes two passes though the input, but does not require holding all the data in RAM, thus it can handle arbitrarily large files.

Acquiring CSV

PackageVersionReleasedLicenceLanguageNotes 
csv
CSVReader/Writer
7.4 2014-09-14 free Java
more infoprecismanualscreenshotbrowse source repository
for the current version of CSVReader/Writer.
Classes to read, write, align, condense, sort, convert, export, pack… CSV (comma-separated value) files.
download 2.9MB zip for CSVReader/Writer Java source, compiled class files, jar and documentation to run on your own machine as an application.

Runs on any OS that supports Java e.g. W2K, XP, W2003, Vista, W2008, W7-32, W7-64, W8-32, W8-64, W2012, Linux, LinuxARM, LinuxX86, LinuxX64, Ubuntu, Solaris, SolarisSPARC, SolarisSPARC64, SolarisX86, SolarisX64 and OSX.

First install the most recent Java.

To install, extract the zip download with WinZip, (or similar unzip utility) into any directory you please, often J:\ — ticking off the use folder names option.

To check out the corresponding source from the Subversion repository, use the TortoiseSVN repo-browser to
access csv source in repository with [Tortoise] Subversion client on wush.net/svn/mindprod/com/mindprod/csv/.

After you have installed the jar, you can run it as an application. Type:

java -jar J:\com\mindprod\csv\csv.jar parms

adjusting as necessary to account for where the jar file is.

download ASP PAD XML program description for the current version of CSVReader/Writer.

$1989.00 US donated so far. If the CMP utilities solved your problem, please donate a buck or two, or donate to one of the charities featured in the footer public service ads throughout the website and get a tax receipt.

CSVReader/Writer is free. Full source included. You may even include the source code, modified or unmodified in free/commercial open source/proprietary programs that you write and distribute. Non-military use only.
 
 
case-sensitive
download CSV
HTML Broken link fixer student project
monospaced Fonts
text editor

This page is posted
on the web at:

http://mindprod.com/application/csv.manual.html

Optional Replicator mirror
of mindprod.com
on local hard disk J:

J:\mindprod\application\csv.manual.html
logo
Please the feedback from other visitors, or your own feedback about the site.
Contact Roedy. Please feel free to link to this page without explicit permission.
no blog for this page
IP:[65.110.21.43]
Your face IP:[54.205.207.53]
You are visitor number