% Generated by roxygen2 (4.0.1): do not edit by hand
\name{fixCSV}
\alias{fixCSV}
\title{Tidy a comma separated value (CSV) file}
\usage{
fixCSV(file, skip = 0, overwrite = FALSE)
}
\arguments{
\item{file}{character: the name of the CSV file to be
\sQuote{fixed}.}

\item{skip}{integer: the number of lines in the CSV file to skip
before the header row of the table.  The skipped lines are copied
directly to the output file unchanged.  The default is
\code{skip=0}, implying that the header row is the first row of
the CSV file.}

\item{overwrite}{logical: should the output be written to a
separate file with \sQuote{FIXED} added to its name
(\code{overwrite=FALSE}, the default), or should the original
file be overwritten (\code{overwrite=TRUE})?  If
\code{overwrite=TRUE}, the original file is copied to a
\code{.BAK} file before being overwritten.}
}
\description{
Tidies up a Comma Separated Value (CSV) file, ensuring that each
row of the table in the file contains the same number of commas
and that no empty rows remain below the table.
}
\details{
\code{fixCSV} tidies up a Comma Separated Value (CSV) file
to ensure that the CSV file contains a strictly rectangular block
of data for input into R (ignoring any preliminary comment rows
via the \code{skip=} argument).

CSV is a plain text file format for tabular data, in which the
cell entries in each row of a table are separated by
commas.  When such files are exported from other applications such
as spreadsheet software, the software has to decide whether any
empty cells to the right-hand side of, or below, the table or
spreadsheet should be represented by trailing commas in the CSV
file.  Such decisions can result in a \sQuote{ragged} table in the
CSV file, in which some rows contain fewer commas (\sQuote{short
rows}) or more commas (\sQuote{long rows}) than others, or where
empty rows below the table are included as comma-only rows in the
CSV file.

While R's \code{\link{read.table}} and related functions can pad
short rows with empty fields (via the \code{fill=} argument, which
is \code{TRUE} by default for \code{\link{read.csv}}), ragged tables
in a CSV file
can still result in errors, unwanted empty rows (below the table)
or unwanted columns (to the right of the table) when the data is
loaded into R.

\code{fixCSV} reads in a specified CSV file and removes or adds
commas to rows, to ensure that each row in the body of the table
contains the same number of cells as the header row of the table.
Any empty rows below the table are also removed.  The resulting
table is then written back to file, either to a new file with
\sQuote{FIXED} added to the filename (argument
\code{overwrite=FALSE}, the default) or overwriting the original
file (\code{overwrite=TRUE}; the original file is first copied to
a \code{.BAK} file).
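
The core row-fixing step can be sketched in plain R as follows.
This is a minimal illustration under stated assumptions, not the
actual \code{fixCSV} source: the helper \code{fixRows} is
hypothetical, and it assumes that no cell contains a quoted,
embedded comma, so that splitting each line on commas yields its
cells.
\preformatted{
## Hypothetical sketch, not the fixCSV source.  Assumes no cell
## contains a quoted, embedded comma.
fixRows <- function(lines, nCol) {
  ## Pad or truncate one line to exactly nCol comma-separated cells.
  ## The ",x" sentinel stops strsplit() from discarding trailing
  ## empty cells; it is removed again straight after the split.
  fixLine <- function(line) {
    cells <- strsplit(paste0(line, ",x"), ",", fixed = TRUE)[[1]]
    cells <- cells[-length(cells)]
    cells <- c(cells, rep("", nCol))[seq_len(nCol)]
    paste(cells, collapse = ",")
  }
  fixed <- vapply(lines, fixLine, character(1), USE.NAMES = FALSE)
  ## Drop comma-only rows below the last non-empty row
  nonEmpty <- which(gsub(",", "", fixed, fixed = TRUE) != "")
  if (length(nonEmpty) == 0) return(character(0))
  fixed[seq_len(max(nonEmpty))]
}

ragged <- c("a,b,c", "1,2", "3,4,5,,", ",,,", "")
fixRows(ragged, nCol = 3)
## returns c("a,b,c", "1,2,", "3,4,5")
}
In this sketch only the comma-only rows below the last non-empty
row are dropped; an empty row inside the table is kept and padded
to the full width.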

Note that:
\itemize{

\item The table of data in the CSV file \emph{must} contain a
header row of the correct length, since this row is used to
determine the correct number of columns for the table.  In
particular, if this header row is too short, subsequent rows will
be truncated to match its length, so beware.
Misspecification of the \code{skip=} argument (see below) can
similarly lead to such corruption of the \sQuote{fixed} file.

\item In the header row, any trailing commas representing empty
cells to the right of the (non-empty) header entries are first
removed before determining the correct number of columns for the
table.  Thus the length of the header row (and hence the assumed
width of the entire table) is determined by the \emph{right-most
non-empty cell} in the header row.

\item \code{fixCSV} does not remove empty cells, rows or columns
within the interior (or on the left side) of the table; it is
concerned only with the right and bottom boundaries of the table.

\item A \code{skip=} argument is included to tell \code{fixCSV} to
ignore the specified number of comment rows preceding the header
row.  Such rows are simply copied over into the output file
unchanged.  The default for this parameter is \code{skip=0}, so
that the first row in the data file is assumed to be the header
row. As noted above, misspecification of this argument can
seriously corrupt the output.

\item \code{fixCSV} can overwrite your data file(s) (via
\code{overwrite=TRUE}).  Although it makes a backup of your
original file, you should still make sure that you have a separate
backup of your data file in a safe place before using this
function!  The author of this code takes no responsibility for any
data loss or corruption resulting from the use of this routine.

}
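
The header-width rule and the role of \code{skip=} described in
these notes can be illustrated with a similar sketch.  Again, this
is not the actual \code{fixCSV} source: \code{headerWidth} is a
hypothetical helper, and the same assumption about quoted, embedded
commas applies.
\preformatted{
## Hypothetical sketch, not the fixCSV source.  Assumes no cell
## contains a quoted, embedded comma.
headerWidth <- function(lines, skip = 0) {
  header <- lines[skip + 1]      # header row follows the skipped lines
  ## The ",x" sentinel keeps trailing empty cells through strsplit()
  cells <- strsplit(paste0(header, ",x"), ",", fixed = TRUE)[[1]]
  cells <- cells[-length(cells)]
  nonEmpty <- which(cells != "")
  if (length(nonEmpty) == 0) stop("the header row is empty")
  max(nonEmpty)                  # position of right-most non-empty cell
}

csvLines <- c("# exported from a spreadsheet",
              "id,locus,allele,,",
              "1,A,12,,")
headerWidth(csvLines, skip = 1)  # 3: trailing header commas ignored
headerWidth(csvLines, skip = 0)  # 1: comment line misread as header
}
With \code{skip=0} the comment line is taken as the header, so the
assumed table width collapses to a single column; this is the kind
of misspecification of \code{skip=} that can corrupt the output.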
}
\examples{
\dontrun{

## Assuming CSV file 'alleleDataFile.csv' exists in the current
## directory.  The following overwrites the CSV file, so make sure
## you have a backup!

fixCSV("alleleDataFile.csv", overwrite=TRUE)

}
}
\author{
Alexander Zwart (alec.zwart at csiro.au)
}

