lazy-csvSource codeContentsIndex
Text.CSV.Lazy.ByteString
Contents
CSV types
CSV parsing
Pretty-printing
Conversion between standard and simple representations
Selection, validation, and algebra of CSV tables
Description

The CSV (comma-separated value) format is defined by RFC 4180, "Common Format and MIME Type for Comma-Separated Values (CSV) Files", http://www.rfc-editor.org/rfc/rfc4180.txt

This lazy parser can report all CSV formatting errors, whilst also returning all the valid data, so the user can choose whether to continue, to show warnings, or to halt on error.

Valid fields retain information about their original location in the input, so a secondary parser from textual fields to typed values can give intelligent error messages.

In a valid CSV file, all rows must have the same number of columns. This parser will flag a row with the wrong number of columns as a error. (But the error type contains the actual data, so the user can recover it if desired.) Completely blank lines are also treated as errors, and again the user is free either to filter these out or convert them to a row of actual null fields.

Synopsis
type CSVTable = [CSVRow]
type CSVRow = [CSVField]
data CSVField
= CSVField {
csvRowNum :: !Int
csvColNum :: !Int
csvTextStart :: !(Int, Int)
csvTextEnd :: !(Int, Int)
csvFieldContent :: !ByteString
csvFieldQuoted :: !Bool
}
| CSVFieldError {
csvRowNum :: !Int
csvColNum :: !Int
csvTextStart :: !(Int, Int)
csvTextEnd :: !(Int, Int)
csvFieldError :: !String
}
data CSVError
= IncorrectRow {
csvRow :: Int
csvColsExpected :: Int
csvColsActual :: Int
csvFields :: [CSVField]
}
| BlankLine {
csvRow :: !Int
csvColsExpected :: !Int
csvColsActual :: !Int
csvField :: CSVField
}
| FieldError {
csvField :: CSVField
}
| NoData
type CSVResult = [Either [CSVError] [CSVField]]
csvErrors :: CSVResult -> [CSVError]
csvTable :: CSVResult -> CSVTable
parseCSV :: ByteString -> CSVResult
parseDSV :: Bool -> Char -> ByteString -> CSVResult
ppCSVError :: CSVError -> String
ppCSVField :: CSVField -> String
ppCSVTable :: CSVTable -> ByteString
ppDSVTable :: Char -> CSVTable -> ByteString
fromCSVTable :: CSVTable -> [[ByteString]]
toCSVTable :: [[ByteString]] -> ([CSVError], CSVTable)
selectFields :: [String] -> CSVTable -> Either [String] CSVTable
expectFields :: [String] -> CSVTable -> Either [String] CSVTable
mkEmptyColumn :: String -> CSVTable
joinCSV :: CSVTable -> CSVTable -> CSVTable
CSV types
type CSVTable = [CSVRow]Source
A CSV table is a sequence of rows. All rows have the same number of fields.
type CSVRow = [CSVField]Source
A CSV row is just a sequence of fields.
data CSVField Source
A CSV field's content is stored with its logical row and column number, as well as its textual extent. This information is necessary if you want to generate good error messages in a secondary parsing stage, should you choose to convert the textual fields to typed data values.
Constructors
CSVField
csvRowNum :: !Int
csvColNum :: !Int
csvTextStart :: !(Int, Int)
csvTextEnd :: !(Int, Int)
csvFieldContent :: !ByteString
csvFieldQuoted :: !Bool
CSVFieldError
csvRowNum :: !Int
csvColNum :: !Int
csvTextStart :: !(Int, Int)
csvTextEnd :: !(Int, Int)
csvFieldError :: !String
show/hide Instances
CSV parsing
data CSVError Source
A structured error type for CSV formatting mistakes.
Constructors
IncorrectRow
csvRow :: Int
csvColsExpected :: Int
csvColsActual :: Int
csvFields :: [CSVField]
BlankLine
csvRow :: !Int
csvColsExpected :: !Int
csvColsActual :: !Int
csvField :: CSVField
FieldError
csvField :: CSVField
NoData
show/hide Instances
type CSVResult = [Either [CSVError] [CSVField]]Source
The result of parsing a CSV input is a mixed collection of errors and valid rows. This way of representing things is crucial to the ability to parse lazily whilst still catching format errors.
csvErrors :: CSVResult -> [CSVError]Source
Extract just the errors from a CSV parse.
csvTable :: CSVResult -> CSVTableSource
Extract just the valid portions of a CSV parse.
parseCSV :: ByteString -> CSVResultSource
A first-stage parser for CSV (comma-separated values) data. The individual fields remain as text, but errors in CSV formatting are reported. Errors (containing unrecognisable rows/fields) are interspersed with the valid rows/fields.
parseDSV :: Bool -> Char -> ByteString -> CSVResultSource
Sometimes CSV is not comma-separated, but delimiter-separated values (DSV). The choice of delimiter is arbitrary, but semi-colon is common in locales where comma is used as a decimal point, and tab is also common. The Boolean argument is whether newlines should be accepted within quoted fields. The CSV RFC says newlines can occur in quotes, but other DSV formats might say otherwise. You can often get better error messages if newlines are disallowed.
Pretty-printing
ppCSVError :: CSVError -> StringSource
ppCSVField :: CSVField -> StringSource
Pretty-printing for CSV fields, shows positional information in addition to the textual content.
ppCSVTable :: CSVTable -> ByteStringSource
Output a table back to a lazily-constructed string. There are lots of possible design decisions one could take, e.g. to re-arrange columns back into something resembling their original order, but here we just take the given table without looking at Row and Field numbers etc.
ppDSVTable :: Char -> CSVTable -> ByteStringSource
Output a table back to a lazily-constructed bytestring, using the given delimiter char.
Conversion between standard and simple representations
fromCSVTable :: CSVTable -> [[ByteString]]Source
Convert a CSV table to a simpler representation, by dropping all the original location information.
toCSVTable :: [[ByteString]] -> ([CSVError], CSVTable)Source
Convert a simple list of lists into a CSVTable by the addition of logical locations. (Textual locations are not so useful.) Rows of varying lengths generate errors. Fields that need quotation marks are automatically marked as such.
Selection, validation, and algebra of CSV tables
selectFields :: [String] -> CSVTable -> Either [String] CSVTableSource
Select and/or re-arrange columns from a CSV table, based on names in the header row of the table. The original header row is re-arranged too. The result is either a list of column names that were not present, or the (possibly re-arranged) sub-table.
expectFields :: [String] -> CSVTable -> Either [String] CSVTableSource
Validate that the columns of a table have exactly the names and ordering given in the argument.
mkEmptyColumn :: String -> CSVTableSource
A generator for a new CSV column, of arbitrary length. The result can be joined to an existing table if desired.
joinCSV :: CSVTable -> CSVTable -> CSVTableSource
A join operator, adds the columns of two tables together. Precondition: the tables have the same number of rows.
Produced by Haddock version 2.4.2