Friday, June 21, 2013

CSV File Input

The CSV File Input step reads data from a delimited text file. The CSV label for this step is a misnomer: you are not constrained to commas and can define whatever separator you want, such as a pipe, a tab, or a semicolon. Internal processing allows this step to process data quickly. The options for this step are a subset of those for the Text File Input step.
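To make the separator and enclosure idea concrete, here is a minimal sketch in Python using the standard `csv` module. It is only an illustration of how a delimited file with a non-comma separator can be parsed; it is not how the step itself is implemented. The function name `read_delimited` and the sample data are made up for this example.

```python
import csv
import io


def read_delimited(text, delimiter=";", enclosure='"', header_row=True):
    """Parse delimited text and return (field_names, rows).

    Hypothetical helper: the delimiter and enclosure parameters mirror
    the idea of the step's options, not its real implementation.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=enclosure)
    rows = list(reader)
    if header_row and rows:
        # The first row holds column names and is not returned as data.
        return rows[0], rows[1:]
    return [], rows


# Semicolon-delimited sample; the quoted field contains the delimiter itself.
sample = 'id;name;note\n1;Alice;"likes a;b"\n2;Bob;plain\n'
names, rows = read_delimited(sample)
print(names)
print(rows)
```

The quoted value `likes a;b` stays in one field because the semicolon inside the enclosure is not treated as a separator. Passing `delimiter="|"` or `delimiter="\t"` handles pipe- or tab-separated files in the same way.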

CSV File Input Options

The table below describes the options available for the CSV File Input step:
Option | Description
Step name | Optionally, you can change the name of this step to fit your needs.
File Name | Specify the name of the CSV file from which to read, or select the field that will contain the name(s) of the file(s) from which to read. The field selection, along with the option to include the file name in the output, is enabled only if the step receives data from a previous step.
Delimiter | Specify the file delimiter or separator used in the target file. This includes pipes, tabs, semicolons, and so on. In the sample image, the delimiter is a semicolon.
Enclosure | Specify the enclosure character used in the target file. Strings may themselves contain the delimiter character (a semicolon or comma, for example), so text that follows an opening enclosure, such as a quotation mark, is not split into fields until the closing enclosure is reached. In the sample image, the enclosure is a quotation mark.
NIO buffer size | The size of the read buffer, that is, the number of bytes read from disk at one time.
Lazy conversion | Lazy conversion delays the conversion of data for as long as possible. In some instances, data conversion is avoided altogether, which can result in significant performance improvements. The typical example is reading from a text file and writing back to a text file.
Header row present? | Enable this option if the target file contains a header row with column names. Header rows are skipped.
Add file name to result | Adds the name(s) of the CSV file(s) read to the result of this transformation. A unique list is kept in memory and can be used in the next job entry of a job, for example in another transformation.
The row number field name (optional) | The name of the Integer field that will contain the row number in the output of this step.
Running in parallel? | Enable this option if you will run multiple instances of this step (step copies) and want each instance to read a separate part of the CSV file(s). When reading multiple files, the total size of all files is used to split the workload; in that case, make sure that all step copies receive all of the files to be read, otherwise the parallel algorithm will not work correctly. Note: For technical reasons, parallel reading of CSV files is supported only for files that do not include fields with line breaks or carriage returns.
File Encoding | Specify the encoding of the file being read.
Fields Table | This table contains an ordered list of the fields to be read from the target file.
Preview | Click to preview the data coming from the target file.
Get Fields | Click to return a list of fields from the target file based on the current settings (for example, Delimiter and Enclosure). All fields identified are added to the Fields Table.
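The "Running in parallel?" option can be pictured as a byte-range split. The Python sketch below rests on an assumption: it shows one common way to divide a file among several readers, and it is not taken from the step's source code. The names `split_ranges` and `read_rows_in_range` are hypothetical. Each copy gets a contiguous byte range and reads only the rows that start inside that range.

```python
def split_ranges(total_size, copies):
    """Divide total_size bytes into `copies` contiguous (start, end) ranges."""
    base = total_size // copies
    ranges = []
    for i in range(copies):
        start = i * base
        # The last copy also takes the bytes left over by integer division.
        end = total_size if i == copies - 1 else (i + 1) * base
        ranges.append((start, end))
    return ranges


def read_rows_in_range(data, start, end):
    """Return the rows whose first byte lies in [start, end)."""
    pos = start
    # If the range does not begin at a row start, skip the partial row;
    # the copy that owns the preceding range reads that row in full.
    if start > 0 and data[start - 1:start] != b"\n":
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1
    rows = []
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        rows.append(data[pos:stop])
        pos = stop + 1
    return rows


data = b"a;1\nbb;2\nccc;3\ndddd;4\n"
parts = [read_rows_in_range(data, s, e) for s, e in split_ranges(len(data), 3)]
print(parts)
```

Every copy needs to know the total size in order to compute its own range, which fits the requirement that all step copies receive all files. The sketch also shows why line breaks inside fields are a problem: a reader that lands in the middle of the file cannot tell a row-ending newline from one inside an enclosed value, so it would resume at the wrong position.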
