Thursday, June 27, 2013

Microsoft Excel Input

This step imports data from a Microsoft Excel (2003 or 2007) or OpenOffice.org Calc spreadsheet file.
 
Note: The Files, Sheets, and Fields tabs are required for proper step configuration.

Files Tab

The Files tab defines basic file properties for this step's output.

OptionDescription
Step nameThe name of this step in the transformation workspace.
File or directoryThe name of the spreadsheet file or directory of files that you are reading from.
Regular ExpressionIncludes all files (in a given location) that meet the criteria specified by this regular expression.
Exclude Regular ExpressionExcludes all files (in a given location) that meet the criteria specified by this regular expression.
Selected filesA list of files that will be used in this step, according to the criteria specified in the previous fields.
Accept filenames from previous stepIf checked, will retrieve a list of filenames from the previous step in this transformation. You must also specify which step you are importing from, and the input field in that step from which you will retrieve the filename data. If you choose this option, the Show filename(s) option will show a preview of the list of filenames.
OptionDescription
Step nameThe name of this step in the transformation workspace.
File or directoryThe name of the spreadsheet file or directory of files that you are reading from.
Regular ExpressionIncludes all files (in a given location) that meet the criteria specified by this regular expression.
Exclude Regular ExpressionExcludes all files (in a given location) that meet the criteria specified by this regular expression.
Selected filesA list of files that will be used in this step, according to the criteria specified in the previous fields.
Accept filenames from previous stepIf checked, will retrieve a list of filenames from the previous step in this transformation. You must also specify which step you are importing from, and the input field in that step from which you will retrieve the filename data. If you choose this option, the Show filename(s) option will show a preview of the list of filenames.
Option
Description
Step name
The name of this step in the transformation workspace.
File or directory
The name of the spreadsheet file or directory of files that you are reading from.
Regular Expression
Includes all files (in a given location) that meet the criteria specified by this regular expression.
Exclude Regular Expression
Excludes all files (in a given location) that meet the criteria specified by this regular expression.
Selected files
A list of files that will be used in this step, according to the criteria specified in the previous fields.
Accept filenames from previous step
If checked, will retrieve a list of filenames from the previous step in this transformation. You must also specify which step you are importing from, and the input field in that step from which you will retrieve the filename data. If you choose this option, the Show filename(s) option will show a preview of the list of filenames.

OptionDescription
Step nameThe name of this step in the transformation workspace.
File or directoryThe name of the spreadsheet file or directory of files that you are reading from.
Regular ExpressionIncludes all files (in a given location) that meet the criteria specified by this regular expression.
Exclude Regular ExpressionExcludes all files (in a given location) that meet the criteria specified by this regular expression.
Selected filesA list of files that will be used in this step, according to the criteria specified in the previous fields.
Accept filenames from previous stepIf checked, will retrieve a list of filenames from the previous step in this transformation. You must also specify which step you are importing from, and the input field in that step from which you will retrieve the filename data. If you choose this option, the Show filename(s) option will show a preview of the list of filenames.
Sheets Tab
The Sheets tab specifies which worksheets you want to use in the specified files. A spreadsheet document can contain several worksheets.
Option
Description
List of sheets to read
A list of worksheets that you want to use. If this remains empty, all worksheets in all specified files will be selected. Rows and columns are numbered, starting with 0.
Get sheetname(s)
This button will retrieve a list of worksheets from all of the specified files and give you the option to select some or all of them for this step.
Content Tab
The content tab contains options for describing the file's content.
Option
Description
Header
Enable this option if there is a header row to skip in the selected worksheets.
No empty rows
If checked, removes empty rows from the output.
Stop on empty row
If checked, stops reading from the current worksheet when an empty row is read.
Limit
Sets a static number of rows to read. If set to 0, there is no set limit.
Encoding
Specifies the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon searches your system for available encodings.
Spreadsheet type (engine)
Specifies which spreadsheet format to expect from the file, regardless of its extension.
Add filenames to result
If checked, passes the input filenames to the output.

Error Handling tab
This tab sets options for recording and reporting various error conditions.
Option
Description
Strict types?
If checked, PDI will report data type errors in the input.
Ignore errors?
If checked, no errors will be reported during input parsing.
Skip error lines?
If checked, PDI will skip lines that contain errors. These lines can be dumped to a separate file by specifying a path in the Failing line numbers files directory field below. If this is not checked, lines with errors will appear as NULL values in the output.
Warning files directory
Directory in which to create files that contain warning messages regarding input values for each spreadsheet file read. These files will have the extension you specify here.
Error files directory
Directory in which to create files that contain error messages regarding input values for each spreadsheet file read. These files will have the extension you specify here.
Failing line numbers files directory
Directory in which to create files that contain the lines that failed error checks during input validation. These files will have the extension you specify here.
Fields tab
The Fields tab defines properties for the exported fields.
Option
Description
Name
The name of the field.
Type
The field's data type; String, Date or Number.
Length
The length option depends on the field type. Number: total number of significant figures in a number; String: total length of a string; Date: determines how much of the date string is printed or recorded.
Precision
The precision option depends on the field type, but only Number is supported; it returns the number of floating point digits.
Trim type
Truncates the field (left, right, both) before processing. Useful for fields that have no static length.
Repeat
If set to Y, will repeat this value if the next field is empty.
Format
The format mask (number type).
Currency
Symbol used to represent currencies.
Decimal
A decimal point; this is either a dot or a comma.
Grouping
A method of separating units of thousands in numbers of four digits or larger. This is either a dot or a comma.
Additional output fields tab
This tab retrieves custom metadata fields to add to the step's output. The purpose of each field is defined in its name, but you can use these fields for whatever you want. Each item defines an output field that will contain the following information. Some of these are missing.
Option
Description
Full filename field
The full file name plus the extension.
Sheetname field
The worksheet name you're using.
Sheet row nr field
The current sheet row number.
Row nr written field
Number of rows written
Short filename field

Extension field
The three- or four-letter file type extension.
Path field

Size field

Is hidden field

Last modification field

URI field

Root URI field

No comments:

Post a Comment