When to use this format
CSV (or comma-separated values) is one of the most commonly used data exchange formats. In Integrator, you can actually define what character is used as a separator between values and lines, as well as other parameters.
Use CSV format when configuring a source-to-destination transformation that reads or writes CSV files.
To create a new CSV format, go to Connections, select Formats tab, click Add Format button and type in csv in the search field.
Below are the available parameters:
- Delimiter - a character used as a delimiter between values. The default is comma:
- Enclosure Character- a character used to enclose fields in. By default, only the fields that contain Delimiter will be enclosed. Enable Always enclose to enclose all fields.
- Always enclose - if this option enabled, the system will always enclose fields in quotes (assuming that the Enclosure Character is configured). The default behavior is to enclose fields only if they contain the delimiter character.
- Escape double-quotes - if double-quotes are used to enclose fields, then a double-quote appearing inside a field will be escaped by preceding it with another double quote.
- Line Separator - a character used as a separator between lines.
- Default Extension - the default extension is used when the file name doesn't have an extension. If not entered,
datis the default extension.
- Has multiline records - enable this if you expect to have multi-line records in the source document. It only works if Quote is set.
- Ignore Byte order mark character (BOM) - uncheck this option if you do not want the parser to ignore the byte order mark character (BOM). The default is ignore.
- Template - a template in the CSV format. If this field is not empty, Integrator will use it to populate column names and data types. Template is an optional field. Example:
- Use First Row for Data - if checked, it is assumed that the file doesn't have a "fields" header.
- Column names compatible with SQL - this converts column names to SQL compatible column names by removing all characters except alphanumeric and spaces.
- Skip First Row - if this option is enabled, the system skips the first row in a file. Typically, this is used together with Use First Row for Data.
- Skip not Properly Formatted Rows - if this option is enabled, the system skips rows which do not conform to the CSV format. For example, a row might have a different number of columns than other rows do.
- Skip Empty Rows - sometimes CSV files contain completely empty rows with no values and delimiters. Integrator can be configured to skip these rows. Otherwise, it will generate an exception when reading such a file.
- Skip rows with fewer columns than header - if this option is enabled, the system skips rows that have fewer columns than a header row.
- Skip rows with more columns than header - if this option is enabled, the system skips rows that have more columns than a header row.
- Document has extra data columns - if this option is enabled, Integrator will be able to read CSV documents, even if the number of header columns is less than the number of data columns.
- Enforce number of data columns - if option is enabled, Integrator will enforce the number of data columns by setting it to the same value as the number of fields in the CSV header. Use it if the number of fields in the header is less than the number of data columns and you only want to parse the data columns which have the corresponding field in the header.
- Start row - if this value is not empty, the system will start reading the file from the specified 1-based row and will ignore previous rows.
- End row - if this value is not empty, the system will stop reading the file after the specified 1-based row.
- Transformation type - the default is
header. Read more about using the preprocessor to modify the contents of the source document. If
headeris selected and Filter or Preprocessor or Header is not empty, the header will be added at the beginning of the file, followed by an end-of-line character, followed by the actual data.
headeris selected for Transformation type, the specified header will be added at the beginning of the file, followed by an end-of-line character, followed by the actual data.
- All fields are strings - If this option is enabled (it is disabled by default) the system will create all fields with a string data type.
- Date and Time Format - a format for timestamps (date & time).
- Date Format - a format for date (date only, no time).
- Time Format - a format for time (time only, no date).
- Parse Dates - if this option is enabled, and the date or time value is not recognized as one of the formats defined above, Integrator will try to parse it using one of the well-known date & time formats.
- Trim Strings - if this option is enabled, Integrator will trim leading and trailing white-spaces from the value.
- Treat 'null' as null - if this option is enabled, Integrator will treat string values equal to 'null' as actual nulls (no value).
- Value for null - a string which will be used instead of a null value. A typical usage example is setting "Value for null" to "\N" so the Redshift COPY command can differentiate between an empty string and NULL value.
- Encode CLOB fields using Base64 - if this option is enabled (default), Integrator will encode fields with the CLOB data type using Base64 algorithm.
- - if this option is enabled (it is disabled by default), the system will strip not printable characters, such as null-character (/0), from the data row when creating a file.
- Maximum number of rows in file - the maximum number of rows in the file when creating new CSV files. Use it to split a large CSV document while creating it - it is extremely fast and efficient, compared to splitting the existing document.
- Encoding - character encoding when reading and writing CSV file.
No encodingmeans there will be no additional encoding.