When to use this Flow type
This Flow loads CSV or JSON files directly into Snowflake. The files can be anywhere, so long as the Etlworks Integrator can read them: server storage, FTP, FTPS, SFTP, cloud storage, etc.
This Flow performs a bulk load only: the files must already exist before the Flow runs.
Step 1. Create a source Connection, which can be either a file storage or a cloud storage Connection.
Step 2. Create and test the Snowflake Connection as explained here.
Step 3. Create a Connection to the Amazon S3 bucket that will be used as a staging area, as explained here. When configuring the S3 Connection, select GZip as the value for the Archive file before copying to S3 field.
Step 4. Start creating the Snowflake Flow as explained here.
Step 5. Select Load files into Snowflake.
Step 6. Continue by defining the source and destination.
Step 7. For the source (FROM), select the Connection created in Step 1 and enter a file name or a wildcard file name, for example, *.csv.
Step 8. For the destination (TO), select the Connection created in Step 3.
Step 9. For the final destination, select the Snowflake Connection created in Step 2, as explained here.
Step 10. Continue by defining the parameters. The only required parameter is COPY INTO SQL.
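Here is a minimal sketch of what the COPY INTO SQL parameter might contain, assuming a hypothetical employee table and a hypothetical stage named etl_stage that points at the S3 staging bucket; the table, stage, and file format are placeholders to adjust for your own schema:

    -- hypothetical table and stage names; the S3 Connection gzips the files,
    -- hence COMPRESSION = 'GZIP'
    COPY INTO employee
    FROM @etl_stage/employee.csv.gz
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 COMPRESSION = 'GZIP')
    ON_ERROR = 'ABORT_STATEMENT';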
Step 11. Define the remaining parameters (all optional):
Action: if Move is selected (the default), the file(s) will be copied to the S3 bucket and removed from the original source. If Copy is selected, the file(s) will be copied to the S3 bucket and retained in the original source.
Add Suffix to the Destination File Name: you can select one of the predefined suffixes for the files created by this file operation. For example, if you select uuid as the suffix and the original file name is dest.csv, the Etlworks Integrator will create files named dest_uuid.csv, where uuid is a globally unique identifier such as 21EC2020-3AEA-4069-A2DD-08002B30309D.
Do not process files that have been already processed: if this option is enabled, the system will skip files that have already been loaded into Snowflake. Snowflake itself also keeps track of which files it has loaded, but with this option enabled, already-processed files are not even copied to S3, which can significantly reduce traffic costs (see the illustration after this list).
Maximum Simultaneous Operations: the Etlworks Integrator can copy or move each file in its own thread. Use this property to set the maximum number of simultaneous file operations.
Purge File(s) if Error: if this option is enabled (it is by default), the staging files will be automatically deleted if the COPY INTO command fails.
Name(s) of the File(s) to Purge: if Purge File(s) if Error is enabled, this option configures which files the system should delete if an error occurs while executing the COPY INTO command. Wildcard file names are allowed. If not specified, the system will use the same wildcard file name that was used for the load.
Before COPY INTO SQL: this SQL will be executed on the Snowflake Connection before the COPY INTO command runs (see the sketch after this list).
Ignore errors when executing Before COPY INTO SQL: if this option is enabled and there is an error when the Before COPY INTO SQL is executed, the error will be ignored.
After COPY INTO SQL: this SQL will be executed on the Snowflake Connection after the COPY INTO command runs.
Ignore errors when executing After COPY INTO SQL: if this option is enabled and there is an error when the After COPY INTO SQL is executed, the error will be ignored.
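To illustrate how Before COPY INTO SQL and After COPY INTO SQL can be used together, here is a sketch that clears a hypothetical staging table before the load and merges the loaded rows into a hypothetical target table afterward; all table and column names are placeholders:

    -- Before COPY INTO SQL: start each load from an empty staging table
    TRUNCATE TABLE employee_staging;

    -- After COPY INTO SQL: upsert the freshly loaded rows into the target table
    MERGE INTO employee t
    USING employee_staging s ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.name = s.name
    WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name);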
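For context on the Do not process files that have been already processed option: Snowflake's own duplicate protection comes from the load metadata consulted by COPY INTO, which skips files it has already loaded unless FORCE is set. A one-line illustration with placeholder names:

    -- FORCE = FALSE is the default: Snowflake checks its load metadata
    -- and silently skips files already loaded from this stage
    COPY INTO employee FROM @etl_stage FORCE = FALSE;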
Step 12. Save and run the Flow.