Consider setting the Fetch Size when extracting data from large datasets
Many database drivers (MySQL, Redshift, and others) do not set an upper limit on the number of records that can be retrieved in one database call. When extracting data from larger datasets, this can cause an OutOfMemory error and prevent the system from extracting and loading the data.
The Fetch Size field can be used to set the maximum number of records that will be retrieved in one database call, thus limiting memory consumption within the JVM. You can set the Fetch Size at the connection level or override it at the transformation level.
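For reference, this setting corresponds to Statement.setFetchSize at the JDBC level. A minimal sketch of what the driver is asked to do, with a hypothetical connection URL, table, and fetch size:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchSizeExample {
    public static void main(String[] args) throws Exception {
        // The URL, credentials, table, and fetch size below are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/sales", "user", "password");
             PreparedStatement stmt = conn.prepareStatement("SELECT * FROM orders")) {

            // Retrieve at most 10,000 rows per database call instead of the
            // whole result set, keeping memory consumption in the JVM bounded.
            stmt.setFetchSize(10_000);

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // process each row as it arrives
                }
            }
        }
    }
}
```

Note that some drivers need extra hints before they honor the fetch size; MySQL Connector/J, for example, only streams results when useCursorFetch=true is added to the connection URL or the fetch size is set to Integer.MIN_VALUE.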
Consider setting the Max Field Size when extracting data from wide tables
This parameter sets the maximum number of bytes that can be returned for character and binary column values in a ResultSet object produced by the Statement object. The limit applies only to BINARY, VARBINARY, LONGVARBINARY, CHAR, VARCHAR, NCHAR, NVARCHAR, LONGNVARCHAR, and LONGVARCHAR fields. If the limit is exceeded, the excess data is silently discarded. For maximum portability, use values greater than 256.
You can set the Max Field Size at the connection level, or override it at the transformation level.
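At the JDBC level this setting maps to Statement.setMaxFieldSize. A minimal sketch, assuming an open Connection named conn and a hypothetical documents table with wide text and binary columns:

```java
// Assumes an open java.sql.Connection named conn; table and columns are hypothetical.
try (PreparedStatement stmt = conn.prepareStatement(
        "SELECT id, description, payload FROM documents")) {

    // Cap character and binary columns at 1 MB per value; anything beyond
    // this limit is silently discarded by the driver.
    stmt.setMaxFieldSize(1_048_576);

    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // each row now has a bounded per-column size
        }
    }
}
```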
Use streaming and bind variables if possible, but avoid using native SQL
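As a rough illustration of the difference at the JDBC level, compare a query built by string concatenation (native SQL) with one that uses a bind variable; the orders table and customerId value are hypothetical:

```java
// Assumes an open java.sql.Connection named conn and a long customerId.

// Native SQL built by concatenation: the statement text changes with every
// value, so the database cannot reuse the execution plan.
String nativeSql = "SELECT * FROM orders WHERE customer_id = " + customerId;

// Bind variable: the statement text stays constant, the value is sent
// separately, and the prepared plan can be reused across executions.
try (PreparedStatement stmt = conn.prepareStatement(
        "SELECT * FROM orders WHERE customer_id = ?")) {
    stmt.setLong(1, customerId);
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // process rows
        }
    }
}
```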
Enable Force streaming when streaming is not supported
Read about force streaming.
Enable Auto Commit
If you are loading large datasets into the database, enable Auto Commit for the destination database connection.
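Under the hood this maps to JDBC auto-commit, which commits each statement (or batch) as it executes instead of accumulating the entire load in one long-running transaction. A minimal sketch, with a hypothetical target_table as the destination:

```java
// Assumes an open java.sql.Connection named conn to the destination database;
// the table, columns, and batch size are hypothetical.
conn.setAutoCommit(true);

try (PreparedStatement insert = conn.prepareStatement(
        "INSERT INTO target_table (id, name) VALUES (?, ?)")) {
    for (int i = 0; i < 1_000_000; i++) {
        insert.setInt(1, i);
        insert.setString(2, "row-" + i);
        insert.addBatch();
        if (i % 10_000 == 0) {
            insert.executeBatch(); // committed immediately under auto commit
        }
    }
    insert.executeBatch(); // flush the final partial batch
}
```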
Create indexes
If you are running custom SQL queries or using MERGE/UPDATE/DELETE/IfExist SQL actions, make sure that all relevant database indexes exist.
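For example, if a MERGE or UPDATE action matches rows on a customer_id column (a hypothetical key), an index on that column lets the database locate the target rows without scanning the whole table:

```java
// Assumes an open java.sql.Connection named conn; the index, table, and
// column names are hypothetical.
try (Statement stmt = conn.createStatement()) {
    stmt.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)");
}
```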
Use flows optimized for Snowflake and Redshift to load data into these cloud data warehouses
Snowflake and Redshift are columnar databases (optimized for data retrieval operations), so loading data into them using DML statements (INSERT/UPDATE/DELETE) can be quite slow, especially for larger datasets.
While it is possible to load data into Snowflake and Redshift using regular flows, such as database-to-database, it is highly recommended to use the flows optimized for these databases.
Read how to efficiently load large datasets into Snowflake with Etlworks.
Read how to efficiently load large datasets into Redshift with Etlworks.
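The optimized flows described in the linked articles stage the extracted data as files and then issue a single bulk COPY command, rather than inserting row by row. As a rough illustration of that pattern against Snowflake over JDBC (the local file, stage, and table names are hypothetical; the optimized flows handle these steps for you):

```java
// Assumes an open java.sql.Connection named conn to Snowflake; the local file,
// stage, and table names are hypothetical placeholders.
try (Statement stmt = conn.createStatement()) {
    // 1. Upload the extracted file to an internal stage.
    stmt.execute("PUT file:///tmp/orders.csv @my_stage AUTO_COMPRESS=TRUE");

    // 2. Bulk-load the staged file with a single COPY instead of row-by-row INSERTs.
    stmt.execute("COPY INTO orders FROM @my_stage/orders.csv.gz "
            + "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)");
}
```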