Consider setting the Fetch Size when extracting data from large datasets
Many database drivers (MySQL, Redshift, and others) do not set an upper limit on the number of records that can be retrieved in a single database call. When extracting data from larger datasets, this can cause an OutOfMemory error and prevent the system from extracting and loading the data.
The Fetch Size field can be used to set the maximum number of records that will be retrieved in one database call, thus limiting the memory consumption within the JVM. You can set the Fetch Size at the Connection level, or override it at the transformation level.
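In most JDBC-based tools the Fetch Size maps to the standard Statement.setFetchSize call. The sketch below shows the driver-level equivalent only as an illustration; the connection URL, credentials, and table name are placeholders, not Etlworks configuration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- substitute your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/sales", "user", "password");
             Statement stmt = conn.createStatement()) {
            // Ask the driver to retrieve at most 1000 rows per round trip
            // instead of buffering the entire result set in the JVM.
            stmt.setFetchSize(1000);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM large_table")) {
                while (rs.next()) {
                    // process each row as it is streamed from the database
                }
            }
        }
    }
}
```

Note that some drivers interpret the fetch size differently; for example, MySQL Connector/J only honors a positive fetch size when useCursorFetch=true is added to the connection URL.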
Consider setting the Max Field Size when extracting data from wide tables
This parameter sets the maximum number of bytes that can be returned for character and binary column values in a ResultSet object produced by a Statement. The limit applies only to BINARY, VARBINARY, LONGVARBINARY, CHAR, VARCHAR, NCHAR, NVARCHAR, LONGNVARCHAR, and LONGVARCHAR fields. If the limit is exceeded, the excess data is silently discarded. For maximum portability, use values greater than 256.
You can set the Max Field Size at the Connection level, or override it at the transformation level.
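Under the hood this corresponds to the JDBC Statement.setMaxFieldSize setting. A minimal sketch of the driver-level equivalent, with placeholder connection details and table name:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MaxFieldSizeExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- substitute your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/warehouse", "user", "password");
             Statement stmt = conn.createStatement()) {
            // Cap character/binary columns at 64 KB per value; anything beyond
            // the limit is silently truncated by the driver.
            stmt.setMaxFieldSize(65536);
            stmt.setFetchSize(1000);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM wide_table")) {
                while (rs.next()) {
                    // process rows; very wide values arrive truncated
                }
            }
        }
    }
}
```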
Use streaming and bind variables if possible, but avoid using native SQL
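Bind variables let the database parse a statement once and reuse the execution plan for every row, instead of re-parsing SQL with literal values embedded in it. A hedged illustration of the difference in plain JDBC (table and column names are made up for the example):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class BindVariablesExample {
    // Avoid: native SQL with values concatenated into the statement text.
    // Each distinct value forces a fresh parse and invites SQL injection.
    static void insertWithoutBinds(Connection conn, int id, String name) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("INSERT INTO customers (id, name) VALUES ("
                    + id + ", '" + name + "')");
        }
    }

    // Prefer: bind variables -- the statement is parsed once and reused.
    static void insertWithBinds(Connection conn, int id, String name) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO customers (id, name) VALUES (?, ?)")) {
            ps.setInt(1, id);
            ps.setString(2, name);
            ps.executeUpdate();
        }
    }
}
```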
Enable Force streaming when streaming is not supported
Read about force streaming.
Enable Auto Commit
If you are loading large datasets into the database, enable Auto Commit for the destination database Connection.
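In JDBC terms this is the connection's auto-commit flag: each statement or batch is committed as it is sent, rather than accumulating one very long-running transaction on the destination. A rough sketch, with placeholder connection details and table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AutoCommitExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- substitute your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/warehouse", "user", "password")) {
            // Commit work immediately instead of holding one huge transaction.
            conn.setAutoCommit(true);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO orders (id, amount) VALUES (?, ?)")) {
                for (int i = 0; i < 100_000; i++) {
                    ps.setInt(1, i);
                    ps.setDouble(2, i * 1.5);
                    ps.addBatch();
                    if (i % 1000 == 0) {
                        ps.executeBatch(); // committed right away under auto-commit
                    }
                }
                ps.executeBatch();
            }
        }
    }
}
```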
Create indexes
If you are running custom SQL queries or using MERGE / UPDATE / DELETE / IfExist SQL actions, make sure that all relevant database indexes exist.
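As an illustration, an UPDATE, DELETE, or MERGE that matches rows on a key column benefits greatly from an index on that column. A hypothetical example executed through JDBC (the table, column, and index names are made up; the syntax shown is PostgreSQL-style):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateIndexExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- substitute your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/warehouse", "user", "password");
             Statement stmt = conn.createStatement()) {
            // Without an index on customer_id, every UPDATE/DELETE/MERGE that
            // filters on it has to scan the whole table.
            stmt.execute(
                "CREATE INDEX IF NOT EXISTS idx_orders_customer_id "
                + "ON orders (customer_id)");
        }
    }
}
```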
Use Flows optimized for Snowflake and Redshift to load data into these online data warehouses
Snowflake and Redshift are columnar databases (optimized for data retrieval operations), so loading data into them using DML statements (INSERT / UPDATE / DELETE) can be quite slow, especially for larger datasets.
While it is possible to load data into Snowflake and Redshift using regular Flows, such as database-to-database, it is highly recommended to use Flows optimized for these databases.
Read how to efficiently load large datasets into Snowflake with the Etlworks Integrator.
Read how to efficiently load large datasets into Redshift with the Etlworks Integrator.
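The linked articles describe the optimized Flows in detail; the general idea behind bulk loading into these warehouses is to stage files and load them with a single COPY command rather than row-by-row DML. A rough sketch of the Redshift variant, not the Etlworks implementation (the cluster endpoint, S3 bucket, IAM role, and table name are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftCopyExample {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials -- substitute your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:redshift://example.redshift.amazonaws.com:5439/warehouse",
                "user", "password");
             Statement stmt = conn.createStatement()) {
            // Bulk-load a staged CSV file in one COPY command instead of
            // issuing millions of individual INSERT statements.
            stmt.execute(
                "COPY orders "
                + "FROM 's3://example-bucket/staging/orders.csv' "
                + "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
                + "FORMAT AS CSV IGNOREHEADER 1");
        }
    }
}
```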