Overview
Etlworks can automatically create — and drop — staging tables from almost any data source (files, API endpoints, NoSQL stores, …). See also create staging tables or flat files for each dimension of a nested dataset.
How do I set this up?
- Open the Flows window, click +, and type create staging tables in the Select Flow Type box.
- Define the FROM (any data source) and TO (a database connection). The TO field becomes the name of the staging table.
- Optionally, configure the flow to drop the existing staging table if it already exists.
Use a temporary database as a staging area
Etlworks supports an in-memory SQLite-based temporary database that you can use as a staging area. The data exists only for the duration of the flow.
Note: You do not need to clean up or drop tables in the temp database. They are dropped automatically when the flow finishes and the connection closes.
Store the source document in the staging database and transform using SQL
Useful when you want to land a file or a web-service response in a temporary database and then use SQL to transform it.
- Create a connection to a temporary database.
- Create a flow whose source is the file or web service and whose destination is the connection from step 1. Let the system create the staging table automatically.
- Combine the flow from step 2 with other flows that transform the data.
Combine multiple source documents into a single output
- Create a connection to a temporary database.
- Create a flow that reads multiple source files and inserts the data into the staging database using the connection from step 1.
- Create a flow that reads from the staging database and writes the combined output file.
- Combine the two flows into a nested flow.
Use a temporary database in a database loop
See running a flow once per database record.
- Create a connection to a temporary database.
- Create a flow that loads data into the staging database (for example, read multiple source files and insert into the staging database).
- Create a database loop using the connection from step 1.
Switching between in-memory temp DB and a persistent SQLite staging DB
A common workflow during development: use a SQLite connection as the staging database so you can inspect the data after the flow runs. When the flow is ready for production, switch the same connection to in-memory.
- Use a SQLite connection wherever you'd normally use a temp DB connection. Tip: give the connection and the database file unique names so they don't collide with other flows or developers.
- If the SQLite connection is used as a destination, optionally add a step that deletes the database file before the flow starts — this simulates the in-memory behavior. If the database file is {app.data}/test-staging.dat, create a Delete files flow with a Server Storage connection pointing at app.data and test-staging.dat as the FROM filename.
- When the flow is ready for production, switch the SQLite connection to in-memory: set the URL to jdbc:sqlite: and leave the database name empty.
Note: You can always switch back to a persistent staging database by editing the SQLite connection again.