Google BigQuery is a serverless, cost-effective cloud data warehouse. Etlworks ships several flow types optimized for loading and reading BigQuery.
Which BigQuery flow should I use?
| Flow | Use when |
|---|---|
| Any to BigQuery (Database / File / Queue / Web service / Well-known API) | You need to extract from any source, optionally transform, and load into BigQuery. |
| Bulk load files in Google Cloud Storage into BigQuery | The files already exist in GCS. No transformation needed. Auto-generates the load command; supports MERGE. |
| Stream CDC events into BigQuery | You need real-time replication from a CDC-enabled source database. |
| Streaming with message queues | You need real-time ingestion from a message queue that supports streaming. |
What do I need before I start?
- A BigQuery dataset reachable from your Etlworks instance.
- A Google service account with permissions to read/write the target dataset and to read/write the GCS bucket used as the stage.
- A Google Cloud Storage bucket for staging.
Connect to BigQuery
- Open the Connections window and click +.
- Type bigquery in the search field.
- Select the Google BigQuery connection and fill in the connection parameters. Full reference: configuring the BigQuery connection.
Also create a Google Cloud Storage connection to use as the BigQuery stage.
Where to go next
| Topic | Article |
|---|---|
| Extract, transform, and load data into BigQuery | Extract, transform, and load data in Google BigQuery |
| Bulk-load existing GCS files | Bulk load files in Google Cloud Storage into BigQuery |
| ELT — run transformation SQL directly in BigQuery | ELT with Google BigQuery |
| Reverse ETL — extract from BigQuery into any destination | Reverse ETL with Google BigQuery |
| Load many tables at once | Load multiple tables with a wildcard |
| Incremental load (HWM) | Incremental change replication using high watermark |
| Troubleshooting | Common issues when loading data into cloud data warehouses |