Get started with Databricks – Etlworks Support

Databricks is a unified data and AI platform built on the Delta Lake / Lakehouse architecture. Etlworks ships several flow types optimized for loading data into Databricks and reading data from it.

Which Databricks flow should I use?

Flow	Use when
Any to Databricks (Database / File / Queue / Web service / Well-known API)	You need to extract from any source, optionally transform, and load into Databricks.
Bulk load files into Databricks	Files already exist in S3, ADLS Gen2, GCS, a Databricks Volume, or server storage. No transformation needed. Auto-generates COPY INTO; supports MERGE.
Stream CDC events into Databricks	You need real-time replication from a CDC-enabled source database (MySQL, SQL Server, PostgreSQL, Oracle, DB2, MongoDB, AS400).
Stream messages from a queue into Databricks	You need real-time ingestion from a message queue that supports streaming (Kafka, Kinesis, Pub/Sub, Service Bus, RabbitMQ, ActiveMQ, SQS).
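
To give a feel for the kind of SQL the bulk-load flow relies on, here is a minimal Python sketch that assembles a COPY INTO statement and a MERGE statement in standard Databricks SQL. It is an illustration only: the exact statements Etlworks generates are not shown in this article, and the function names, table names, and bucket path are hypothetical.

```python
def copy_into_sql(table, source_path, file_format="CSV", format_options=None):
    """Build a Databricks COPY INTO statement (illustrative, not Etlworks output)."""
    lines = [
        f"COPY INTO {table}",
        f"FROM '{source_path}'",
        f"FILEFORMAT = {file_format}",
    ]
    if format_options:
        opts = ", ".join(f"'{k}' = '{v}'" for k, v in format_options.items())
        lines.append(f"FORMAT_OPTIONS ({opts})")
    return "\n".join(lines)


def merge_sql(target, staging, key_columns):
    """Build a MERGE that upserts rows from a staging table into the target."""
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {staging} AS s\n"
        f"ON {on}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    # Hypothetical table and bucket names, used only for the demo.
    print(copy_into_sql("main.sales.orders", "s3://example-bucket/orders/",
                        "CSV", {"header": "true"}))
    print(merge_sql("main.sales.orders", "main.sales.orders_stage", ["order_id"]))
```

COPY INTO appends the staged files to the target table, while MERGE matches staged rows to existing rows on the key columns and updates or inserts accordingly.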

What do I need before I start?

A Databricks workspace with a running compute resource (SQL warehouse, all-purpose cluster, or serverless SQL).
The compute resource's Server hostname and HTTP path — both visible on the compute resource's Connection details tab in the Databricks UI.
Authentication credentials — either a personal access token (PAT) or an OAuth service principal with client ID and OAuth secret.
Permissions: USE CATALOG and USE SCHEMA on the target Unity Catalog catalog and schema, plus SELECT and MODIFY on the target schema. Flows that auto-create destination tables also need CREATE TABLE on that schema.
A stage for bulk operations: an S3 bucket, ADLS Gen2 container, GCS bucket, or a Databricks Volume that the workspace can read. For Unity Catalog, the stage typically lives behind an External Location with a storage credential.
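
The permissions in the list above can be granted with Unity Catalog GRANT statements. The Python sketch below only builds the statement text so a workspace admin can review it before running it. It assumes CREATE TABLE is the privilege tied to auto-creating destination tables. The catalog, schema, and principal names are hypothetical.

```python
def grant_statements(catalog, schema, principal, auto_create_tables=True):
    """Return Unity Catalog GRANT statements for a loading principal.

    Assumption: CREATE TABLE is only needed when flows auto-create tables.
    """
    target = f"{catalog}.{schema}"
    schema_privileges = ["SELECT", "MODIFY"]
    if auto_create_tables:
        schema_privileges.append("CREATE TABLE")
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {target} TO `{principal}`",
        f"GRANT {', '.join(schema_privileges)} ON SCHEMA {target} TO `{principal}`",
    ]


if __name__ == "__main__":
    # Hypothetical names; replace with your own catalog, schema, and principal.
    for statement in grant_statements("main", "sales", "etl-service-principal"):
        print(statement + ";")
```

Granting at the schema level covers every table in the schema. Grant on individual tables instead if the principal should see only some of them.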

Connect to Databricks

Open the Connections window and click +.
Type databricks in the search field.
Select the Databricks connection.
Pick the authentication method:
- Personal Access Token (default) — enter the host, HTTP path, and the token.
- OAuth Service Principal — enter the host, HTTP path, the service principal's client ID, and the OAuth secret. Recommended for production.
Optionally set a default Schema. To target a different catalog / schema per flow, use a fully qualified catalog.schema.table name in the destination.
For the full connection-parameter reference, see configuring the Databricks connection.
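
To show how the host, HTTP path, and credentials from the steps above fit together, here is a hedged Python sketch that composes a JDBC URL in the form documented for the Databricks JDBC driver. Etlworks builds the connection from the form fields itself, and this article does not say how, so treat the URL as an illustration. The helper names and all sample values are hypothetical.

```python
def jdbc_url(host, http_path, *, token=None, client_id=None, client_secret=None):
    """Compose a Databricks JDBC URL for PAT or OAuth service-principal auth."""
    base = f"jdbc:databricks://{host}:443;httpPath={http_path}"
    if token:
        # Personal access token: the user name is the literal string "token".
        return f"{base};AuthMech=3;UID=token;PWD={token}"
    if client_id and client_secret:
        # OAuth machine-to-machine flow for a service principal.
        return (f"{base};AuthMech=11;Auth_Flow=1;"
                f"OAuth2ClientId={client_id};OAuth2Secret={client_secret}")
    raise ValueError("Provide either token or client_id plus client_secret")


def qualified_name(catalog, schema, table):
    """Fully qualified catalog.schema.table name for a flow destination."""
    return ".".join([catalog, schema, table])


if __name__ == "__main__":
    # Placeholder values only; never hard-code real secrets.
    print(jdbc_url("example.cloud.databricks.com",
                   "/sql/1.0/warehouses/abc123", token="dapiEXAMPLE"))
    print(qualified_name("main", "sales", "orders"))
```

Both authentication methods share the same host and HTTP path. Only the credential properties differ, which is why switching from a token to a service principal does not change the compute settings.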

Also create a connection for the stage. The supported stage types are:

Amazon S3 — for AWS workspaces.
Azure Storage (ADLS Gen2) — for Azure workspaces.
Google Cloud Storage — for GCP workspaces.
Server storage — for files already on the Etlworks instance.
Databricks Volume — configure a server-storage connection pointed at the volume path (/Volumes/catalog/schema/volume/…).
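
The stage choices above follow a simple rule: match the stage to the cloud hosting the workspace, or point a server-storage connection at a volume path. The Python sketch below encodes that mapping and builds a path in the /Volumes/catalog/schema/volume/… layout. The function names and sample values are hypothetical.

```python
# Stage connection type per workspace cloud, as listed above.
STAGE_BY_CLOUD = {
    "aws": "Amazon S3",
    "azure": "Azure Storage (ADLS Gen2)",
    "gcp": "Google Cloud Storage",
}


def stage_for_cloud(cloud):
    """Return the stage connection type for a workspace cloud."""
    try:
        return STAGE_BY_CLOUD[cloud.lower()]
    except KeyError:
        raise ValueError(f"Unknown cloud: {cloud!r}") from None


def volume_path(catalog, schema, volume, *parts):
    """Build a /Volumes/<catalog>/<schema>/<volume>/... path."""
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(["/Volumes", catalog, schema, volume, *cleaned])


if __name__ == "__main__":
    print(stage_for_cloud("Azure"))
    # Hypothetical catalog, schema, and volume names.
    print(volume_path("main", "etl", "stage", "orders", "2024"))
```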

Where to go next

Topic	Article
Extract, transform, and load data into Databricks	Extract, transform, and load data in Databricks
Bulk-load existing files	Bulk load files into Databricks
ELT — run transformation SQL directly in Databricks	ELT with Databricks
Reverse ETL — extract from Databricks into any destination	Reverse ETL with Databricks
Data type mapping (JDBC ↔ Databricks)	Data type mapping for Databricks
Stream CDC events into Databricks	Create pipeline to CDC data into Databricks
Stream from message queues	Streaming with message queues
