Etlworks reads from MongoDB collections, writes into them, and captures document-level changes via CDC. This article covers every supported pattern with step-by-step setup.
MongoDB stores data in BSON — a binary form of JSON that supports nested documents, arrays, and variable schemas. Etlworks can either flatten MongoDB documents into a tabular shape or preserve their nested structure, depending on where the data is going.
Extract data from a MongoDB collection
Two paths: CSV when the destination is relational and you want flat data, JSON when you need to preserve the nested document structure.
Path 1: extract as CSV (flat / tabular)
Use CSV when the destination is a relational database or a data warehouse (Snowflake, Redshift, Synapse, …). Nested fields are emitted as JSON-encoded strings so they can be loaded into a single column.
- Create a MongoDB connection.
- Create a CSV format. Defaults are fine for most cases.
- Create a flow where the source is NoSQL. Type nosql to in Select Flow Type and pick the flow that matches your destination.
- Set FROM to the MongoDB connection + CSV format. Enter the MongoDB collection name; wildcards are supported (e.g. customer*). If FROM uses a wildcard, TO can use one too.
- Set TO to the destination connection + format (if needed) and the destination object name.
- Add mapping or transformations as needed. For filtering and shaping, use SQL with MongoDB.
Path 2: extract as JSON (preserve nesting)
Use JSON when the destination should keep the nested document structure — for example, exporting as JSON files, or streaming into Kafka / Kinesis.
- Create a MongoDB connection.
- Create a JSON format. Defaults are fine.
- Create a flow where the source is NoSQL. Type nosql to and pick the flow type that matches your destination.
- Set FROM to the MongoDB connection + JSON format. Enter the MongoDB collection name; wildcards are supported.
- Set TO to the destination connection + format and object name.
- Add mapping or transformations. For filtering and shaping, use SQL with MongoDB.
Filtering during extraction
Use SQL with MongoDB for filtering and shaping — better readability and long-term support than the older MongoDB-query-in-collection-name approach.
Deprecated: the legacy collection.{filter} syntax in the FROM field (e.g. customers.{"STORE_ID":1}) still works for backward compatibility but is not recommended for new flows. See the MongoDB query language reference.
Extract specific documents (MongoDB Document connection)
When you want one document (or a small group) instead of a full collection scan, use a MongoDB Document connection.
- Create the MongoDB Document connection.
- Create a JSON format.
- Create a flow where the source is NoSQL. Type nosql to.
- Set FROM to the MongoDB Document connection + JSON format. Enter the source identifier.
- Set TO to the destination connection + format and object name.
- In Mapping, enter SQL in Source query to narrow down what's read. See Using SQL with MongoDB.
Capture changes in MongoDB (CDC)
Etlworks ships a native MongoDB CDC connector that monitors replica sets and sharded clusters. The connector captures document-level changes (insert / update / delete) and emits them as events. Use it for real-time pipelines, event streaming, and incremental replication.
Reference:
Load data into MongoDB
Two patterns — pick based on whether each source row becomes its own document, or the whole dataset goes in as one document.
Pattern 1: one source row = one MongoDB document
Use this for tabular sources (database tables, CSV files, JSON arrays).
- Create a MongoDB connection. Set What to do with the existing document when writing:
- insert — inserts new documents without checking for duplicates. Fastest. Use for initial loads only.
- update — updates only the fields present in the payload; other fields are left unchanged.
- replace — replaces the whole document. Slowest, but guarantees full replacement.
- Create a JSON format. If the source is array-shaped (table, CSV, JSON array):
- Set First Object Type to variable.
- Set Root Object Name to any valid name (required).
- Create a flow where the destination is NoSQL. Type to nosql and pick the flow type that matches your source.
- Set FROM to the source connection + format and the source object name.
- Set TO to the MongoDB connection + JSON format.
- Enter the MongoDB collection name in TO. Optionally specify a unique field for the MongoDB _id using collection_name.unique_field_name — e.g. customers.CUSTOMER_ID.
- Add mapping or transformations as needed.
Pattern 2: whole dataset = one MongoDB document
Use this for storing an Excel worksheet, a full JSON array, or a configuration payload as a single document.
- Create a MongoDB Document connection.
- Create a JSON format. If the source is an array, set First Object Type = variable and pick a meaningful Root Object Name.
- Create a flow where the destination is NoSQL. Type to nosql.
- Set FROM to the source connection + format and the source object name.
- Set TO to the MongoDB Document connection + JSON format.
- Enter the MongoDB document _id in TO.
- Add mapping or transformations as needed.
Improve load performance
Three settings dominate write throughput when loading into MongoDB:
| Parameter | What it does |
|---|---|
| Batch Size | Documents per write round-trip. Higher batch sizes mean fewer network calls and faster loads, especially for large datasets. |
| Write Concern | The acknowledgment level MongoDB requires for each write. Lower acknowledgment = faster writes but weaker durability. Use minimal acknowledgment for initial loads where you can re-run on failure. |
| What to do with the existing document when writing | insert is fastest (no lookup); use it for the initial load. Switch to update or replace for subsequent runs to handle duplicates correctly. replace is slowest because the whole document is rewritten every time. |
Convert timestamp strings to ISODate
MongoDB queries and indexes work better against native ISODate values than against timestamp strings. The MongoDB connector can convert matching strings to ISODate on write — off by default.
- Enable Timestamp Conversion on the MongoDB connection.
- Set a Regex Pattern that identifies timestamp strings. If you don't provide one, the connector uses the default ISO 8601 pattern: ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?(Z|[+-]\d{2}:\d{2})$.
- Any string matching the pattern is converted to ISODate on write.