Extract data from MongoDB
Extract data from a collection using CSV format
Data in MongoDB is stored using BSON format, which is a binary version of JSON. It allows storing nested documents as well as documents with a variable number of columns.
Etlworks can convert nested JSON documents into flat format where each nested column is a JSON-encoded string.
Use cases (CSV)
- Extract data from MongoDB, and load it into a relational database.
- Extract data from MongoDB and load it into the data warehouses such as Snowflake, Amazon Redshift, and Azure Synapse Analytics.
Process (CSV)
Step 1. Create a MongoDB Connection.
Step 2. Create a CSV Format. The default settings for the CSV Format will work.
Step 3. Create a Flow where the source is a file, and the destination is a database.
Step 4. Select the MongoDB Connection created in Step 1 and the CSV Format created in Step 2 as a source (FROM
).
Step 5. Enter or select the MongoDB collection name into the FROM
field.
Note: This flow supports extract by a wildcard, for example customer*
. When the FROM
is configured with a wildcard, you can use a wildcard in the TO
as well.
Step 6. Select a destination Connection and Format (if needed), and enter the destination into the TO
field.
Step 7. Continue by adding Mapping and transformations. Read how to use SQL with MongoDB.
Extract data from a collection using JSON format
If you prefer to keep the nested structure when extracting the data from the MongoDB document, use JSON format.
Use cases (JSON)
- Extract data from MongoDB, and create JSON files.
- Extract data from MongoDB and load it into the message queue, such as Kafka or Kinesis.
Process (JSON)
Step 1. Create a MongoDB Connection.
Step 2. Create a JSON Format. The default settings for the JSON Format will work.
Step 3. Create a Flow where the source is a file, and the destination is anything.
Step 4. Select the MongoDB Connection created in Step 1 and the JSON Format created in Step 2 as a source (FROM
).
Note: This flow supports extract by a wildcard, for example customer*
. When the FROM
is configured with a wildcard, you can use a wildcard in the TO
as well.
Step 5. Enter or select the MongoDB collection name into the FROM
field.
Step 6. Select a destination Connection and Format (if needed), and enter the destination into the TO
field.
Step 7. Continue by adding Mapping and transformations. Read how to use SQL with MongoDB.
Extract data from a collection using a filter
Using filters is deprecated. Read how to use SQL with MongoDB.
When extracting data from a collection using either CSV or JSON format, it is possible to set a filter. When the filter is set, the flow filters out and optionally transforms documents retrieved from a collection.
Process
Follow the same steps as described here (CSV) and here (JSON).
When setting the FROM in step 4, enter the filter in the following format: collection.{filter}
.
Example for the filter: {"STORE_ID":1}
- extract all documents where STORE_ID = 1.
Read about MongoDB query language.
Extract data from a specific document or a group of documents
Step 1. Create a MongoDB document Connection.
Step 2. Create a JSON Format. The default settings for the JSON Format will work.
Step 3. Create a Flow where the source is a file, and the destination is anything.
Step 4. Select the MongoDB document Connection created in Step 1 and the JSON Format created in Step 2 as a source (FROM
).
Step 5. Enter or select the source into the FROM
field.
Step 6. Select a destination Connection, Format (if needed), and enter the destination into the TO
field.
Step 7. Click Mapping and enter SQL in Source query. Read how to use SQL with MongoDB.
Capture changes in MongoDB documents and collections
The Etlworks Integrator MondoDB CDC connector can monitor a MongoDB replica set or a MongoDB shared cluster for document changes in databases and collections and send those changes as events to the destination.
Read how to set up a Flow to capture changes in MongoDB documents and collections.
Load data in MongoDB
Load data in MongoDB and create a separate document for each source record
Use it if you want to store each record of the source dataset as a separate MongoDB document.
Step 1. Create a MongoDB Connection.
When configuring the Connection, set the parameter What to do with the existing document when writing
to one of the following options:
-
replace
: replace the existing document with all the fields. -
update
: update only the fields in the existing document passed in the payload and leave other fields unchanged.
Step 2. Create a JSON Format. If the source document to store MongoDB is an ARRAY
(for example, a database table, CSV file, JSON array), then set the First Object Type
to variable
and the Root Object Name
to whatever name you want to use as a name for the root element. This field is required.
Step 3. Create a Flow where the destination is a file, and the source is anything.
Step 4. Select the source (FROM
) Connection and Format. Enter the source object name.
Step 5. Select the MongoDB Connection created in Step 1, and the JSON Format created in Step 2 as the destination (TO
).
Step 6. Enter or select the MongoDB collection name into the TO
field.
Optionally, you can specify the field which will be used to generate the unique value of the MongoDB object id (_id column
).
Format: collection_name.unique_field_name
Example: customers.CUSTOMER_ID
Step 7. Continue by adding Mapping and transformations.
Load data in MongoDB as a single document
Use it if you want to store the entire dataset (for example, the entire Excel worksheet) as a single MongoDB document.
Step 1. Create a MongoDB document Connection.
Step 2. Create a JSON Format. If the source document to store in MongoDB is an ARRAY
(for example, a database table, CSV file, JSON array, etc.), then set the First Object Type
to variable
and the Root Object Name
to whatever name you want to use as a name for the root element. This field is required.
Step 3. Create a Flow where the destination is a file, and the source is anything.
Step 4. Select the source (FROM
) Connection and Format. Enter the source object name.
Step 5. Select the MongoDB document Connection created in Step 1, and the JSON Format created in Step 2 as the destination (TO
).
Step 6. Enter or select the MongoDB object id into the TO
field.
Step 7. Continue by adding Mapping and transformations.
Comments
0 comments
Please sign in to leave a comment.