Extract data from MongoDB
Extract data from a collection using CSV format
Data in MongoDB is stored in BSON format, a binary version of JSON. It allows storing nested documents as well as documents with a variable number of fields.
Etlworks can convert nested JSON documents into a flat format where each nested field becomes a JSON-encoded string.
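For illustration only (the document and field names below are hypothetical, and the exact output depends on the Format settings), a nested document such as
{"name": "Simba", "address": {"city": "Austin", "zip": "78701"}}
could be flattened into a CSV row where the nested address field is kept as a JSON-encoded string:
name,address
Simba,"{""city"":""Austin"",""zip"":""78701""}"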
Use cases (CSV)
- Extract data from MongoDB and load it into a relational database.
- Extract data from MongoDB and load it into a data warehouse such as Snowflake, Amazon Redshift, or Azure Synapse Analytics.
Process (CSV)
Step 1. Create a Streaming MongoDB Connection.
Step 2. Create a CSV Format. The default settings for the CSV Format will work.
Step 3. Create a Flow where the source is a file, and the destination is a database.
Step 4. Select the Streaming MongoDB Connection created in Step 1 and the CSV Format created in Step 2 as a source (FROM).
Step 5. Enter or select the MongoDB collection name into the FROM field.
Step 6. Select a destination Connection and Format (if needed), and enter the destination into the TO field.
Step 7. Continue by adding Mapping and transformations.
Extract data from a collection using JSON format
If you prefer to keep the nested structure when extracting data from MongoDB documents, use the JSON format.
Use cases (JSON)
- Extract data from MongoDB and create JSON files.
- Extract data from MongoDB and load it into a message queue such as Kafka or Kinesis.
Process (JSON)
Step 1. Create a Streaming MongoDB Connection.
Step 2. Create a JSON Format. The default settings for the JSON Format will work.
Step 3. Create a Flow where the source is a file, and the destination is anything.
Step 4. Select the Streaming MongoDB Connection created in Step 1 and the JSON Format created in Step 2 as a source (FROM).
Step 5. Enter or select the MongoDB collection name into the FROM field.
Step 6. Select a destination Connection and Format (if needed), and enter the destination into the TO field.
Step 7. Continue by adding Mapping and transformations.
Extract data from a specific document or a group of documents
Step 1. Create a MongoDB Connection.
Step 2. Create a JSON Format. The default settings for the JSON Format will work.
Step 3. Create a Flow where the source is a file, and the destination is anything.
Step 4. Select the MongoDB Connection created in Step 1 and the JSON Format created in Step 2 as a source (FROM).
Step 5. Enter or select the source into the FROM field. The following options are available:
- A fully qualified object id: the object id as it is stored in MongoDB.
- A wildcard object id (for example, sales_orders*): the system will read all the documents that match the wildcard and combine them into a JSON array, where each document is one row in the array.
- A request to MongoDB in the JSON Format: the system will read all the documents that match the request and combine them into a JSON array, where each document is one row in the array. Here's an example of a request: {"first":"Simba"}. A request with query operators is sketched below the list.
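The request appears to use standard MongoDB query syntax (as the example above suggests), so query operators should also be accepted. For instance, assuming a collection with hypothetical status and total fields, the following request would match documents where status is active and total is greater than 100:
{"status": "active", "total": {"$gt": 100}}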
Step 6. Select a destination Connection and Format (if needed), and enter the destination into the TO field.
Step 7. Continue by adding Mapping and transformations.
Capture changes in MongoDB documents and collections
The Etlworks Integrator MongoDB CDC connector can monitor a MongoDB replica set or a MongoDB sharded cluster for document changes in databases and collections and send those changes as events to the destination.
Read how to set up a Flow to capture changes in MongoDB documents and collections.
Load data in MongoDB
Load data in MongoDB and create a separate document for each source record
Use this option if you want to store each record of the source dataset as a separate MongoDB document.
Step 1. Create a Streaming MongoDB Connection.
When configuring the Connection, set the parameter What to do with the existing document when writing to one of the following options:
- replace: replace the existing document with all the fields.
- update: update only the fields passed in the payload and leave the other fields in the existing document unchanged.
The difference between the two options is illustrated below.
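For illustration (the documents below are hypothetical), suppose the existing document is {"_id": 1, "name": "Simba", "status": "active"} and the incoming payload is {"_id": 1, "status": "inactive"}. With replace, the stored document becomes {"_id": 1, "status": "inactive"}; with update, it becomes {"_id": 1, "name": "Simba", "status": "inactive"}.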
Step 2. Create a JSON Format. If the source dataset to store in MongoDB is an ARRAY (for example, a database table, CSV file, or JSON array), then set the First Object Type to variable and the Root Object Name to whatever name you want to use for the root element. This field is required.
Step 3. Create a Flow where the destination is a file, and the source is anything.
Step 4. Select the source (FROM) Connection and Format. Enter the source object name.
Step 5. Select the MongoDB Connection created in Step 1, and the JSON Format created in Step 2 as the destination (TO).
Step 6. Enter or select the MongoDB collection name into the TO field.
Optionally, you can specify the field which will be used to generate the unique value of the MongoDB object id (the _id field).
Format: collection_name.unique_field_name
Example: customers.CUSTOMER_ID
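To illustrate (the data below is hypothetical), with customers.CUSTOMER_ID in the TO field, a source record such as {"CUSTOMER_ID": 1001, "NAME": "Simba"} would be written to the customers collection with its _id generated from the CUSTOMER_ID value 1001.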
Step 7. Continue by adding Mapping and transformations.
Load data in MongoDB as a single document
Use this option if you want to store the entire dataset (for example, an entire Excel worksheet) as a single MongoDB document.
Step 1. Create a MongoDB Connection.
Step 2. Create a JSON Format. If the source dataset to store in MongoDB is an ARRAY (for example, a database table, CSV file, or JSON array), then set the First Object Type to variable and the Root Object Name to whatever name you want to use for the root element. This field is required.
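To illustrate the resulting shape (the root name orders and the fields below are hypothetical, and the exact structure depends on the Format settings), if the Root Object Name is set to orders and the source is a table with two rows, the single stored document would look something like:
{"orders": [{"ID": 1, "NAME": "A"}, {"ID": 2, "NAME": "B"}]}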
Step 3. Create a Flow where the destination is a file, and the source is anything.
Step 4. Select the source (FROM) Connection and Format. Enter the source object name.
Step 5. Select the MongoDB Connection created in Step 1, and the JSON Format created in Step 2 as the destination (TO).
Step 6. Enter or select the MongoDB object id into the TO field.
Step 7. Continue by adding Mapping and transformations.