Skip to main content

MongoDB Data Connector

MongoDB is an open-source NoSQL database that stores data in flexible, JSON-like documents, allowing for dynamic schemas and easy scalability.

The MongoDB Data Connector enables federated/accelerated SQL queries on data stored in MongoDB databases.

datasets:
- from: mongodb:mytable
name: my_dataset
params:
mongodb_host: localhost
mongodb_port: 27017
mongodb_db: my_database
mongodb_user: my_user
mongodb_pass: ${secrets:mongodb_pass}
mongodb_pool_min: 0
mongodb_pool_max: 10

Configuration​

from​

The from field takes the form mongodb:{table_name} where table_name is the table identifer in the MongoDB server to read from.

datasets:
- from: mongodb:mytable
name: my_dataset
params:
mongodb_db: my_database
...

name​

The dataset name. This will be used as the table name within Spice.

Example:

datasets:
- from: mongodb:my_dataset
name: cool_dataset
params: ...
SELECT COUNT(*) FROM cool_dataset;
+----------+
| count(*) |
+----------+
| 6001215 |
+----------+

The dataset name cannot be a reserved keyword

params​

The MongoDB data connector can be configured by providing the following params. Use the secret replacement syntax to load the secret from a secret store, e.g. ${secrets:my_mongodb_conn_string}.

Parameter NameDescription
mongodb_connection_stringThe connection string to use to connect to the MongoDB server. This can be used instead of providing individual connection parameters.
mongodb_userThe MongoDB username.
mongodb_passThe password to connect with.
mongodb_hostThe hostname of the MongoDB server.
mongodb_portThe port of the MongoDB server.
mongodb_dbThe name of the database to connect to.
mongodb_sslmodeOptional. Specifies the SSL/TLS behavior for the connection, supported values:
  • required: (default) This mode requires an SSL connection. If a secure connection cannot be established, server will not connect.
  • preferred: This mode will try to establish a secure SSL connection if possible, but will connect insecurely if the server does not support SSL.
  • disabled: This mode will not attempt to use an SSL connection, even if the server supports it.
mongodb_sslrootcertOptional parameter specifying the path to a custom PEM certificate that the connector will trust.
mongodb_time_zoneOptional. Specifies connection time zone. Default is UTC. Accepts:
  • Fixed offsets (e.g., +02:00).
  • IANA time zone names (e.g., America/Los_Angeles)
mongodb_auth_sourceOptional. Authentication source database. Overrides the default auth source in the connection string.
mongodb_unnest_depthOptional. Maximum nesting depth for unnesting embedded documents into a flattened structure. Higher values expand deeper nested fields. Default: 0
mongodb_num_docs_to_infer_schemaOptional. Number of documents to use to infer the schema. Defaults to 400.
mongodb_pool_minThe minimum number of connections to keep open in the pool, lazily created when requested. Default: 0
mongodb_pool_maxThe maximum number of connections to allow in the pool. Default: 10

Types​

The table below shows the MongoDB data types supported, along with the type mapping to Apache Arrow types in Spice.

MongoDB TypeArrow Type
StringUtf8
BooleanBoolean
Int32Int32
Int64Int64
DoubleFloat64
Decimal128Decimal128
BinaryBinary
Datetime without timeDate32
Datetime with timeTimestamp(Millisecond, <Timezone>)
TimestampTimestamp(Millisecond, None)
ArrayList<Utf8>
NullNull
UndefinedNull
RegularExpressionUtf8
JavaScriptCodeUtf8
JavaScriptCodeWithScopeUtf8
SymbolUtf8
MaxKeyUtf8
MinKeyUtf8
DbPointerUtf8
ObjectIdUtf8
DocumentSee unnesting section
note
  • The MongoDB Datetime value is retrieved as a UTC time value by default. Use the mongodb_time_zone configuration parameter to specify the desired time zone for interpreting TIMESTAMP values during data retrieval.

Unnesting​

Consider the following document:

{
"a": 1,
"b": {
"x": 2,
"y": {
"z": 3
}
}
}

Using mongodb_unnest_depth you can control the unnesting behavior. Here are the examples:

mongodb_unnest_depth: 0​

sql> select * from test_table;
+-----------+---------------------+
| a (Int32) | b (Utf8) |
+-----------+---------------------+
| 1 | {"x":2,"y":{"z":3}} |
+---+-----------------------------+

mongodb_unnest_depth: 1​

sql> select * from test_table;
+-----------+-------------+------------+
| a (Int32) | b.x (Int32) | b.y (Utf8) |
+-----------+-------------+------------+
| 1 | 2 | {"z":3} |
+-----------+-------------+------------+

mongodb_unnest_depth: 2​

sql> select * from test_table;
+-----------+-------------+---------------+
| a (Int32) | b.x (Int32) | b.y.z (Int32) |
+-----------+-------------+---------------+
| 1 | 2 | 3 |
+-----------+-------------+---------------+

Examples​

Connecting using username and password and custom auth table​

datasets:
- from: mongodb:my_dataset
name: my_dataset
params:
mongodb_host: localhost
mongodb_port: 27017
mongodb_db: my_database
mongodb_user: my_user
mongodb_pass: ${secrets:mongodb_pass}
mongodb_auth_source: admin

Connecting using SSL​

datasets:
- from: mongodb:my_dataset
name: my_dataset
params:
mongodb_host: localhost
mongodb_port: 27017
mongodb_db: my_database
mongodb_user: my_user
mongodb_pass: ${secrets:mongodb_pass}
mongodb_sslmode: preferred
mongodb_sslrootcert: ./custom_cert.pem

Connecting using a Connection String​

datasets:
- from: mongodb:my_dataset
name: my_dataset
params:
mongodb_connection_string: mongodb://${secrets:my_user}:${secrets:my_password}@localhost:27017/my_db?authSource=admin

With custom connection pool settings​

datasets:
- from: mongodb:my_dataset
name: my_dataset
params:
mongodb_host: localhost
mongodb_port: 27017
mongodb_db: my_database
mongodb_user: my_user
mongodb_pass: ${secrets:mongodb_pass}
mongodb_pool_min: 5
mongodb_pool_max: 10

Secrets​

Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the secret stores documentation. Additionally, learn how to use referenced secrets in component parameters by visiting the using referenced secrets guide.

Cookbook​