# DuckLake Data Connector
DuckLake is an open lakehouse format that stores metadata in a SQLite-compatible database (or PostgreSQL) and data in Parquet files. This connector enables querying individual DuckLake tables as datasets in Spice.
For automatic discovery of all schemas and tables in a DuckLake catalog, use the DuckLake Catalog Connector instead.
```yaml
datasets:
  - from: ducklake:my_table
    name: my_table
    params:
      ducklake_connection_string: s3://my-bucket/path/metadata.ducklake
```
## Configuration

### from

The `from` field specifies the DuckLake table to connect to. Use `ducklake:<table_path>`, where `table_path` is the table name or a schema-qualified table name.

| `from` | Description |
|---|---|
| `ducklake:my_table` | Read from `my_table` in the default `main` schema |
| `ducklake:my_schema.my_table` | Read from `my_table` in the `my_schema` schema |
### name

The dataset name. This will be used as the table name within Spice.

```yaml
datasets:
  - from: ducklake:customer
    name: tpch_customer
    params:
      ducklake_connection_string: s3://my-bucket/metadata.ducklake
```

```sql
SELECT COUNT(*) FROM tpch_customer;
```

The dataset name cannot be a reserved keyword.
### params

| Parameter Name | Description |
|---|---|
| `ducklake_connection_string` | Required. The DuckLake metadata location (e.g., `s3://bucket/path/metadata.ducklake`). |
| `ducklake_name` | Optional. The name to attach the DuckLake catalog as in DuckDB. Default: `ducklake`. |
| `ducklake_open` | Optional. Path to an existing DuckDB file for persistent storage. If not provided, an in-memory DuckDB instance is used. |
| `ducklake_aws_region` | Optional. The AWS region for S3 storage. Default: `us-east-1` when explicit credentials are provided. |
| `ducklake_aws_access_key_id` | Optional. The AWS access key ID for S3 storage. Must be set together with `ducklake_aws_secret_access_key`. |
| `ducklake_aws_secret_access_key` | Optional. The AWS secret access key for S3 storage. Must be set together with `ducklake_aws_access_key_id`. |
| `ducklake_aws_endpoint` | Optional. Custom S3-compatible endpoint URL (e.g., for MinIO). |
| `ducklake_aws_allow_http` | Optional. Set to `true` to allow HTTP (non-TLS) connections to S3. Default: `false`. |
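For example, `ducklake_name` and `ducklake_open` can be combined to attach the catalog under a custom name and persist the working DuckDB database to a file instead of memory. A minimal sketch; the attach name and file path are illustrative:

```yaml
datasets:
  - from: ducklake:my_table
    name: my_table
    params:
      ducklake_connection_string: s3://my-bucket/path/metadata.ducklake
      # Attach the DuckLake catalog as "lake" instead of the default "ducklake"
      ducklake_name: lake
      # Persist the working DuckDB database to disk (illustrative path)
      ducklake_open: /var/lib/spice/ducklake.duckdb
```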
### Connection string formats

| Backend | Example |
|---|---|
| Local file | `/path/to/metadata.ducklake` |
| AWS S3 | `s3://bucket/path/metadata.ducklake` |
| PostgreSQL | `postgres:dbname=mydb host=localhost user=postgres password=secret` |
## Authentication

### AWS S3

When no explicit S3 credentials are configured, DuckDB falls back to its built-in credential chain provider:

- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`)
- Shared credentials file (`~/.aws/credentials`)
- IAM instance profiles (on EC2/ECS)
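In that case the `ducklake_aws_*` parameters are simply omitted and credentials are resolved from the chain above. A minimal sketch; the bucket and table names are illustrative:

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      # No ducklake_aws_* parameters: credentials come from environment
      # variables, ~/.aws/credentials, or an instance profile
      ducklake_connection_string: s3://my-bucket/metadata.ducklake
```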
To provide explicit S3 credentials, use the `ducklake_aws_*` parameters:

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      ducklake_connection_string: s3://my-bucket/metadata.ducklake
      ducklake_aws_region: us-west-2
      ducklake_aws_access_key_id: ${secrets:AWS_ACCESS_KEY_ID}
      ducklake_aws_secret_access_key: ${secrets:AWS_SECRET_ACCESS_KEY}
```
For S3-compatible storage (e.g., MinIO), use `ducklake_aws_endpoint`:

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      ducklake_connection_string: s3://my-bucket/metadata.ducklake
      ducklake_aws_endpoint: http://minio:9000
      ducklake_aws_access_key_id: ${secrets:MINIO_ACCESS_KEY}
      ducklake_aws_secret_access_key: ${secrets:MINIO_SECRET_KEY}
      ducklake_aws_allow_http: true
```
## Examples

### Reading from a local DuckLake catalog

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      ducklake_connection_string: /path/to/metadata.ducklake
```
### Reading from S3

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      ducklake_connection_string: s3://my-bucket/lakehouse/metadata.ducklake
```
### Reading from a specific schema

```yaml
datasets:
  - from: ducklake:analytics.events
    name: events
    params:
      ducklake_connection_string: s3://my-bucket/metadata.ducklake
```
### PostgreSQL metadata backend

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      ducklake_connection_string: "postgres:dbname=ducklake_catalog host=localhost user=postgres password=postgres"
```
### Multiple tables with YAML anchors

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params: &ducklake_params
      ducklake_connection_string: s3://my-bucket/metadata.ducklake
  - from: ducklake:orders
    name: orders
    params: *ducklake_params
  - from: ducklake:lineitem
    name: lineitem
    params: *ducklake_params
```
### With data acceleration

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      ducklake_connection_string: s3://my-bucket/metadata.ducklake
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_interval: 1h
```
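With acceleration enabled, queries against `customer` are served from the local DuckDB file rather than reaching back to S3 on every request. An illustrative query, assuming TPC-H style columns:

```sql
SELECT c_name, c_acctbal
FROM customer
ORDER BY c_acctbal DESC
LIMIT 10;
```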
## Limitations

- Spice uses DuckDB 1.4.4, which supports DuckLake format versions 0.1, 0.2, and 0.3 only. Catalogs created with DuckDB 1.5.x or later use format v0.4+, which is not currently supported.
- The DuckLake DuckDB extension is downloaded at runtime on first use, requiring network connectivity.
- The `ducklake_connection_string` parameter is required; unlike the catalog connector, it cannot be omitted.
- Each dataset creates its own DuckDB connection pool. For querying many tables from the same catalog, consider using the DuckLake Catalog Connector instead, which shares a single connection pool.
