Version: Next

DuckLake Data Connector

DuckLake is an open lakehouse format that stores metadata in a SQLite-compatible database (or PostgreSQL) and data in Parquet files. This connector enables querying individual DuckLake tables as datasets in Spice.

For automatic discovery of all schemas and tables in a DuckLake catalog, use the DuckLake Catalog Connector instead.

```yaml
datasets:
  - from: ducklake:my_table
    name: my_table
    params:
      connection_string: s3://my-bucket/path/metadata.ducklake
```

Configuration

from

The `from` field specifies the DuckLake table to connect to. Use `ducklake:<table_path>`, where `table_path` is the table name or a schema-qualified table name.

| from | Description |
| --- | --- |
| `ducklake:my_table` | Read from `my_table` in the default `main` schema |
| `ducklake:my_schema.my_table` | Read from `my_table` in the `my_schema` schema |

name

The dataset name. This will be used as the table name within Spice.

```yaml
datasets:
  - from: ducklake:customer
    name: tpch_customer
    params:
      connection_string: s3://my-bucket/metadata.ducklake
```

```sql
SELECT COUNT(*) FROM tpch_customer;
```

The dataset name cannot be a reserved keyword.

params

| Parameter Name | Description |
| --- | --- |
| `connection_string` | Required. The DuckLake metadata location (e.g., `s3://bucket/path/metadata.ducklake`). |
| `name` | The name to attach the DuckLake catalog as in DuckDB. Default: `ducklake`. |
| `open` | Path to an existing DuckDB file for persistent storage. If not provided, an in-memory DuckDB instance is used. |
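
For example, a dataset that attaches the catalog under a custom name and persists the DuckDB instance to disk might combine the optional parameters as follows. This is a sketch based on the parameter table above; the `lake` catalog name and the `/var/data/ducklake.db` path are illustrative values, not defaults:

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      connection_string: s3://my-bucket/metadata.ducklake
      name: lake                   # attach the DuckLake catalog as "lake" in DuckDB
      open: /var/data/ducklake.db  # persist to an on-disk DuckDB file instead of in-memory
```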

Connection string formats

| Backend | Example |
| --- | --- |
| Local file | `/path/to/metadata.ducklake` |
| AWS S3 | `s3://bucket/path/metadata.ducklake` |
| PostgreSQL | `postgres:dbname=mydb host=localhost user=postgres password=secret` |
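
The backend is inferred from the shape of the connection string. A minimal sketch of that dispatch logic, mirroring the table above (the function name is illustrative and not part of Spice):

```python
def classify_connection_string(conn: str) -> str:
    """Classify a DuckLake connection string by metadata backend.

    Purely illustrative helper mirroring the backend table; not a Spice API.
    """
    if conn.startswith("s3://"):
        return "aws-s3"
    if conn.startswith("postgres:"):
        return "postgresql"
    # Anything without a recognized scheme is treated as a local file path.
    return "local-file"


print(classify_connection_string("s3://bucket/path/metadata.ducklake"))  # aws-s3
print(classify_connection_string("/path/to/metadata.ducklake"))          # local-file
```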

Authentication

DuckLake relies on DuckDB's credential resolution for cloud storage access. No Spice-specific authentication parameters are needed.

AWS S3

Uses the standard AWS credential chain:

  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN)
  2. Shared credentials file (~/.aws/credentials)
  3. IAM instance profiles (on EC2/ECS)
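
For local development, the environment-variable step of the chain can be exercised directly. The values below are placeholders, not working credentials:

```shell
# Illustrative only: placeholder credentials for the standard AWS chain.
# In production, prefer ~/.aws/credentials or an IAM instance profile.
export AWS_ACCESS_KEY_ID="AKIA_EXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret"
export AWS_SESSION_TOKEN="example-token"  # only needed for temporary credentials

echo "AWS credentials configured for $AWS_ACCESS_KEY_ID"
```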

Examples

Reading from a local DuckLake catalog

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      connection_string: /path/to/metadata.ducklake
```

Reading from S3

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      connection_string: s3://my-bucket/lakehouse/metadata.ducklake
```

Reading from a specific schema

```yaml
datasets:
  - from: ducklake:analytics.events
    name: events
    params:
      connection_string: s3://my-bucket/metadata.ducklake
```

PostgreSQL metadata backend

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      connection_string: "postgres:dbname=ducklake_catalog host=localhost user=postgres password=postgres"
```

Multiple tables with YAML anchors

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params: &ducklake_params
      connection_string: s3://my-bucket/metadata.ducklake
  - from: ducklake:orders
    name: orders
    params: *ducklake_params
  - from: ducklake:lineitem
    name: lineitem
    params: *ducklake_params
```

With data acceleration

```yaml
datasets:
  - from: ducklake:customer
    name: customer
    params:
      connection_string: s3://my-bucket/metadata.ducklake
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_interval: 1h
```
Limitations
  • The DuckLake DuckDB extension is downloaded at runtime on first use, requiring network connectivity.
  • The `connection_string` parameter is required; unlike the catalog connector, it cannot be omitted.
  • Each dataset creates its own DuckDB connection pool. For querying many tables from the same catalog, consider using the DuckLake Catalog Connector instead, which shares a single connection pool.

Cookbook