Catalog Connectors
In Spice, datasets are organized hierarchically with catalogs, schemas, and tables. A catalog, at the top level, contains multiple schemas. Each schema, in turn, contains multiple tables where the actual data is stored. By default, a catalog named `spice` is created with all of the datasets defined in the `datasets` section of the Spicepod.
Schemas and tables within the `spice` catalog are created according to the `name` field in the dataset configuration. A name containing a period (`.`) creates a schema: for example, a dataset defined with `name: foo.bar` has the full path `spice.foo.bar`. If the name does not contain a period, the dataset is created in the `public` schema of the `spice` catalog. For example, a dataset defined with `name: foo` has the full path `spice.public.foo`. Attempting to create a dataset with a name that contains a catalog name results in an error. Adding catalogs to Spice is done via Catalog Connectors.
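The dataset naming rules above can be sketched as a Spicepod fragment. This is a minimal illustration, not a complete configuration: the `from` values are hypothetical placeholders, and any connector-specific parameters a real source would need are omitted.

```yaml
datasets:
  # Name contains a period: schema "foo", table "bar".
  # Full path: spice.foo.bar
  - from: s3://example-bucket/bar/   # hypothetical source, for illustration only
    name: foo.bar

  # Name has no period: the dataset goes into the "public" schema.
  # Full path: spice.public.foo
  - from: s3://example-bucket/foo/   # hypothetical source, for illustration only
    name: foo
```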
Catalog Connectors connect to external catalog providers and make their tables available for federated SQL query in Spice. Configuring accelerations for tables in external catalogs is not supported. The schema hierarchy of the external catalog is preserved in Spice.
Supported Catalog Connectors include:
Name | Description | Status | Protocol/Format |
---|---|---|---|
unity_catalog | Unity Catalog | Stable | Delta Lake |
databricks | Databricks | Beta | Spark Connect, S3/Delta Lake |
iceberg | Apache Iceberg | Beta | Parquet |
spice.ai | Spice.ai Cloud Platform | Alpha | Arrow Flight |
glue | AWS Glue | Coming Soon | JSON, Parquet, Iceberg |
Catalog Connector Docs
Catalogs are configured using a Catalog Connector in the `catalogs` section of the Spicepod. See the specific Catalog Connector documentation for configuration details.
include
Use the `include` field to specify which tables to include from the catalog. The `include` field supports glob patterns to match multiple tables. For example, `*.my_table_name` would include all tables with the name `my_table_name` in the catalog from any schema. Multiple `include` patterns can be specified; they are OR'ed together, so a table is included if it matches any one of them.
Example:
catalogs:
  - from: spice.ai
    name: spiceai
    include:
      - 'tpch.*' # Include only the "tpch" tables.
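Because multiple `include` patterns are OR'ed together, a catalog entry can list several of them. The fragment below is a sketch that extends the example above; the table name `customer` is a hypothetical placeholder, not something the catalog is known to contain.

```yaml
catalogs:
  - from: spice.ai
    name: spiceai
    include:
      - 'tpch.*'      # every table in the "tpch" schema
      - '*.customer'  # hypothetical: any table named "customer", in any schema
```

With this configuration, a table is exposed if it matches either pattern, so listing a more specific pattern alongside a broader one does not narrow the result.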
📄️ Databricks
Connect to a Databricks Unity Catalog provider.
📄️ Unity Catalog
Connect to a Unity Catalog provider.
📄️ Spice.ai
Connect to the Spice.ai built-in catalog.
📄️ Iceberg
Connect to an Iceberg catalog provider.