Delta Lake Data Connector
The Delta Lake data connector enables SQL queries on Delta Lake tables.
```yaml
datasets:
  - from: delta_lake:s3://my_bucket/path/to/s3/delta/table/
    name: my_delta_lake_table
    params:
      delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
      delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
```
Configuration
from
The from field for the Delta Lake connector takes the form delta_lake:path, where path is any supported path, either a local path or a cloud storage location. See the examples section below.
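To make the delta_lake:path rule concrete, the Python sketch below splits a from value into its storage scheme and path. The function parse_from is hypothetical and only mirrors the rule described above. It is not Spice's own parser. It also assumes that a path without a URL scheme refers to the local filesystem.

```python
from urllib.parse import urlparse


def parse_from(value: str) -> tuple[str, str]:
    """Split a 'delta_lake:<path>' value into (storage scheme, path).

    Hypothetical helper for illustration only; not Spice's parser.
    """
    prefix = "delta_lake:"
    if not value.startswith(prefix):
        raise ValueError(f"expected a value starting with {prefix!r}, got {value!r}")
    path = value[len(prefix):]
    # Assumption: a path with no URL scheme (e.g. /path/to/table) is local.
    scheme = urlparse(path).scheme or "file"
    return scheme, path


print(parse_from("delta_lake:s3://my_bucket/path/to/s3/delta/table/"))
# ('s3', 's3://my_bucket/path/to/s3/delta/table/')
print(parse_from("delta_lake:/path/to/local/delta/table"))
# ('file', '/path/to/local/delta/table')
```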
name
The dataset name. This will be used as the table name within Spice.
Example:
```yaml
datasets:
  - from: delta_lake:s3://my_bucket/path/to/s3/delta/table/
    name: cool_dataset
    params: ...
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```
The dataset name cannot be a [reserved keyword](../../reference/spicepod/keywords.md).
params
Use the secret replacement syntax to reference a secret, e.g. ${secrets:aws_access_key_id}.
| Parameter Name | Description |
|---|---|
client_timeout | Optional. The timeout for object store operations. Defaults to 30s. E.g. client_timeout: 60s |
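The secret replacement syntax works as string substitution on parameter values. The Python sketch below resolves ${secrets:<name>} references against an in-memory dictionary. The secret store modeled as a dict, the exact reference grammar, and the helper name resolve_params are all assumptions for illustration. Spice resolves secrets through its configured secret stores.

```python
import re

# Assumed grammar: ${secrets:<name>}, where <name> contains no closing brace.
SECRET_REF = re.compile(r"\$\{secrets:([^}]+)\}")


def resolve_params(params: dict[str, str], secrets: dict[str, str]) -> dict[str, str]:
    """Replace every ${secrets:<name>} reference with its value from `secrets`.

    Hypothetical helper; raises KeyError if a referenced secret is missing.
    """
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"unknown secret: {name}")
        return secrets[name]

    # Values without a reference (e.g. client_timeout: 60s) pass through unchanged.
    return {key: SECRET_REF.sub(lookup, value) for key, value in params.items()}
```

Keeping the lookup strict, so a missing secret raises instead of leaving the literal reference in place, keeps a typo in a secret name from silently becoming a bad credential.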
Delta Lake object store parameters
AWS S3
| Parameter Name | Description |
|---|---|
delta_lake_aws_region | Optional. The AWS region for the S3 object store. E.g. us-west-2. |
delta_lake_aws_access_key_id | The access key ID for the S3 object store. |
delta_lake_aws_secret_access_key | The secret access key for the S3 object store. |
delta_lake_aws_endpoint | Optional. The endpoint for the S3 object store. E.g. s3.us-west-2.amazonaws.com. |
delta_lake_aws_allow_http | Optional. Enables insecure HTTP connections to delta_lake_aws_endpoint. Defaults to false. |
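delta_lake_aws_allow_http defaults to false, so an endpoint that uses plain HTTP, such as a local MinIO server at http://localhost:9000, also needs that flag set. The Python sketch below is a hypothetical pre-deployment check that flags this combination. It assumes params holds the key/value pairs from a dataset's params block, with YAML booleans possibly arriving as Python bool. It does not describe how Spice itself reports the error.

```python
def needs_allow_http(params: dict[str, object]) -> bool:
    """Return True when delta_lake_aws_endpoint uses http:// but
    delta_lake_aws_allow_http is not enabled (hypothetical check)."""
    endpoint = str(params.get("delta_lake_aws_endpoint", "")).lower()
    # Accept either a YAML boolean (True) or the string "true".
    allow_http = str(params.get("delta_lake_aws_allow_http", False)).lower() == "true"
    return endpoint.startswith("http://") and not allow_http
```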
Azure Blob
One of the following authentication options must be provided for Azure Blob: delta_lake_azure_storage_account_key, delta_lake_azure_storage_client_id and delta_lake_azure_storage_client_secret, or delta_lake_azure_storage_sas_key.
| Parameter Name | Description |
|---|---|
delta_lake_azure_storage_account_name | The Azure Storage account name. |
delta_lake_azure_storage_account_key | The Azure Storage master key for accessing the storage account. |
delta_lake_azure_storage_client_id | The service principal client ID for accessing the storage account. |
delta_lake_azure_storage_client_secret | The service principal client secret for accessing the storage account. |
delta_lake_azure_storage_sas_key | The shared access signature key for accessing the storage account. |
delta_lake_azure_storage_endpoint | Optional. The endpoint for the Azure Blob storage account. |
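The Azure rule requires one complete option: an account key, a client ID together with a client secret, or a SAS key. The Python sketch below checks a params mapping against that rule. The helper azure_auth_method is hypothetical, and the order in which it checks the options is an assumption. The page does not say which option wins if several are set.

```python
PREFIX = "delta_lake_azure_storage_"


def azure_auth_method(params: dict[str, str]) -> str:
    """Return the first complete Azure auth option found in params.

    Hypothetical helper; the precedence order below is an assumption.
    """
    if params.get(PREFIX + "account_key"):
        return "account_key"
    # A service principal needs both the client ID and the client secret.
    if params.get(PREFIX + "client_id") and params.get(PREFIX + "client_secret"):
        return "service_principal"
    if params.get(PREFIX + "sas_key"):
        return "sas_key"
    raise ValueError(
        "provide account_key, client_id and client_secret, or sas_key "
        f"(each prefixed with {PREFIX})"
    )
```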
Google Storage (GCS)
| Parameter Name | Description |
|---|---|
google_service_account | Filesystem path to the Google service account JSON key file. |
Examples
Delta Lake + Local
```yaml
- from: delta_lake:/path/to/local/delta/table # A local filesystem path to a Delta Lake table
  name: my_delta_lake_table
```
Delta Lake + S3
```yaml
- from: delta_lake:s3://my_bucket/path/to/s3/delta/table/ # A reference to a table in S3
  name: my_delta_lake_table
  params:
    delta_lake_aws_region: us-west-2 # Optional
    delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
    delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
    delta_lake_aws_endpoint: s3.us-west-2.amazonaws.com # Optional
```
Delta Lake + MinIO
```yaml
- from: delta_lake:s3://my_bucket/path/to/s3/delta/table/ # A reference to a table in MinIO
  name: my_delta_lake_table
  params:
    delta_lake_aws_region: us-east-1 # Best practice for MinIO
    delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
    delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
    delta_lake_aws_endpoint: http://localhost:9000 # MinIO Endpoint
    delta_lake_aws_allow_http: true
```
Delta Lake + Azure Blob
```yaml
- from: delta_lake:abfss://my_container@my_account.dfs.core.windows.net/path/to/azure/delta/table/ # A reference to a table in Azure Blob
  name: my_delta_lake_table
  params:
    # Account Name + Key
    delta_lake_azure_storage_account_name: my_account
    delta_lake_azure_storage_account_key: ${secrets:my_key}
    # OR Service Principal + Secret
    delta_lake_azure_storage_client_id: my_client_id
    delta_lake_azure_storage_client_secret: ${secrets:my_secret}
    # OR SAS Key
    delta_lake_azure_storage_sas_key: my_sas_key
```
Delta Lake + Google Storage
```yaml
params:
  delta_lake_google_service_account_path: /path/to/service-account.json
```
Types
The table below shows the Delta Lake data types supported, along with the type mapping to Apache Arrow types in Spice.
| Delta Lake Type | Arrow Type |
|---|---|
String | Utf8 |
Long | Int64 |
Integer | Int32 |
Short | Int16 |
Byte | Int8 |
Float | Float32 |
Double | Float64 |
Boolean | Boolean |
Binary | Binary |
Date | Date32 |
Timestamp | Timestamp(Microsecond, Some("UTC")) |
TimestampNtz | Timestamp(Microsecond, None) |
Decimal | Decimal128 |
Array | List |
Struct | Struct |
Map | Map |
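The table amounts to a lookup for primitive types plus recursion for the nested types (Array, Struct, Map). The Python sketch below copies the primitive names from the table. The tuple encoding of nested types and the List(...)/Map(...)/Struct(...) rendering are conventions invented for this illustration, not Arrow's or Spice's API. Decimal precision and scale are not modeled.

```python
# Primitive Delta Lake types -> Arrow type names, copied from the table above.
DELTA_TO_ARROW = {
    "String": "Utf8",
    "Long": "Int64",
    "Integer": "Int32",
    "Short": "Int16",
    "Byte": "Int8",
    "Float": "Float32",
    "Double": "Float64",
    "Boolean": "Boolean",
    "Binary": "Binary",
    "Date": "Date32",
    "Timestamp": 'Timestamp(Microsecond, Some("UTC"))',
    "TimestampNtz": "Timestamp(Microsecond, None)",
    "Decimal": "Decimal128",
}


def to_arrow(delta_type) -> str:
    """Map a Delta Lake type to an Arrow type name (illustrative sketch).

    Primitives are strings such as "Long". Nested types are tuples:
    ("Array", element), ("Map", key, value), ("Struct", [(field_name, type), ...]).
    """
    if isinstance(delta_type, str):
        return DELTA_TO_ARROW[delta_type]
    kind = delta_type[0]
    if kind == "Array":
        return f"List({to_arrow(delta_type[1])})"
    if kind == "Map":
        return f"Map({to_arrow(delta_type[1])}, {to_arrow(delta_type[2])})"
    if kind == "Struct":
        fields = ", ".join(f"{name}: {to_arrow(t)}" for name, t in delta_type[1])
        return f"Struct({fields})"
    raise ValueError(f"unsupported Delta Lake type: {kind}")


print(to_arrow(("Struct", [("id", "Integer"), ("tags", ("Array", "String"))])))
# Struct(id: Int32, tags: List(Utf8))
```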
Limitations
- The Delta Lake connector does not support reading Delta tables with the V2Checkpoint feature enabled. To use the Delta Lake connector with such tables, drop the V2Checkpoint feature by executing the following command:

  ```sql
  ALTER TABLE <table-name> DROP FEATURE v2Checkpoint [TRUNCATE HISTORY];
  ```

  For more details on dropping Delta table features, refer to the official documentation: Drop Delta table features.
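Before changing a table, you may want to check whether it lists the feature. The Python sketch below assumes the Delta transaction log layout: each commit in _delta_log is a zero-padded <version>.json file of newline-delimited actions, and the latest protocol action may list v2Checkpoint under readerFeatures. The names has_v2_checkpoint and load_commits are hypothetical. The sketch reads only JSON commit files. If log cleanup has removed the commits that hold the protocol action, it can report False even when the feature is enabled, so treat the result as a hint.

```python
import json
from pathlib import Path


def has_v2_checkpoint(commits: list[str]) -> bool:
    """Return True if the newest protocol action lists v2Checkpoint as a reader feature.

    `commits` holds the text of _delta_log JSON commits, oldest first.
    """
    reader_features: list[str] = []
    for text in commits:
        for line in text.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            # A later protocol action replaces any earlier one.
            if "protocol" in action:
                reader_features = action["protocol"].get("readerFeatures", [])
    return "v2Checkpoint" in reader_features


def load_commits(table_path: str) -> list[str]:
    """Read the JSON commits under <table_path>/_delta_log in version order."""
    log_dir = Path(table_path) / "_delta_log"
    # Keep only <version>.json commits; skip checkpoint or compacted .json files.
    files = [p for p in log_dir.glob("*.json") if p.stem.isdigit()]
    # Versions are zero-padded, so sorting by name sorts by version.
    return [p.read_text() for p in sorted(files)]
```

For a local table, you would call has_v2_checkpoint(load_commits("/path/to/local/delta/table")).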
Secrets
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](../secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](../secret-stores#using-secrets).
