Iceberg Data Connector

The Iceberg Data Connector enables federated SQL querying on Apache Iceberg tables.

datasets:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
    name: my_table

Configuration

from

The from field specifies the Iceberg table to connect to, in the format iceberg:<table_path>. The table_path is the URL of the table in the Iceberg catalog, in the form http[s]://<iceberg_catalog_host>/v1/{prefix}/namespaces/<namespace_name>/tables/<table_name>. The {prefix} segment is omitted in the examples in this document.

For AWS Glue catalogs, the URL format is https://glue.<region>.amazonaws.com/iceberg/v1/catalogs/<account_id>/namespaces/<namespace_name>/tables/<table_name>, where <account_id> is your AWS account ID.

Example: from: iceberg:http://localhost:8181/v1/namespaces/my_namespace/tables/my_table
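The examples in this document leave out the {prefix} segment of the URL format. As a minimal sketch, assuming a catalog that serves its tables under a prefix, a dataset entry might look like the following. The host, the prefix my_prefix, and the namespace and table names are hypothetical placeholders.

```yaml
datasets:
  # Hypothetical catalog that expects a prefix segment after /v1/.
  - from: iceberg:https://iceberg-catalog-host.com/v1/my_prefix/namespaces/my_namespace/tables/my_table
    name: my_table
```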

name

The dataset name. This will be used as the table name within Spice.

Example:

datasets:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
    name: transactions
    params:
      iceberg_token: ${secrets:iceberg_token}

SELECT COUNT(*) FROM transactions;
+----------+
| count(*) |
+----------+
| 1234567  |
+----------+

params

| Parameter Name | Description |
| --- | --- |
| iceberg_token | Bearer token value to use for the Authorization header. |
| iceberg_oauth2_credential | Credential to use for the OAuth2 client credentials flow when connecting to the table. Format: <client_id>:<client_secret> |
| iceberg_oauth2_scope | Scope to use for the OAuth2 client credentials flow when connecting to the table. Default: catalog |
| iceberg_oauth2_server_url | URL of the OAuth2 server token endpoint for the client credentials flow. |
| iceberg_s3_endpoint | S3-compatible endpoint where the Iceberg table data is stored. |
| iceberg_s3_region | Region of the S3-compatible endpoint. |
| iceberg_s3_access_key_id | Access key ID for the S3-compatible endpoint. |
| iceberg_s3_secret_access_key | Secret access key for the S3-compatible endpoint. |
| iceberg_s3_session_token | Session token for the S3-compatible endpoint. |
| iceberg_s3_role_arn | ARN of the IAM role to assume when accessing the S3-compatible endpoint. |
| iceberg_s3_role_session_name | Session name to use when assuming the IAM role. |
| iceberg_s3_connect_timeout | Connection timeout in seconds for the S3-compatible endpoint. Default: 60 |
| iceberg_sigv4_enabled | Enable SigV4 (AWS Signature Version 4) authentication when connecting to the catalog. Automatically enabled if the URL in from is an AWS Glue catalog. Default: false |
| iceberg_signing_region | Region to use for SigV4 authentication. Extracted from the URL in from if not specified. |
| iceberg_signing_name | Service name to use for SigV4 authentication. Default: glue |

Authentication

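The connector supports several authentication methods. The first three authenticate to the Iceberg catalog; the fourth authenticates to the object store that holds the table data: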

  1. Bearer Token: Use iceberg_token to provide a bearer token for the Authorization header.

  2. OAuth2 Client Credentials Flow: Use iceberg_oauth2_credential, iceberg_oauth2_scope, and iceberg_oauth2_server_url to authenticate using the OAuth2 client credentials flow.

  3. AWS SigV4: For AWS Glue catalogs, set iceberg_sigv4_enabled to true (automatically enabled for AWS Glue URLs).

  4. S3 Authentication: For accessing the underlying data in S3, use the iceberg_s3_* parameters to configure S3 access.

Examples

Basic Example

Connect to an Iceberg table with token authentication:

datasets:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
    name: my_table
    params:
      iceberg_token: ${secrets:iceberg_token}

AWS Glue Catalog Example

Connect to an Iceberg table in AWS Glue catalog:

datasets:
  - from: iceberg:https://glue.us-east-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/my_namespace/tables/my_table
    name: glue_table
    params:
      iceberg_sigv4_enabled: true
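
The parameter table notes that the signing region is taken from the URL in from when it is not set, and that the signing name defaults to glue. If you prefer to state both explicitly, a sketch of the same dataset might look like this. The account ID and region are the placeholder values from the example above, not real resources.

```yaml
datasets:
  - from: iceberg:https://glue.us-east-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/my_namespace/tables/my_table
    name: glue_table
    params:
      iceberg_sigv4_enabled: true
      # Optional: otherwise extracted from the URL in `from`.
      iceberg_signing_region: us-east-1
      # Optional: `glue` is the documented default.
      iceberg_signing_name: glue
```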

OAuth2 Authentication Example

Connect to an Iceberg table using OAuth2 authentication:

datasets:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
    name: oauth_table
    params:
      iceberg_oauth2_credential: ${secrets:client_id}:${secrets:client_secret}
      iceberg_oauth2_scope: catalog
      iceberg_oauth2_server_url: https://iceberg-catalog-host.com/oauth2/token

S3 Storage Example

Connect to an Iceberg table with custom S3 storage configuration:

datasets:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
    name: s3_table
    params:
      iceberg_token: ${secrets:iceberg_token}
      iceberg_s3_endpoint: http://localhost:9000
      iceberg_s3_region: us-west-2
      iceberg_s3_access_key_id: ${secrets:aws_access_key_id}
      iceberg_s3_secret_access_key: ${secrets:aws_secret_access_key}
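
The iceberg_s3_role_arn, iceberg_s3_role_session_name, and iceberg_s3_connect_timeout parameters do not appear in the example above. As a sketch, assuming the table data is read through an assumed IAM role, the configuration might look like the following. The role ARN, session name, and timeout value are hypothetical, and whether base credentials must also be supplied depends on your environment.

```yaml
datasets:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
    name: s3_role_table
    params:
      iceberg_token: ${secrets:iceberg_token}
      iceberg_s3_region: us-west-2
      # Hypothetical role ARN; replace with a role that can read the table data.
      iceberg_s3_role_arn: "arn:aws:iam::123456789012:role/iceberg-reader"
      # Hypothetical session name used when assuming the role.
      iceberg_s3_role_session_name: spice-iceberg
      # Connection timeout in seconds (documented default: 60).
      iceberg_s3_connect_timeout: 30
```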

Secrets

Spice integrates with multiple secret stores to help manage sensitive data securely. For the list of supported secret stores, refer to the secret stores documentation. To learn how to reference secrets in component parameters, see the using referenced secrets guide.

Limitations

Performance Considerations

Query performance on Iceberg tables depends on the size of the table, the complexity of the query, and the underlying storage system. For large tables, filter queries (for example, with a WHERE clause on a selective column) to limit the amount of data scanned.

The connector needs to access both the Iceberg catalog metadata and the underlying data files (typically stored in S3 or a compatible object store). Make sure the Spice runtime has network access to, and valid credentials for, both systems.