# Azure BlobFS Data Connector
The Azure BlobFS (ABFS) Data Connector enables federated SQL queries on files stored in Azure Blob-compatible endpoints, including Azure BlobFS (`abfss://`) and Azure Data Lake (`adl://`) endpoints.

When a folder path is provided, all files contained in the folder are loaded.

File formats are specified using the `file_format` parameter, as described in Object Store File Formats.
```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
```
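Pointing `from` at a folder loads every file it contains. A minimal sketch (the folder name below is a placeholder):

```yaml
datasets:
  - from: abfs://foocontainer/trips/
    name: all_trips
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
```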
## Configuration

### from

Defines the ABFS-compatible URI to a folder or object:

- `from: abfs://<container>/<path>`, with the account name configured using the `abfs_account` parameter, or
- `from: abfs://<container>@<account_name>.dfs.core.windows.net/<path>`
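For example, the second form embeds the account name directly in the URI (the container and account names here are placeholders):

```yaml
datasets:
  - from: abfs://foocontainer@spiceadls.dfs.core.windows.net/taxi_sample.csv
    name: azure_test
    params:
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
```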
### name
Defines the dataset name, which is used as the table name within Spice.
Example:
```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: cool_dataset
    params: ...
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```
The dataset name cannot be a [reserved keyword](../../reference/spicepod/keywords.md).
### params

#### Basic parameters

| Parameter name | Description |
| --- | --- |
| `file_format` | Specifies the data format. Required if it cannot be inferred from `from`. Options: `parquet`, `csv`. Refer to Object Store File Formats for details. |
| `abfs_account` | Azure storage account name. |
| `abfs_sas_string` | SAS (Shared Access Signature) token to use for authorization. |
| `abfs_endpoint` | Storage endpoint. Default: `https://{account}.blob.core.windows.net`. |
| `abfs_use_emulator` | Set to `true` or `false` to connect to a local emulator. |
| `abfs_authority_host` | Alternative authority host. Default: `https://login.microsoftonline.com`. |
| `abfs_proxy_url` | Proxy URL. |
| `abfs_proxy_ca_certificate` | CA certificate for the proxy. |
| `abfs_proxy_excludes` | A list of hosts to exclude from proxy connections. |
| `abfs_disable_tagging` | Disable tagging objects. Use this if your backing store doesn't support tags. |
| `allow_http` | Allow insecure HTTP connections. |
| `hive_partitioning_enabled` | Enable hive-style partitioning based on the folder structure. Defaults to `false`. |
| `schema_source_path` | Specifies the URL used to infer the dataset schema. Defaults to the most recently modified file. |
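As an illustration, a dataset that enables hive-style partitioning over a folder might look like the following sketch (the account, container, and folder names are placeholders):

```yaml
datasets:
  - from: abfs://foocontainer/trips/
    name: partitioned_trips
    params:
      abfs_account: spiceadls
      hive_partitioning_enabled: true
      file_format: parquet
```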
#### Authentication parameters

The following parameters are used when authenticating with Azure. Only one of these parameters can be used at a time:

- `abfs_access_key`
- `abfs_bearer_token`
- `abfs_client_secret`
- `abfs_skip_signature`

If none of these are set, the connector will default to using a managed identity.
| Parameter name | Description |
| --- | --- |
| `abfs_access_key` | Secret access key. |
| `abfs_bearer_token` | Bearer access token for user authentication. The token can be obtained from the OAuth2 flow (see access token authentication). |
| `abfs_client_id` | Client ID for the client authentication flow. |
| `abfs_client_secret` | Client secret to use for the client authentication flow. |
| `abfs_tenant_id` | Tenant ID to use for the client authentication flow. |
| `abfs_skip_signature` | Skip credentials and request signing for public containers. |
| `abfs_msi_endpoint` | Endpoint for managed identity tokens. |
| `abfs_federated_token_file` | File path for a federated identity token in Kubernetes. |
| `abfs_use_cli` | Set to `true` to use the Azure CLI to acquire access tokens. |
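When none of the authentication parameters are set, the connector falls back to a managed identity. A minimal sketch (the names below are placeholders); note that no credential parameters are specified:

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: managed_identity_data
    params:
      abfs_account: spiceadls
      file_format: csv
```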
#### Retry parameters

| Parameter name | Description |
| --- | --- |
| `abfs_max_retries` | Maximum retries. |
| `abfs_retry_timeout` | Total timeout for retries (e.g., `5s`, `1m`). |
| `abfs_backoff_initial_duration` | Initial retry delay (e.g., `5s`). |
| `abfs_backoff_max_duration` | Maximum retry delay (e.g., `1m`). |
| `abfs_backoff_base` | Exponential backoff base (e.g., `0.1`). |
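For instance, retry behavior can be tuned per dataset; the values in this sketch are illustrative, not recommendations:

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: resilient_data
    params:
      abfs_account: spiceadls
      abfs_max_retries: 5
      abfs_retry_timeout: 2m
      abfs_backoff_initial_duration: 1s
      abfs_backoff_max_duration: 30s
      file_format: csv
```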
## Authentication

The ABFS connector supports three types of authentication, as detailed in the authentication parameters above.
### Service principal authentication

Configure service principal authentication by setting the `abfs_client_secret` parameter (a configuration sketch follows the steps below):

1. Create a new Azure AD application in the Azure portal and generate a client secret under **Certificates & secrets**.
2. Grant the Azure AD application read access to the storage account under **Access Control (IAM)**; this can typically be done using the **Storage Blob Data Reader** built-in role.
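A minimal sketch, assuming the service principal credentials are stored as secrets (a fuller example appears under Examples below; the secret names are placeholders):

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: sp_data
    params:
      abfs_account: spiceadls
      abfs_tenant_id: ${ secrets:MY_TENANT_ID }
      abfs_client_id: ${ secrets:MY_CLIENT_ID }
      abfs_client_secret: ${ secrets:MY_CLIENT_SECRET }
      file_format: csv
```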
### Access key authentication

Configure access key authentication by setting the `abfs_access_key` parameter to an Azure Storage account access key.
### Access token authentication

Configure access token authentication by setting the `abfs_bearer_token` parameter, typically obtained by following the OAuth2 flow with `spice login abfs`:

1. Create a new Azure AD application in the Azure portal.
2. Under the application's **API permissions**, add the permission **Azure Storage - user_impersonation**.
3. Under the application's **Authentication**, add `http://localhost` as a **Mobile and desktop applications** redirect URI.
4. Grant the user read access to the storage account under **Access Control (IAM)**; this can typically be done using the **Storage Blob Data Reader** built-in role.
5. Obtain the `abfs_bearer_token` using the following command. The `abfs_bearer_token`, `abfs_client_id`, and `abfs_tenant_id` will be filled in automatically as environment secrets after login. Refer to the [spice login](../../cli/reference/login) documentation for more details.

```bash
spice login abfs --tenant-id $TENANT_ID --client-id $CLIENT_ID
```
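After login, a dataset can reference those stored values. A sketch assuming the login stored the token and IDs as environment secrets; the secret names below are hypothetical:

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: token_data
    params:
      abfs_account: spiceadls
      abfs_tenant_id: ${ env:SPICE_ABFS_TENANT_ID } # hypothetical secret name
      abfs_client_id: ${ env:SPICE_ABFS_CLIENT_ID } # hypothetical secret name
      abfs_bearer_token: ${ env:SPICE_ABFS_BEARER_TOKEN } # hypothetical secret name
      file_format: csv
```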
## Supported file formats

Specify the file format using the `file_format` parameter. More details in Object Store File Formats.
## Examples

### Reading a CSV file with an Access Key
```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:ACCESS_KEY }
      file_format: csv
```
### Using Public Containers
```yaml
datasets:
  - from: abfs://pubcontainer/taxi_sample.csv
    name: pub_data
    params:
      abfs_account: spiceadls
      abfs_skip_signature: true
      file_format: csv
```
### Connecting to the Storage Emulator
```yaml
datasets:
  - from: abfs://test_container/test_csv.csv
    name: test_data
    params:
      abfs_use_emulator: true
      file_format: csv
```
### Using Secrets for the Account Name
```yaml
datasets:
  - from: abfs://my_container/my_csv.csv
    name: prod_data
    params:
      abfs_account: ${ secrets:PROD_ACCOUNT }
      file_format: csv
```
### Authenticating using Client Authentication
```yaml
datasets:
  - from: abfs://my_data/input.parquet
    name: my_data
    params:
      abfs_tenant_id: ${ secrets:MY_TENANT_ID }
      abfs_client_id: ${ secrets:MY_CLIENT_ID }
      abfs_client_secret: ${ secrets:MY_CLIENT_SECRET }
```
## Secrets

Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](../secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](../secret-stores#using-secrets).
