
Databricks Deployment Guide

Production operating guide for the Databricks connector covering resilience tuning, Unity Catalog awareness, metrics, and observability. These features apply primarily to sql_warehouse mode unless noted otherwise.

Resilience Controls

Retry and Concurrency Parameters

When using mode: sql_warehouse, the following parameters control HTTP retry behavior and concurrency limits for the Databricks SQL Statements API.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_concurrent_requests | integer | 8 | Maximum concurrent HTTP requests to the SQL Warehouse API. |
| http_max_retries | integer | 3 | Maximum HTTP-level retries for transient failures (429, 5xx). |
| backoff_method | string | fibonacci | Backoff strategy for transient HTTP retries: fibonacci or exponential. |
| statement_max_retries | integer | 14 | Maximum poll retries when waiting for an async SQL statement to complete. |
| disable_on_permanent_error | boolean | true | Permanently disable the connector on non-retryable errors (401, 403, 404). |

Example

catalogs:
  - from: databricks:my_catalog
    name: my_catalog
    params:
      endpoint: my-workspace.cloud.databricks.com
      mode: sql_warehouse
      sql_warehouse_id: abc123def456
      databricks_client_id: ${env:DBX_CLIENT_ID}
      databricks_client_secret: ${env:DBX_CLIENT_SECRET}
      max_concurrent_requests: '4'
      http_max_retries: '5'
      backoff_method: exponential
      statement_max_retries: '20'
      disable_on_permanent_error: 'true'

Shared Concurrency Semaphore

When multiple datasets or catalog-discovery paths target the same SQL Warehouse (same endpoint + sql_warehouse_id), a single concurrency semaphore is shared across all of them. The max_concurrent_requests limit is enforced globally for that warehouse, not per dataset or per catalog.

The max_concurrent_requests value only needs to be set on one dataset or catalog entry for a given warehouse — other components targeting the same warehouse that omit the parameter will share the same semaphore with the configured limit. If multiple components explicitly set max_concurrent_requests, the values must match; conflicting values are treated as a configuration error.
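For example, two datasets targeting the same warehouse can rely on a single configured limit (an illustrative sketch; the dataset names, table paths, and warehouse ID are hypothetical, and credentials are omitted for brevity):

```yaml
datasets:
  - from: databricks:my_catalog.my_schema.orders # hypothetical table
    name: orders
    params:
      mode: sql_warehouse
      endpoint: my-workspace.cloud.databricks.com
      sql_warehouse_id: abc123def456
      max_concurrent_requests: '4' # sets the shared limit for this warehouse
  - from: databricks:my_catalog.my_schema.customers # hypothetical table
    name: customers
    params:
      mode: sql_warehouse
      endpoint: my-workspace.cloud.databricks.com
      sql_warehouse_id: abc123def456
      # max_concurrent_requests omitted — shares the semaphore configured above
```

Both datasets draw permits from the same semaphore of size 4, because they share the same endpoint and sql_warehouse_id.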

Permanent-Disable Behavior

When disable_on_permanent_error is true (default), non-retryable HTTP status codes on statement-execution requests permanently disable the connector. Subsequent queries immediately return a PermanentlyDisabled error instead of issuing further HTTP requests.

The following errors trigger permanent disable:

  • 401 Unauthorized — expired or invalid credentials.
  • 403 Forbidden — the service principal or token lacks permission to execute statements on the warehouse.
  • 404 Not Found — the SQL Warehouse has been deleted or the endpoint is incorrect.

This prevents cascading failures (e.g., every dataset refresh hammering a warehouse that will never accept the request).

info

Permanent-disable detection is not applied to statement-poll or result-fetch requests. Transient 403/404 responses on those paths (e.g., expired pre-signed URLs or purged statement results) do not indicate a configuration problem.

To recover from a permanent-disable state, fix the underlying issue (e.g., renew credentials, restore the warehouse) and restart the Spice runtime.
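The gate in front of statement-execution requests can be pictured as follows (an illustrative Python sketch, not the connector's actual implementation; the class and method names are invented):

```python
# Non-retryable statuses per the list above.
PERMANENT_STATUSES = {401, 403, 404}


class PermanentlyDisabled(Exception):
    """Raised for every query once the connector has been disabled."""


class StatementClient:
    def __init__(self, disable_on_permanent_error: bool = True):
        self.disable_on_permanent_error = disable_on_permanent_error
        self.disabled = False

    def execute(self, send_request):
        # Fail fast without issuing any HTTP request once disabled.
        if self.disabled:
            raise PermanentlyDisabled("connector permanently disabled")
        status, body = send_request()
        if self.disable_on_permanent_error and status in PERMANENT_STATUSES:
            self.disabled = True
            raise PermanentlyDisabled(f"HTTP {status}: disabling connector")
        return status, body
```

Once the flag is set, every subsequent execute call raises immediately, which is what prevents the cascading-failure pattern described above.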

Retry Behavior

The SQL Warehouse connector has two retry layers:

  1. HTTP-level retries: retries on 408 (request timeout), 429 (rate limit), and 5xx (server error) responses, as well as on transient network and connection errors. The connector respects the Retry-After, retry-after-ms, and x-retry-after-ms headers and uses the configured backoff_method with a maximum backoff of 300 seconds.

  2. Statement-poll retries: when a SQL statement is in the PENDING or RUNNING state, the connector polls for completion with fibonacci backoff, up to statement_max_retries times. If the statement does not reach a terminal state within the retry budget, a QueryStillRunning or InvalidWarehouseState error is returned.
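The two backoff strategies can be sketched as follows (an illustrative Python sketch; the base delay of 1 second is an assumption, and only the 300-second cap comes from the text above):

```python
def backoff_delays(method: str, retries: int, base: float = 1.0, cap: float = 300.0):
    """Yield one capped backoff delay (in seconds) per retry attempt."""
    if method == "fibonacci":
        prev, curr = 0.0, base
        for _ in range(retries):
            yield min(curr, cap)
            prev, curr = curr, prev + curr
    elif method == "exponential":
        for attempt in range(retries):
            yield min(base * (2 ** attempt), cap)
    else:
        raise ValueError(f"unknown backoff_method: {method}")
```

Fibonacci backoff grows more slowly than exponential at first (1, 1, 2, 3, 5, 8, ... versus 1, 2, 4, 8, ...), which spreads retries out less aggressively under short-lived rate limiting; both are capped at 300 seconds.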

Unity Catalog Awareness

Table Type Filtering

The connector checks each table's type against Unity Catalog metadata before creating a table provider. The following table types are supported:

| Table Type | Supported | Notes |
| --- | --- | --- |
| MANAGED | Yes | Standard Delta tables |
| EXTERNAL | Yes | Tables with external storage locations |
| FOREIGN | Yes | Lakehouse Federation foreign tables |
| MATERIALIZED_VIEW | Yes | Materialized views |
| VIEW | No | Skipped during discovery |
| STREAMING_TABLE | No | Skipped during discovery |

Unsupported table types are silently skipped during catalog discovery. When referenced directly (e.g., databricks:catalog.schema.view_name), an error is returned.

Permission Checking

Before creating a table provider, the connector verifies the current principal has a read-compatible privilege on the table using the Unity Catalog Effective Permissions API. The following privileges grant read access: SELECT, ALL_PRIVILEGES, ALL PRIVILEGES, OWNER, and OWNERSHIP.

Catalog discovery: Tables without read permissions are skipped.

Direct table references: An InsufficientPermissions error is returned.

Foreign tables: FOREIGN tables skip the table-level permission precheck because Lakehouse Federation access can be valid even when the effective-permissions endpoint does not report a table-level read privilege. Access is still enforced by Databricks at query time.

Graceful degradation: If the Unity Catalog API is unreachable or the table is not found in UC, the connector logs a warning and proceeds without validation.
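The read-access decision described above can be summarized as (an illustrative Python sketch; the privilege names come from this page, while the helper function itself is invented):

```python
# Privileges that grant read access per the Effective Permissions check.
READ_PRIVILEGES = {"SELECT", "ALL_PRIVILEGES", "ALL PRIVILEGES", "OWNER", "OWNERSHIP"}


def allows_read(table_type: str, effective_privileges: set) -> bool:
    """Return True if a table provider may be created for this table."""
    if table_type == "FOREIGN":
        # Lakehouse Federation: skip the table-level precheck; Databricks
        # still enforces access at query time.
        return True
    return bool(READ_PRIVILEGES & effective_privileges)
```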

Metrics

The SQL Warehouse connector exposes per-dataset operational metrics. Most metrics must be explicitly enabled in the dataset's metrics section. The inflight_operations metric is auto-registered and always available.

For general information about component metrics, see Component Metrics.

Available Metrics

| Metric Name | Type | Category | Description |
| --- | --- | --- | --- |
| requests_total | Counter | Requests | Total HTTP requests issued (excluding retries). |
| retries_total | Counter | Requests | Total HTTP retries for transient failures. |
| permanent_errors_total | Counter | Requests | Total non-retryable errors (401, 403, 404). |
| inflight_operations | Gauge | Requests | Current in-flight operations holding a concurrency permit. Global across datasets sharing the same warehouse. Auto-registered. |
| statements_executed_total | Counter | Statements | Total SQL statements submitted. |
| statement_polls_total | Counter | Statements | Total polls for async statement completion. |
| statements_failed_total | Counter | Statements | Total SQL statements that completed with FAILED status. |
| pool_connections_total | Counter | Connection Pool | Total pool connect() calls. |
| pool_active_connections | Gauge | Connection Pool | Current active connection handles. |
| semaphore_available_permits | Gauge | Concurrency | Available permits in the request concurrency semaphore. |
| chunks_fetched_total | Counter | Data Transfer | Total Arrow result chunks fetched. |
| connector_disabled | Gauge | Connector State | Whether the connector is permanently disabled (1 = yes, 0 = no). |

Enabling Metrics

Add a metrics list to the dataset definition in your spicepod:

datasets:
  - from: databricks:my_catalog.my_schema.my_table
    name: my_table
    params:
      mode: sql_warehouse
      sql_warehouse_id: abc123def456
      endpoint: my-workspace.cloud.databricks.com
      databricks_client_id: ${env:DBX_CLIENT_ID}
      databricks_client_secret: ${env:DBX_CLIENT_SECRET}
    metrics:
      - name: requests_total
      - name: retries_total
      - name: permanent_errors_total
      - name: statements_executed_total
      - name: statements_failed_total
      - name: pool_active_connections
      - name: semaphore_available_permits
      - name: chunks_fetched_total
      - name: connector_disabled

Individual metrics can be disabled by setting enabled: false. This includes auto-registered metrics:

metrics:
  - name: inflight_operations
    enabled: false

Metric Naming

Metrics are exposed as OpenTelemetry instruments with the naming convention:

dataset_databricks_{metric_name}

For example, requests_total becomes dataset_databricks_requests_total. Each instrument carries a name attribute set to the dataset instance name, so metrics from multiple datasets sharing the same warehouse can be distinguished.

Shared Warehouse Attribution

When multiple datasets share the same SQL Warehouse, compare dataset_databricks_* metrics by their name attribute to understand per-dataset load. The semaphore_available_permits metric reflects the shared semaphore, so all datasets targeting the same warehouse observe the same underlying concurrency budget.

Accessing Metrics

Registered metrics are available through:

  • Prometheus endpoint: GET /metrics when the metrics server is enabled.
  • runtime.metrics SQL table: SELECT * FROM runtime.metrics WHERE name LIKE 'dataset_databricks_%'.
  • OTLP push exporter: Pushed to any configured OpenTelemetry collector.

Task History

All major Databricks operations are instrumented with tracing spans for the Spice task history system. This applies to both sql_warehouse and delta_lake modes.

SQL Warehouse Spans

| Span Name | Input Field | Description |
| --- | --- | --- |
| databricks_get_schema | Table name | Schema inference via information_schema or DESCRIBE |
| databricks_execute_statement | SQL text | SQL statement execution via the Statements API |
| databricks_poll_statement | Statement ID | Polling for async statement completion |

Unity Catalog Spans

| Span Name | Input Field | Description |
| --- | --- | --- |
| uc_get_table | Fully-qualified table name | Fetch table metadata from Unity Catalog |
| uc_get_catalog | Catalog ID | Fetch catalog metadata |
| uc_list_schemas | Catalog ID | List schemas in a catalog |
| uc_list_tables | catalog_id.schema_name | List tables in a schema |
| uc_get_effective_permissions | Fully-qualified table name | Check effective permissions for a table |

All SQL Warehouse spans include a warehouse_id field. Unity Catalog spans include the table or catalog identifier as the input field.

Token Management

How authentication tokens are managed depends on the authentication method:

  • Service Principal (M2M OAuth): A background task refreshes the OAuth2 token 5 minutes before expiry. Refresh failures use fibonacci backoff capped at 5 minutes.
  • Personal Access Token: Used as-is with no automatic refresh.
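The refresh schedule for the M2M OAuth path can be sketched as (an illustrative Python sketch; the 5-minute margin and 5-minute failure cap come from the text, while the helper names are invented):

```python
REFRESH_MARGIN_S = 5 * 60        # refresh 5 minutes before expiry
FAILURE_BACKOFF_CAP_S = 5 * 60   # fibonacci backoff on refresh failures, capped


def next_refresh_at(expires_at: float) -> float:
    """Unix time at which the background task should refresh the token."""
    return expires_at - REFRESH_MARGIN_S


def failure_backoff(attempt: int) -> int:
    """Fibonacci backoff (1, 1, 2, 3, 5, ... seconds) capped at 5 minutes."""
    prev, curr = 0, 1
    for _ in range(attempt):
        prev, curr = curr, prev + curr
    return min(curr, FAILURE_BACKOFF_CAP_S)
```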