Version: Next

GCS Data Connector

The GCS Data Connector enables federated SQL queries on files stored in Google Cloud Storage. Both gcs:// and gs:// URI schemes are accepted.

When a folder path is provided, all files in that folder are loaded.

File formats are specified using the file_format parameter, as described in File Formats.

datasets:
  - from: gs://my-bucket/taxi_sample.csv
    name: gcs_test
    params:
      gcs_service_account_path: /etc/spice/gcs-key.json
      file_format: csv

Configuration

`from`

Defines the GCS URI to a folder or object. Both schemes are supported and equivalent:

from: gs://<bucket>/<path>
from: gcs://<bucket>/<path>

Example: from: gs://my-bucket/path/to/file.parquet
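
As an illustration of the scheme equivalence, the following sketch points a dataset at a folder using the `gcs://` scheme. The bucket name, path, and dataset name are placeholders, not values from a real deployment.

```yaml
datasets:
  # gcs:// and gs:// refer to the same location; all names here are placeholders
  - from: gcs://my-bucket/path/to/folder/
    name: folder_data
    params:
      file_format: parquet
```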

`name`

Defines the dataset name, which is used as the table name within Spice.

Example:

datasets:
  - from: gs://my-bucket/taxi_sample.csv
    name: cool_dataset
    params:
      file_format: csv

SELECT COUNT(*) FROM cool_dataset;

+----------+
| count(*) |
+----------+
| 6001215  |
+----------+

The dataset name cannot be a reserved keyword.

`params`

Basic parameters

| Parameter name | Description |
| --- | --- |
| `file_format` | Specifies the data format. Required if it cannot be inferred from the object URI. Options: `parquet`, `csv`, `json`. Refer to File Formats for details. |
| `allow_http` | Allow insecure HTTP connections. Defaults to `false`. |
| `client_timeout` | Optional. Timeout for GCS client operations. |
| `hive_partitioning_enabled` | Enable Hive-style partitioning inferred from the folder structure. Defaults to `false`. |
| `schema_source_path` | Specifies the URL used to infer the dataset schema. Defaults to the most recently modified file. |
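
A minimal sketch combining several basic parameters. The `30s` value assumes that `client_timeout` accepts the same duration format shown for the retry parameters, and the object named in `schema_source_path` is hypothetical.

```yaml
datasets:
  - from: gs://my-bucket/trips/
    name: trips
    params:
      file_format: parquet
      # Assumed duration format (same style as the retry parameters)
      client_timeout: 30s
      # Hypothetical object used for schema inference instead of the
      # most recently modified file
      schema_source_path: gs://my-bucket/trips/2024/part-0001.parquet
```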

Authentication parameters

The following authentication parameters are mutually exclusive: only one can be set at a time. The runtime fails to start if more than one is specified.

gcs_service_account_path
gcs_service_account_key
gcs_application_default_credentials
gcs_skip_signature

If none of these are set, the connector accesses the bucket without explicit credentials. For public buckets, set gcs_skip_signature: true to skip request signing.

| Parameter name | Description |
| --- | --- |
| `gcs_service_account_path` | Path to a GCS service account JSON key file. |
| `gcs_service_account_key` | GCS service account JSON key as a string. |
| `gcs_application_default_credentials` | Set to `true` to use Google Application Default Credentials. If `GOOGLE_APPLICATION_CREDENTIALS` is set, that path is used. Defaults to `false`. |
| `gcs_skip_signature` | Set to `true` to skip signing requests. Use for public buckets. |
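
To make the mutual-exclusivity rule concrete, the following fragment sets two authentication parameters at once. Based on the rule stated in this section, the runtime would be expected to refuse to start with this configuration. The bucket and dataset names are placeholders.

```yaml
datasets:
  - from: gs://my-bucket/data/
    name: my_data
    params:
      file_format: parquet
      # Invalid: two authentication parameters are set together
      gcs_service_account_path: /etc/spice/gcs-key.json
      gcs_application_default_credentials: true
```

Removing either of the two authentication lines leaves a single authentication method configured.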

Retry parameters

| Parameter name | Description |
| --- | --- |
| `gcs_max_retries` | Maximum number of retries. Defaults to `3`. |
| `gcs_retry_timeout` | Total timeout for retries (e.g., `5s`, `1m`). |
| `gcs_backoff_initial_duration` | Initial retry delay (e.g., `5s`). |
| `gcs_backoff_max_duration` | Maximum retry delay (e.g., `1m`). |
| `gcs_backoff_base` | Exponential backoff base (e.g., `0.1`). |
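
A hedged sketch of a dataset with retry settings applied. The values are illustrative only, written in the duration format the table shows, and are not recommended defaults. The bucket and dataset names are placeholders.

```yaml
datasets:
  - from: gs://my-bucket/data/
    name: my_data
    params:
      file_format: parquet
      gcs_application_default_credentials: true
      # Illustrative retry values, not tuned recommendations
      gcs_max_retries: 5
      gcs_retry_timeout: 1m
      gcs_backoff_initial_duration: 5s
      gcs_backoff_max_duration: 30s
```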

Authentication

The GCS connector supports four mutually exclusive authentication modes, as detailed in the authentication parameters.

Service account JSON file

Configure a service account by setting gcs_service_account_path to the file path of a downloaded service account JSON key:

datasets:
  - from: gs://my-bucket/data/
    name: my_data
    params:
      gcs_service_account_path: /etc/spice/gcs-key.json
      file_format: parquet

To create the key file, follow the Google Cloud documentation for service account keys, then grant the service account the roles/storage.objectViewer role (or higher) on the bucket via the Cloud Storage IAM settings.

Service account JSON content

When mounting a key file is not practical (e.g., when the key is held in a secret store), pass the JSON contents directly via gcs_service_account_key:

datasets:
  - from: gs://my-bucket/data/
    name: my_data
    params:
      gcs_service_account_key: ${secrets:GCS_SERVICE_ACCOUNT_JSON}
      file_format: parquet

The value should be the full JSON key as a single string, ideally provided through a supported secret store.

Application Default Credentials (ADC)

Set gcs_application_default_credentials: true to use Application Default Credentials, for example when running inside Google Cloud with an attached service account (GKE Workload Identity, Compute Engine metadata, etc.) or after running gcloud auth application-default login locally:

datasets:
  - from: gs://my-bucket/data/
    name: my_data
    params:
      gcs_application_default_credentials: true
      file_format: parquet

If the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to a service account JSON key path, that file is used. Otherwise, the ADC chain searches the well-known locations described in the Google Cloud documentation.

Public buckets

For unauthenticated access to a public bucket, set gcs_skip_signature: true:

datasets:
  - from: gs://public-bucket/data/
    name: public_data
    params:
      gcs_skip_signature: true
      file_format: parquet

Supported file formats

Specify the file format using the file_format parameter. See File Formats for details.

Examples

Reading a Parquet folder with a service account key file

datasets:
  - from: gs://my-bucket/trips/2024/
    name: taxi_trips
    params:
      gcs_service_account_path: /etc/spice/gcs-key.json
      file_format: parquet

Reading a CSV file with the service account JSON inlined from a secret

datasets:
  - from: gs://my-bucket/taxi_sample.csv
    name: taxi_sample
    params:
      gcs_service_account_key: ${secrets:GCS_SERVICE_ACCOUNT_JSON}
      file_format: csv

Reading from a public bucket

datasets:
  - from: gs://public-bucket/sample.parquet
    name: sample
    params:
      gcs_skip_signature: true
      file_format: parquet

Hive-partitioned dataset

datasets:
  - from: gs://my-bucket/events/
    name: events
    params:
      gcs_application_default_credentials: true
      file_format: parquet
      hive_partitioning_enabled: true

Secrets

Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the secret stores documentation.

gcs_service_account_path and gcs_service_account_key are marked as secrets and can be supplied through any supported secret store using the ${secrets:KEY} replacement syntax.
