# AWS Integrations

Spice.ai provides deep integrations with Amazon Web Services (AWS), enabling data federation, AI inference, vector search, and secure secret management across the AWS ecosystem. This page consolidates all AWS-compatible components and provides quick access to configuration guides.
## Data Connectors
Data connectors federate SQL queries across AWS data sources without data movement.
| Connector | Description | Documentation |
|---|---|---|
| Amazon S3 | Query Parquet, CSV, and JSON files stored in S3 buckets. Supports private buckets with IAM authentication and S3-compatible storage like MinIO. | S3 Data Connector |
| Amazon S3 Tables | Query Iceberg tables in Amazon S3 Tables using the Glue connector with S3 Tables catalog format. | Glue Data Connector |
| Amazon DynamoDB | Federated SQL queries on DynamoDB tables with automatic schema inference. | DynamoDB Data Connector |
| Amazon DynamoDB Streams | Real-time CDC streaming of table changes via DynamoDB Streams. | DynamoDB Data Connector |
| Amazon Redshift | Connect to Redshift clusters using the PostgreSQL-compatible connector. | Redshift Data Connector |
| Amazon Aurora PostgreSQL | Connect to Aurora PostgreSQL clusters using the PostgreSQL connector. | PostgreSQL Data Connector |
| Amazon Aurora MySQL | Connect to Aurora MySQL clusters using the MySQL connector. | MySQL Data Connector |
| Amazon RDS PostgreSQL | Connect to RDS PostgreSQL instances using the PostgreSQL connector. | PostgreSQL Data Connector |
| Amazon RDS MySQL | Connect to RDS MySQL instances using the MySQL connector. | MySQL Data Connector |
| Amazon MSK | Stream data from Amazon MSK (Managed Streaming for Apache Kafka) topics using the Kafka connector. | Kafka Data Connector |
| Debezium (Amazon MSK) | Change Data Capture (CDC) from databases via Debezium running on Amazon MSK for real-time dataset updates. | Debezium Data Connector |
| AWS Glue Data Catalog | Query Iceberg tables registered in AWS Glue. | Glue Data Connector |
| Apache Iceberg (AWS) | Query Iceberg tables stored in S3 with Glue or REST catalog metadata. | Iceberg Data Connector |
| Delta Lake (S3) | Query Delta Lake tables stored in Amazon S3. | Delta Lake Data Connector |
| AWS Athena (ODBC) | Connect to Athena using the ODBC connector with Athena SQL dialect support. | ODBC Data Connector |
### Example: Amazon S3

```yaml
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    params:
      file_format: parquet
      s3_region: us-east-1
      s3_auth: iam_role # Uses IAM credentials from environment
```
### Example: DynamoDB

```yaml
datasets:
  - from: dynamodb:users
    name: users
    params:
      dynamodb_aws_region: us-west-2
```
### Example: AWS Glue with Amazon S3 Tables

```yaml
datasets:
  - from: glue:my_namespace.orders
    name: orders
    params:
      glue_catalog_id: 123635965758:s3tablescatalog/my-table-bucket
      glue_region: us-east-2
```
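### Example: Amazon Redshift

As the connector table above notes, Redshift clusters are reached through the PostgreSQL-compatible connector. A minimal sketch, assuming the connector's `pg_*` parameter names; the endpoint, database, and secret names below are placeholders:

```yaml
datasets:
  - from: postgres:public.sales
    name: sales
    params:
      pg_host: my-cluster.abc123.us-east-1.redshift.amazonaws.com # placeholder cluster endpoint
      pg_port: '5439' # Redshift's default port
      pg_db: dev
      pg_user: ${secrets:REDSHIFT_USER}
      pg_pass: ${secrets:REDSHIFT_PASS}
```

The same pattern applies to Aurora PostgreSQL and RDS PostgreSQL; only the endpoint and port change.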
## Catalog Connectors
Catalog connectors provide schema discovery and unified access to tables in AWS data catalogs.
| Connector | Description | Documentation |
|---|---|---|
| AWS Glue Catalog | Discover and query tables from AWS Glue Data Catalog with glob pattern filtering. | Glue Catalog Connector |
### Example: Glue Catalog

```yaml
catalogs:
  - from: glue
    name: my_data_lake
    include:
      - '*.*' # Include all tables from all databases
    params:
      glue_region: us-east-1
```
## AI Models (Amazon Bedrock)
Spice integrates with Amazon Bedrock for large language model inference, supporting Amazon Nova and other foundation models.
| Provider | Supported Models | Documentation |
|---|---|---|
| Amazon Bedrock | Amazon Nova (Micro, Lite, Pro, Premier), cross-region inference profiles | Bedrock Models |
### Example: Amazon Nova

```yaml
models:
  - from: bedrock:us.amazon.nova-lite-v1:0
    name: nova
    params:
      aws_region: us-east-1
```
### Guardrails Support
Bedrock Guardrails can filter model inputs and outputs:
```yaml
models:
  - from: bedrock:amazon.nova-pro-v1:0
    name: nova-guarded
    params:
      aws_region: us-east-1
      bedrock_guardrail_identifier: arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123
      bedrock_guardrail_version: '1'
```
## Embeddings (Amazon Bedrock)
Generate vector embeddings using Amazon Bedrock embedding models for semantic search and RAG applications.
| Provider | Supported Models | Documentation |
|---|---|---|
| Amazon Bedrock | Amazon Titan Embeddings, Amazon Nova Multimodal Embeddings, Cohere Embed | Bedrock Embeddings |
### Example: Amazon Titan Embeddings

```yaml
embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan
    params:
      aws_region: us-east-1
      dimensions: '256'
```
### Example: Amazon Nova Multimodal Embeddings

```yaml
embeddings:
  - from: bedrock:amazon.nova-2-multimodal-embeddings-v1:0
    name: nova_embed
    params:
      dimensions: '1024'
      truncation_mode: START
      embedding_purpose: GENERIC_RETRIEVAL
      aws_region: us-east-1
```
## Vector Stores (Amazon S3 Vectors)

Amazon S3 Vectors is a purpose-built S3 bucket type for storing and querying vector embeddings at scale. Spice integrates S3 Vectors as a vector index backend for hybrid search applications.
| Engine | Description | Documentation |
|---|---|---|
| Amazon S3 Vectors | Sub-second similarity queries on billions of vectors with up to 90% cost reduction compared to traditional vector databases. | S3 Vectors Engine |
### Example: S3 Vectors with Bedrock Embeddings

```yaml
datasets:
  - from: oracle:"CUSTOMER_REVIEWS"
    name: reviews
    vectors:
      enabled: true
      engine: s3_vectors
      params:
        s3_vectors_bucket: my-s3-vector-bucket
        s3_vectors_aws_region: us-east-1
      columns:
        - name: body
          embeddings:
            from: bedrock_titan

embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: bedrock_titan
    params:
      aws_region: us-east-1
      dimensions: '256'
```
## Secret Management
Securely store and retrieve credentials using AWS Secrets Manager.
| Store | Description | Documentation |
|---|---|---|
| AWS Secrets Manager | Read secrets from AWS Secrets Manager by secret name. | AWS Secrets Manager |
### Example: Using Secrets Manager

```yaml
secrets:
  - from: aws_secrets_manager:my_database_creds
    name: db

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: ${db:host}
      pg_user: ${db:username}
      pg_pass: ${db:password}
```
## Authentication
All AWS integrations support the standard AWS SDK credential chain. When credentials are not explicitly configured, Spice loads them from the following sources in order:
- **Environment Variables**: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`
- **Shared Credentials Files**: `~/.aws/credentials` and `~/.aws/config`
- **AWS SSO Sessions**: Configured via `aws configure sso`
- **Web Identity Token**: For OIDC/OAuth (common with EKS IRSA)
- **ECS Container Credentials**: Automatic IAM role for ECS tasks
- **EC2 Instance Metadata (IMDSv2)**: Automatic IAM role for EC2 instances
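When the default chain is not suitable (for example, cross-account access with dedicated keys), credentials can also be supplied explicitly through connector parameters. A hedged sketch for S3, assuming the connector's `s3_auth: key` mode with `s3_key`/`s3_secret` parameters, and secrets named `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` resolved by a configured secret store:

```yaml
datasets:
  - from: s3://my-private-bucket/data/
    name: private_data
    params:
      file_format: parquet
      s3_auth: key # use explicit keys instead of the credential chain
      s3_key: ${secrets:AWS_ACCESS_KEY_ID}
      s3_secret: ${secrets:AWS_SECRET_ACCESS_KEY}
```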
### IAM Permissions
Ensure the IAM role or user has appropriate permissions for all AWS services used:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "dynamodb:Scan",
        "dynamodb:DescribeTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "bedrock:InvokeModel",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "*"
    }
  ]
}
```
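The wildcard `Resource` above is convenient for evaluation, but production roles should be scoped to the specific resources Spice reads. An illustrative narrower statement for S3 access only (the bucket name is a placeholder; note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to the object ARNs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-data-bucket",
        "arn:aws:s3:::my-data-bucket/*"
      ]
    }
  ]
}
```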
## Deployment Options
Deploy Spice on AWS infrastructure for optimal performance and integration:
| Option | Description | Documentation |
|---|---|---|
| Amazon EKS | Kubernetes orchestration with Helm chart deployment | AWS Deployment |
| Amazon ECS | Container service with Fargate or EC2 launch types | AWS Deployment |
| Amazon EC2 | Direct deployment with Docker or binary | AWS Deployment |
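For EKS, deployment is typically a Helm install. A hedged sketch, assuming the chart repository URL and chart name used by the Spice Helm chart; verify both against the AWS Deployment guide before use:

```shell
# Add the Spice Helm repository (URL assumed; see the deployment docs)
helm repo add spiceai https://helm.spiceai.org
helm repo update

# Install the runtime into its own namespace
helm install spiceai spiceai/spiceai --namespace spiceai --create-namespace
```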
## Resources
### Spice.ai Blog Posts

- Amazon S3 Vectors - Overview of S3 Vectors integration
- Getting Started with Amazon S3 Vectors and Spice - Step-by-step tutorial
### Videos

- Getting started with Amazon S3 Vectors and Spice - YouTube walkthrough
- How Spice AI operationalizes data lakes for AI using Amazon S3 - Spice presentation at re:Invent
### Marketplace

- Spice.ai on AWS Marketplace - Deploy Spice.ai from AWS Marketplace
## Quick Start

Get started with Spice on AWS in minutes:

1. Install the Spice CLI:

   ```shell
   curl https://install.spiceai.org | /bin/bash
   ```

2. Configure AWS credentials:

   ```shell
   aws configure
   ```

3. Create a Spicepod with S3 data:

   ```yaml
   # spicepod.yaml
   version: v1beta1
   kind: Spicepod
   name: aws_quickstart
   datasets:
     - from: s3://spiceai-demo-datasets/taxi_trips/2024/
       name: taxi_trips
       params:
         file_format: parquet
         s3_auth: iam_role
   ```

4. Start the runtime:

   ```shell
   spice run
   ```

5. Query your data:

   ```shell
   spice sql
   ```

   Then, at the SQL prompt:

   ```sql
   SELECT COUNT(*) FROM taxi_trips;
   ```
