# AWS Integrations

Spice.ai provides deep integrations with Amazon Web Services (AWS), enabling data federation, AI inference, vector search, and secure secret management across the AWS ecosystem. This page consolidates all AWS-compatible components and provides quick access to configuration guides.
## Data Connectors
Data connectors federate SQL queries across AWS data sources without data movement.
| Connector | Description | Documentation |
|---|---|---|
| Amazon S3 | Query Parquet, CSV, and JSON files stored in S3 buckets. Supports private buckets with IAM authentication and S3-compatible storage like MinIO. | S3 Data Connector |
| Amazon S3 Tables | Query Iceberg tables in Amazon S3 Tables using the Glue connector with S3 Tables catalog format. | Glue Data Connector |
| Amazon DynamoDB | Federated SQL queries on DynamoDB tables with automatic schema inference. | DynamoDB Data Connector |
| Amazon DynamoDB Streams | Real-time CDC streaming of table changes via DynamoDB Streams. | DynamoDB Data Connector |
| Amazon Redshift | Connect to Redshift clusters using the PostgreSQL-compatible connector. | Redshift Data Connector |
| Amazon Aurora PostgreSQL | Connect to Aurora PostgreSQL clusters using the PostgreSQL connector. | PostgreSQL Data Connector |
| Amazon Aurora MySQL | Connect to Aurora MySQL clusters using the MySQL connector. | MySQL Data Connector |
| Amazon RDS PostgreSQL | Connect to RDS PostgreSQL instances using the PostgreSQL connector. | PostgreSQL Data Connector |
| Amazon RDS MySQL | Connect to RDS MySQL instances using the MySQL connector. | MySQL Data Connector |
| Amazon MSK | Stream data from Amazon MSK (Managed Streaming for Apache Kafka) topics using the Kafka connector. | Kafka Data Connector |
| Debezium (Amazon MSK) | Change Data Capture (CDC) from databases via Debezium running on Amazon MSK for real-time dataset updates. | Debezium Data Connector |
| AWS Glue Data Catalog | Query Iceberg tables registered in AWS Glue. | Glue Data Connector |
| Apache Iceberg (AWS) | Query Iceberg tables stored in S3 with Glue or REST catalog metadata. | Iceberg Data Connector |
| Delta Lake (S3) | Query Delta Lake tables stored in Amazon S3. | Delta Lake Data Connector |
| AWS Athena (ODBC) | Connect to Athena using the ODBC connector with Athena SQL dialect support. | ODBC Data Connector |
### Example: Amazon S3

```yaml
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    params:
      file_format: parquet
      s3_region: us-east-1
      s3_auth: iam_role # Uses IAM credentials from environment
```
### Example: DynamoDB

```yaml
datasets:
  - from: dynamodb:users
    name: users
    params:
      dynamodb_aws_region: us-west-2
```
### Example: AWS Glue with Amazon S3 Tables

```yaml
datasets:
  - from: glue:my_namespace.orders
    name: orders
    params:
      glue_catalog_id: 123635965758:s3tablescatalog/my-table-bucket
      glue_region: us-east-2
```
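### Example: Amazon Redshift

As the connector table above notes, Redshift clusters are reached through the PostgreSQL-compatible connector. A minimal sketch, assuming the connector's `pg_*` parameter names; the endpoint, database, and secret names below are placeholders:

```yaml
datasets:
  - from: postgres:public.sales
    name: sales
    params:
      pg_host: my-cluster.abc123.us-east-1.redshift.amazonaws.com # placeholder cluster endpoint
      pg_port: '5439' # Redshift's default port
      pg_db: dev
      pg_user: ${secrets:REDSHIFT_USER}
      pg_pass: ${secrets:REDSHIFT_PASS}
```

The same pattern applies to Aurora PostgreSQL and RDS PostgreSQL; only the endpoint and port change.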
## Catalog Connectors
Catalog connectors provide schema discovery and unified access to tables in AWS data catalogs.
| Connector | Description | Documentation |
|---|---|---|
| AWS Glue Catalog | Discover and query tables from AWS Glue Data Catalog with glob pattern filtering. | Glue Catalog Connector |
### Example: Glue Catalog

```yaml
catalogs:
  - from: glue
    name: my_data_lake
    include:
      - '*.*' # Include all tables from all databases
    params:
      glue_region: us-east-1
```
## AI Models (Amazon Bedrock)
Spice integrates with Amazon Bedrock for large language model inference, supporting Amazon Nova and other foundation models.
| Provider | Supported Models | Documentation |
|---|---|---|
| Amazon Bedrock | Amazon Nova (Micro, Lite, Pro, Premier), cross-region inference profiles | Bedrock Models |
### Example: Amazon Nova

```yaml
models:
  - from: bedrock:us.amazon.nova-lite-v1:0
    name: nova
    params:
      aws_region: us-east-1
```
### Guardrails Support
Bedrock Guardrails can filter model inputs and outputs:
```yaml
models:
  - from: bedrock:amazon.nova-pro-v1:0
    name: nova-guarded
    params:
      aws_region: us-east-1
      bedrock_guardrail_identifier: arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123
      bedrock_guardrail_version: '1'
```
## Embeddings (Amazon Bedrock)
Generate vector embeddings using Amazon Bedrock embedding models for semantic search and RAG applications.
| Provider | Supported Models | Documentation |
|---|---|---|
| Amazon Bedrock | Amazon Titan Embeddings, Amazon Nova Multimodal Embeddings, Cohere Embed | Bedrock Embeddings |
### Example: Amazon Titan Embeddings

```yaml
embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan
    params:
      aws_region: us-east-1
      dimensions: '256'
```
### Example: Amazon Nova Multimodal Embeddings

```yaml
embeddings:
  - from: bedrock:amazon.nova-2-multimodal-embeddings-v1:0
    name: nova_embed
    params:
      dimensions: '1024'
      truncation_mode: START
      embedding_purpose: GENERIC_RETRIEVAL
      aws_region: us-east-1
```
## Vector Stores (Amazon S3 Vectors)

Amazon S3 Vectors is a purpose-built S3 bucket type for storing and querying vector embeddings at scale. Spice integrates S3 Vectors as a vector index backend for hybrid search applications.
| Engine | Description | Documentation |
|---|---|---|
| Amazon S3 Vectors | Sub-second similarity queries on billions of vectors with up to 90% cost reduction compared to traditional vector databases. | S3 Vectors Engine |
### Example: S3 Vectors with Bedrock Embeddings

```yaml
datasets:
  - from: oracle:"CUSTOMER_REVIEWS"
    name: reviews
    vectors:
      enabled: true
      engine: s3_vectors
      params:
        s3_vectors_bucket: my-s3-vector-bucket
        s3_vectors_aws_region: us-east-1
      columns:
        - name: body
          embeddings:
            from: bedrock_titan

embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: bedrock_titan
    params:
      aws_region: us-east-1
      dimensions: '256'
```
## Secret Management
Securely store and retrieve credentials using AWS Secrets Manager.
| Store | Description | Documentation |
|---|---|---|
| AWS Secrets Manager | Read secrets from AWS Secrets Manager by secret name. | AWS Secrets Manager |
### Example: Using Secrets Manager

```yaml
secrets:
  - from: aws_secrets_manager:my_database_creds
    name: db

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: ${db:host}
      pg_user: ${db:username}
      pg_pass: ${db:password}
```
## Authentication
All AWS integrations support the standard AWS SDK credential chain. When credentials are not explicitly configured, Spice loads them from the following sources in order:
- **Environment Variables**: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`
- **Shared Credentials Files**: `~/.aws/credentials` and `~/.aws/config`
- **AWS SSO Sessions**: Configured via `aws configure sso`
- **Web Identity Token**: For OIDC/OAuth (common with EKS IRSA)
- **ECS Container Credentials**: Automatic IAM role for ECS tasks
- **EC2 Instance Metadata (IMDSv2)**: Automatic IAM role for EC2 instances
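When the default chain is not suitable (for example, cross-account access with dedicated keys), credentials can also be supplied explicitly through connector parameters. A hedged sketch for S3, assuming the connector's `s3_auth: key` mode with `s3_key`/`s3_secret` parameters, and secrets named `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` resolved by a configured secret store:

```yaml
datasets:
  - from: s3://my-private-bucket/data/
    name: private_data
    params:
      file_format: parquet
      s3_auth: key # use explicit keys instead of the credential chain
      s3_key: ${secrets:AWS_ACCESS_KEY_ID}
      s3_secret: ${secrets:AWS_SECRET_ACCESS_KEY}
```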
### IAM Permissions
Ensure the IAM role or user has appropriate permissions for all AWS services used:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "dynamodb:Scan",
        "dynamodb:DescribeTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "bedrock:InvokeModel",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "*"
    }
  ]
}
```
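The wildcard `Resource` above is convenient for evaluation, but production roles should be scoped to the specific resources Spice reads. An illustrative narrower statement for S3 access only (the bucket name is a placeholder; note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to the object ARNs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-data-bucket",
        "arn:aws:s3:::my-data-bucket/*"
      ]
    }
  ]
}
```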
## Deployment Options
Deploy Spice on AWS infrastructure for optimal performance and integration:
| Option | Description | Documentation |
|---|---|---|
| Amazon EKS | Kubernetes orchestration with Helm chart deployment | AWS Deployment |
| Amazon ECS | Container service with Fargate or EC2 launch types | AWS Deployment |
| Amazon EC2 | Direct deployment with Docker or binary | AWS Deployment |
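For EKS, deployment is typically a Helm install. A hedged sketch, assuming the chart repository URL and chart name used by the Spice Helm chart; verify both against the AWS Deployment guide before use:

```shell
# Add the Spice Helm repository (URL assumed; see the deployment docs)
helm repo add spiceai https://helm.spiceai.org
helm repo update

# Install the runtime into its own namespace
helm install spiceai spiceai/spiceai --namespace spiceai --create-namespace
```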
## Resources
### Spice.ai Blog Posts

- Amazon S3 Vectors - Overview of S3 Vectors integration
- Getting Started with Amazon S3 Vectors and Spice - Step-by-step tutorial
### Videos

- Getting started with Amazon S3 Vectors and Spice - YouTube walkthrough
- How Spice AI operationalizes data lakes for AI using Amazon S3 - Spice presentation at re:Invent
### Marketplace

- Spice.ai on AWS Marketplace - Deploy Spice.ai from AWS Marketplace
## Quick Start

Get started with Spice on AWS in minutes:

1. Install the Spice CLI:

   ```shell
   curl https://install.spiceai.org | /bin/bash
   ```

2. Configure AWS credentials:

   ```shell
   aws configure
   ```

3. Create a Spicepod with S3 data:

   ```yaml
   # spicepod.yaml
   version: v1beta1
   kind: Spicepod
   name: aws_quickstart
   datasets:
     - from: s3://spiceai-demo-datasets/taxi_trips/2024/
       name: taxi_trips
       params:
         file_format: parquet
         s3_auth: iam_role
   ```

4. Start the runtime:

   ```shell
   spice run
   ```

5. Query your data:

   ```shell
   spice sql
   ```

   Then, at the SQL prompt:

   ```sql
   SELECT COUNT(*) FROM taxi_trips;
   ```
