Spice.ai and AWS

Spice.ai provides deep integrations with Amazon Web Services (AWS), enabling data federation, AI inference, vector search, and secure secret management across the AWS ecosystem. This page consolidates all AWS-compatible components and provides quick access to configuration guides.

Data Connectors

Data connectors federate SQL queries across AWS data sources without data movement.

| Connector | Description | Documentation |
| --- | --- | --- |
| Amazon S3 | Query Parquet, CSV, and JSON files stored in S3 buckets. Supports private buckets with IAM authentication and S3-compatible storage like MinIO. | S3 Data Connector |
| Amazon S3 Tables | Query Iceberg tables in Amazon S3 Tables using the Glue connector with S3 Tables catalog format. | Glue Data Connector |
| Amazon DynamoDB | Federated SQL queries on DynamoDB tables with automatic schema inference. | DynamoDB Data Connector |
| Amazon DynamoDB Streams | Real-time CDC streaming of table changes via DynamoDB Streams. | DynamoDB Data Connector |
| Amazon Redshift | Connect to Redshift clusters using the PostgreSQL-compatible connector. | Redshift Data Connector |
| Amazon Aurora PostgreSQL | Connect to Aurora PostgreSQL clusters using the PostgreSQL connector. | PostgreSQL Data Connector |
| Amazon Aurora MySQL | Connect to Aurora MySQL clusters using the MySQL connector. | MySQL Data Connector |
| Amazon RDS PostgreSQL | Connect to RDS PostgreSQL instances using the PostgreSQL connector. | PostgreSQL Data Connector |
| Amazon RDS MySQL | Connect to RDS MySQL instances using the MySQL connector. | MySQL Data Connector |
| Amazon MSK | Stream data from Amazon MSK (Managed Streaming for Apache Kafka) topics using the Kafka connector. | Kafka Data Connector |
| Debezium (Amazon MSK) | Change Data Capture (CDC) from databases via Debezium running on Amazon MSK for real-time dataset updates. | Debezium Data Connector |
| AWS Glue Data Catalog | Query Iceberg tables registered in AWS Glue. | Glue Data Connector |
| Apache Iceberg (AWS) | Query Iceberg tables stored in S3 with Glue or REST catalog metadata. | Iceberg Data Connector |
| Delta Lake (S3) | Query Delta Lake tables stored in Amazon S3. | Delta Lake Data Connector |
| AWS Athena (ODBC) | Connect to Athena using the ODBC connector with Athena SQL dialect support. | ODBC Data Connector |

Example: Amazon S3

datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    params:
      file_format: parquet
      s3_region: us-east-1
      s3_auth: iam_role # Uses IAM credentials from environment
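
For S3-compatible storage such as MinIO, a dataset typically needs an explicit endpoint and static access keys rather than an IAM role. The following is a minimal sketch, not a verified configuration: the bucket, endpoint URL, and secret names are hypothetical, and the `s3_endpoint`, `s3_key`, and `s3_secret` parameter names, the `key` value for `s3_auth`, and the `${secrets:...}` reference syntax are assumptions to confirm against the S3 Data Connector documentation.

```yaml
datasets:
  - from: s3://my-minio-bucket/events/ # hypothetical bucket and prefix
    name: events
    params:
      file_format: parquet
      s3_endpoint: https://minio.example.com # assumed parameter; hypothetical endpoint
      s3_auth: key # assumed value selecting static access keys
      s3_key: ${secrets:MINIO_ACCESS_KEY} # hypothetical secret name
      s3_secret: ${secrets:MINIO_SECRET_KEY} # hypothetical secret name
```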

Example: DynamoDB

datasets:
  - from: dynamodb:users
    name: users
    params:
      dynamodb_aws_region: us-west-2

Example: AWS Glue with Amazon S3 Tables

datasets:
  - from: glue:my_namespace.orders
    name: orders
    params:
      glue_catalog_id: 123635965758:s3tablescatalog/my-table-bucket
      glue_region: us-east-2
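
Aurora PostgreSQL and RDS PostgreSQL use the same PostgreSQL connector listed in the table above. The sketch below assumes a hypothetical cluster endpoint, database, table, and secret names. `pg_host`, `pg_user`, and `pg_pass` appear elsewhere on this page, while `pg_port`, `pg_db`, and `pg_sslmode` are assumed parameter names that should be checked against the PostgreSQL Data Connector documentation.

```yaml
datasets:
  - from: postgres:public.orders # hypothetical schema and table
    name: orders
    params:
      pg_host: my-cluster.cluster-abc123xyz.us-east-1.rds.amazonaws.com # hypothetical endpoint
      pg_port: '5432' # assumed parameter
      pg_db: appdb # assumed parameter; hypothetical database name
      pg_user: ${secrets:PG_USER} # hypothetical secret name
      pg_pass: ${secrets:PG_PASS} # hypothetical secret name
      pg_sslmode: require # assumed parameter
```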

Catalog Connectors

Catalog connectors provide schema discovery and unified access to tables in AWS data catalogs.

| Connector | Description | Documentation |
| --- | --- | --- |
| AWS Glue Catalog | Discover and query tables from AWS Glue Data Catalog with glob pattern filtering. | Glue Catalog Connector |

Example: Glue Catalog

catalogs:
  - from: glue
    name: my_data_lake
    include:
      - '*.*' # Include all tables from all databases
    params:
      glue_region: us-east-1
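
Because `include` takes glob patterns, discovery can be narrowed to particular databases or tables. This is a sketch that assumes hypothetical Glue databases named `sales` and `analytics`, and assumes patterns follow the same `database.table` form as `'*.*'` above.

```yaml
catalogs:
  - from: glue
    name: my_data_lake
    include:
      - 'sales.*' # all tables in the hypothetical "sales" database
      - 'analytics.daily_*' # tables in "analytics" whose names start with "daily_"
    params:
      glue_region: us-east-1
```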

AI Models (Amazon Bedrock)

Spice integrates with Amazon Bedrock for large language model inference, supporting Amazon Nova and other foundation models.

| Provider | Supported Models | Documentation |
| --- | --- | --- |
| Amazon Bedrock | Amazon Nova (Micro, Lite, Pro, Premier), cross-region inference profiles | Bedrock Models |

Example: Amazon Nova

models:
  - from: bedrock:us.amazon.nova-lite-v1:0
    name: nova
    params:
      aws_region: us-east-1

Guardrails Support

Bedrock Guardrails can filter model inputs and outputs:

models:
  - from: bedrock:amazon.nova-pro-v1:0
    name: nova-guarded
    params:
      aws_region: us-east-1
      bedrock_guardrail_identifier: arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123
      bedrock_guardrail_version: '1'

Embeddings (Amazon Bedrock)

Generate vector embeddings using Amazon Bedrock embedding models for semantic search and RAG applications.

| Provider | Supported Models | Documentation |
| --- | --- | --- |
| Amazon Bedrock | Amazon Titan Embeddings, Amazon Nova Multimodal Embeddings, Cohere Embed | Bedrock Embeddings |

Example: Amazon Titan Embeddings

embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan
    params:
      aws_region: us-east-1
      dimensions: '256'

Example: Amazon Nova Multimodal Embeddings

embeddings:
  - from: bedrock:amazon.nova-2-multimodal-embeddings-v1:0
    name: nova_embed
    params:
      dimensions: '1024'
      truncation_mode: START
      embedding_purpose: GENERIC_RETRIEVAL
      aws_region: us-east-1

Vector Stores (Amazon S3 Vectors)

Amazon S3 Vectors is a new S3 bucket type for storing and querying vector embeddings at scale. Spice integrates S3 Vectors as a vector index backend for hybrid search applications.

| Engine | Description | Documentation |
| --- | --- | --- |
| Amazon S3 Vectors | Sub-second similarity queries on billions of vectors with up to 90% cost reduction compared to traditional vector databases. | S3 Vectors Engine |

Example: S3 Vectors with Bedrock Embeddings

datasets:
  - from: oracle:"CUSTOMER_REVIEWS"
    name: reviews
    vectors:
      enabled: true
      engine: s3_vectors
      params:
        s3_vectors_bucket: my-s3-vector-bucket
        s3_vectors_aws_region: us-east-1
    columns:
      - name: body
        embeddings:
          from: bedrock_titan

embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: bedrock_titan
    params:
      aws_region: us-east-1
      dimensions: '256'

Secret Management

Securely store and retrieve credentials using AWS Secrets Manager.

| Store | Description | Documentation |
| --- | --- | --- |
| AWS Secrets Manager | Read secrets from AWS Secrets Manager by secret name. | AWS Secrets Manager |

Example: Using Secrets Manager

secrets:
  - from: aws_secrets_manager:my_database_creds
    name: db

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: ${db:host}
      pg_user: ${db:username}
      pg_pass: ${db:password}

Authentication

All AWS integrations support the standard AWS SDK credential chain. When credentials are not explicitly configured, Spice loads them from the following sources in order:

  1. Environment Variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
  2. Shared Credentials Files: ~/.aws/credentials and ~/.aws/config
  3. AWS SSO Sessions: Configured via aws configure sso
  4. Web Identity Token: For OIDC/OAuth (common with EKS IRSA)
  5. ECS Container Credentials: Automatic IAM role for ECS tasks
  6. EC2 Instance Metadata (IMDSv2): Automatic IAM role for EC2 instances
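
To bypass the chain, credentials can be set explicitly on a component. The sketch below does this for the DynamoDB example on this page. The `dynamodb_aws_access_key_id` and `dynamodb_aws_secret_access_key` parameter names are assumptions modeled on `dynamodb_aws_region`, and the `${secrets:...}` references assume secrets with those names are available to the runtime. Confirm both against the DynamoDB Data Connector documentation before relying on them.

```yaml
datasets:
  - from: dynamodb:users
    name: users
    params:
      dynamodb_aws_region: us-west-2
      # Assumed parameter names; the values reference hypothetical secrets.
      dynamodb_aws_access_key_id: ${secrets:AWS_ACCESS_KEY_ID}
      dynamodb_aws_secret_access_key: ${secrets:AWS_SECRET_ACCESS_KEY}
```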

IAM Permissions

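Ensure the IAM role or user has the permissions required by each AWS service in use. The following example policy covers S3, DynamoDB, Glue, Bedrock, and Secrets Manager; in production, restrict "Resource" to specific ARNs rather than "*":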

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "dynamodb:Scan",
        "dynamodb:DescribeTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "bedrock:InvokeModel",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "*"
    }
  ]
}

Deployment Options

Deploy Spice on AWS infrastructure for optimal performance and integration:

| Option | Description | Documentation |
| --- | --- | --- |
| Amazon EKS | Kubernetes orchestration with Helm chart deployment | AWS Deployment |
| Amazon ECS | Container service with Fargate or EC2 launch types | AWS Deployment |
| Amazon EC2 | Direct deployment with Docker or binary | AWS Deployment |

Quick Start

Get started with Spice on AWS in minutes:

  1. Install the Spice CLI:
     curl https://install.spiceai.org | /bin/bash
  2. Configure AWS credentials:
     aws configure
  3. Create a Spicepod with S3 data:
     # spicepod.yaml
     version: v1beta1
     kind: Spicepod
     name: aws_quickstart

     datasets:
       - from: s3://spiceai-demo-datasets/taxi_trips/2024/
         name: taxi_trips
         params:
           file_format: parquet
           s3_auth: iam_role
  4. Start the runtime:
     spice run
  5. Query your data:
     spice sql
     > SELECT COUNT(*) FROM taxi_trips;