Skip to main content
Version: Next (v1.11)

FTP/SFTP Data Connector

FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) are network protocols for transferring files between a client and server. FTP transmits data in plain text, while SFTP provides encrypted file transfer over SSH, making it the preferred choice for secure environments.

The FTP/SFTP Data Connector enables federated SQL query across supported file formats stored on FTP/SFTP servers.

Quickstart

Connect to an SFTP server and query CSV files:

datasets:
- from: sftp://files.example.com/data/sales/
name: sales
params:
file_format: csv
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

Query the data using SQL:

SELECT * FROM sales LIMIT 10;

FTP vs SFTP

FeatureFTPSFTP
Default Port2122
EncryptionNone (plain text)SSH encryption
AuthenticationUsername/passwordUsername/password or SSH keys
Recommended UseInternal/trusted networksProduction and public networks
Security Recommendation

Use SFTP instead of FTP whenever possible. FTP transmits credentials and data in plain text, making it vulnerable to interception.

Configuration

from

Specifies the FTP or SFTP server and path to connect to.

Format: ftp://<host>/<path> or sftp://<host>/<path>

  • <host>: The server hostname or IP address
  • <path>: Path to a file or directory on the server

When pointing to a directory, Spice loads all files within that directory recursively.

Examples:

# Connect to a specific file
from: sftp://files.example.com/data/customers.parquet

# Connect to a directory (loads all files)
from: sftp://files.example.com/data/sales/

# FTP connection
from: ftp://ftp.example.com/exports/reports/

name

The dataset name used as the table name in SQL queries. Cannot be a reserved keyword.

params

FTP Parameters

Parameter NameDescription
file_formatRequired when connecting to a directory. See File Formats.
ftp_userUsername for FTP authentication.
ftp_passPassword for FTP authentication. Use secrets syntax: ${secrets:my_ftp_pass}.
ftp_portFTP server port. Default: 21.
client_timeoutConnection timeout duration. E.g. 30s, 1m. No timeout when unset.
hive_partitioning_enabledEnable Hive-style partitioning from folder structure. Default: false.

SFTP Parameters

Parameter NameDescription
file_formatRequired when connecting to a directory. See File Formats.
sftp_userUsername for SFTP authentication.
sftp_passPassword for SFTP authentication. Use secrets syntax: ${secrets:my_sftp_pass}.
sftp_portSFTP server port. Default: 22.
client_timeoutConnection timeout duration. E.g. 30s, 1m. No timeout when unset.
hive_partitioning_enabledEnable Hive-style partitioning from folder structure. Default: false.

Examples

Basic SFTP Connection

Connect to an SFTP server with username and password authentication:

datasets:
- from: sftp://sftp.example.com/data/transactions/
name: transactions
params:
file_format: parquet
sftp_user: datauser
sftp_pass: ${secrets:sftp_password}

Basic FTP Connection

Connect to an FTP server for internal file access:

datasets:
- from: ftp://ftp.internal.local/exports/daily/
name: daily_exports
params:
file_format: csv
ftp_user: ftpuser
ftp_pass: ${secrets:ftp_password}

Reading a Single File

When pointing to a specific file, the format is inferred from the file extension:

datasets:
- from: sftp://files.example.com/reports/quarterly_summary.parquet
name: quarterly_summary
params:
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

Connection with Timeout

Configure a timeout for slow or unreliable connections:

datasets:
- from: sftp://remote-server.example.com/large-datasets/
name: large_dataset
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
client_timeout: 120s

Custom Port Configuration

Connect to servers running on non-standard ports:

datasets:
- from: sftp://secure.example.com/data/
name: secure_data
params:
file_format: parquet
sftp_port: 2222
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

Hive Partitioning

Enable Hive-style partitioning to automatically extract partition columns from the folder structure:

datasets:
- from: sftp://datalake.example.com/events/
name: events
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
hive_partitioning_enabled: true

Given a folder structure like:

/events/
year=2024/
month=01/
data.parquet
month=02/
data.parquet
year=2025/
month=01/
data.parquet

Queries can filter on partition columns:

SELECT * FROM events WHERE year = '2024' AND month = '01';

Multiple Datasets from One Server

Load different datasets from the same SFTP server:

datasets:
- from: sftp://data.example.com/sales/
name: sales
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

- from: sftp://data.example.com/inventory/
name: inventory
params:
file_format: csv
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

Accelerated Dataset

Enable local acceleration for faster repeated queries:

datasets:
- from: sftp://archive.example.com/historical/
name: historical_data
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
acceleration:
enabled: true
refresh_check_interval: 1h

Secrets

Spice integrates with multiple secret stores for secure credential management. Store FTP/SFTP credentials in a secret store and reference them using the ${secrets:key} syntax.

datasets:
- from: sftp://files.example.com/data/
name: secure_data
params:
file_format: parquet
sftp_user: ${secrets:sftp_username}
sftp_pass: ${secrets:sftp_password}

For detailed information, refer to the secret stores documentation.

Troubleshooting

Connection Timeouts

If connections frequently timeout, increase the client_timeout value:

params:
client_timeout: 120s

Authentication Failures

Verify credentials are correctly stored in your secret store and that the user has read access to the specified path on the server.

File Format Errors

When connecting to a directory, ensure file_format is specified and matches the actual file types in the directory. Spice expects all files in a directory to have the same format.

Cookbook

Refer to the FTP cookbook recipe for a complete working example.