FTP/SFTP Data Connector
FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) are network protocols for transferring files between a client and server. FTP transmits data in plain text, while SFTP provides encrypted file transfer over SSH, making it the preferred choice for secure environments.
The FTP/SFTP Data Connector enables federated SQL query across supported file formats stored on FTP/SFTP servers.
Quickstart​
Connect to an SFTP server and query CSV files:
datasets:
- from: sftp://files.example.com/data/sales/
name: sales
params:
file_format: csv
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
Query the data using SQL:
SELECT * FROM sales LIMIT 10;
FTP vs SFTP​
| Feature | FTP | SFTP |
|---|---|---|
| Default Port | 21 | 22 |
| Encryption | None (plain text) | SSH encryption |
| Authentication | Username/password | Username/password or SSH keys |
| Recommended Use | Internal/trusted networks | Production and public networks |
Use SFTP instead of FTP whenever possible. FTP transmits credentials and data in plain text, making it vulnerable to interception.
Configuration​
from​
Specifies the FTP or SFTP server and path to connect to.
Format: ftp://<host>/<path> or sftp://<host>/<path>
<host>: The server hostname or IP address<path>: Path to a file or directory on the server
When pointing to a directory, Spice loads all files within that directory recursively.
Examples:
# Connect to a specific file
from: sftp://files.example.com/data/customers.parquet
# Connect to a directory (loads all files)
from: sftp://files.example.com/data/sales/
# FTP connection
from: ftp://ftp.example.com/exports/reports/
name​
The dataset name used as the table name in SQL queries. Cannot be a reserved keyword.
params​
FTP Parameters​
| Parameter Name | Description |
|---|---|
file_format | Required when connecting to a directory. See File Formats. |
ftp_user | Username for FTP authentication. |
ftp_pass | Password for FTP authentication. Use secrets syntax: ${secrets:my_ftp_pass}. |
ftp_port | FTP server port. Default: 21. |
client_timeout | Connection timeout duration. E.g. 30s, 1m. No timeout when unset. |
hive_partitioning_enabled | Enable Hive-style partitioning from folder structure. Default: false. |
SFTP Parameters​
| Parameter Name | Description |
|---|---|
file_format | Required when connecting to a directory. See File Formats. |
sftp_user | Username for SFTP authentication. |
sftp_pass | Password for SFTP authentication. Use secrets syntax: ${secrets:my_sftp_pass}. |
sftp_port | SFTP server port. Default: 22. |
client_timeout | Connection timeout duration. E.g. 30s, 1m. No timeout when unset. |
hive_partitioning_enabled | Enable Hive-style partitioning from folder structure. Default: false. |
Examples​
Basic SFTP Connection​
Connect to an SFTP server with username and password authentication:
datasets:
- from: sftp://sftp.example.com/data/transactions/
name: transactions
params:
file_format: parquet
sftp_user: datauser
sftp_pass: ${secrets:sftp_password}
Basic FTP Connection​
Connect to an FTP server for internal file access:
datasets:
- from: ftp://ftp.internal.local/exports/daily/
name: daily_exports
params:
file_format: csv
ftp_user: ftpuser
ftp_pass: ${secrets:ftp_password}
Reading a Single File​
When pointing to a specific file, the format is inferred from the file extension:
datasets:
- from: sftp://files.example.com/reports/quarterly_summary.parquet
name: quarterly_summary
params:
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
Connection with Timeout​
Configure a timeout for slow or unreliable connections:
datasets:
- from: sftp://remote-server.example.com/large-datasets/
name: large_dataset
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
client_timeout: 120s
Custom Port Configuration​
Connect to servers running on non-standard ports:
datasets:
- from: sftp://secure.example.com/data/
name: secure_data
params:
file_format: parquet
sftp_port: 2222
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
Hive Partitioning​
Enable Hive-style partitioning to automatically extract partition columns from the folder structure:
datasets:
- from: sftp://datalake.example.com/events/
name: events
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
hive_partitioning_enabled: true
Given a folder structure like:
/events/
year=2024/
month=01/
data.parquet
month=02/
data.parquet
year=2025/
month=01/
data.parquet
Queries can filter on partition columns:
SELECT * FROM events WHERE year = '2024' AND month = '01';
Multiple Datasets from One Server​
Load different datasets from the same SFTP server:
datasets:
- from: sftp://data.example.com/sales/
name: sales
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
- from: sftp://data.example.com/inventory/
name: inventory
params:
file_format: csv
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
Accelerated Dataset​
Enable local acceleration for faster repeated queries:
datasets:
- from: sftp://archive.example.com/historical/
name: historical_data
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
acceleration:
enabled: true
refresh_check_interval: 1h
Secrets​
Spice integrates with multiple secret stores for secure credential management. Store FTP/SFTP credentials in a secret store and reference them using the ${secrets:key} syntax.
datasets:
- from: sftp://files.example.com/data/
name: secure_data
params:
file_format: parquet
sftp_user: ${secrets:sftp_username}
sftp_pass: ${secrets:sftp_password}
For detailed information, refer to the secret stores documentation.
Troubleshooting​
Connection Timeouts​
If connections frequently timeout, increase the client_timeout value:
params:
client_timeout: 120s
Authentication Failures​
Verify credentials are correctly stored in your secret store and that the user has read access to the specified path on the server.
File Format Errors​
When connecting to a directory, ensure file_format is specified and matches the actual file types in the directory. Spice expects all files in a directory to have the same format.
Cookbook​
Refer to the FTP cookbook recipe for a complete working example.
