Skip to main content
Version: Next

FTP/SFTP Data Connector

FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) are network protocols for transferring files between a client and server. FTP transmits data in plain text, while SFTP provides encrypted file transfer over SSH, making it the preferred choice for secure environments.

The FTP/SFTP Data Connector enables federated SQL query across supported file formats stored on FTP/SFTP servers.

Quickstart​

Connect to an SFTP server and query CSV files:

datasets:
- from: sftp://files.example.com/data/sales/
name: sales
params:
file_format: csv
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

Query the data using SQL:

SELECT * FROM sales LIMIT 10;

FTP vs SFTP​

FeatureFTPSFTP
Default Port2122
EncryptionNone (plain text)SSH encryption
AuthenticationUsername/passwordUsername/password or SSH keys
Recommended UseInternal/trusted networksProduction and public networks
Security Recommendation

Use SFTP instead of FTP whenever possible. FTP transmits credentials and data in plain text, making it vulnerable to interception.

Configuration​

from​

Specifies the FTP or SFTP server and path to connect to.

Format: ftp://<host>/<path> or sftp://<host>/<path>

  • <host>: The server hostname or IP address
  • <path>: Path to a file or directory on the server

When pointing to a directory, Spice loads all files within that directory recursively.

Examples:

# Connect to a specific file
from: sftp://files.example.com/data/customers.parquet

# Connect to a directory (loads all files)
from: sftp://files.example.com/data/sales/

# FTP connection
from: ftp://ftp.example.com/exports/reports/

name​

The dataset name used as the table name in SQL queries. Cannot be a reserved keyword.

params​

FTP Parameters​

Parameter NameDescription
file_formatRequired when connecting to a directory. See File Formats.
ftp_userUsername for FTP authentication.
ftp_passPassword for FTP authentication. Use secrets syntax: ${secrets:my_ftp_pass}.
ftp_portFTP server port. Default: 21.
client_timeoutConnection timeout duration. E.g. 30s, 1m. No timeout when unset.
hive_partitioning_enabledEnable Hive-style partitioning from folder structure. Default: false.

SFTP Parameters​

Parameter NameDescription
file_formatRequired when connecting to a directory. See File Formats.
sftp_userUsername for SFTP authentication.
sftp_passPassword for SFTP authentication. Use secrets syntax: ${secrets:my_sftp_pass}.
sftp_portSFTP server port. Default: 22.
client_timeoutConnection timeout duration. E.g. 30s, 1m. No timeout when unset.
hive_partitioning_enabledEnable Hive-style partitioning from folder structure. Default: false.

Examples​

Basic SFTP Connection​

Connect to an SFTP server with username and password authentication:

datasets:
- from: sftp://sftp.example.com/data/transactions/
name: transactions
params:
file_format: parquet
sftp_user: datauser
sftp_pass: ${secrets:sftp_password}

Basic FTP Connection​

Connect to an FTP server for internal file access:

datasets:
- from: ftp://ftp.internal.local/exports/daily/
name: daily_exports
params:
file_format: csv
ftp_user: ftpuser
ftp_pass: ${secrets:ftp_password}

Reading a Single File​

When pointing to a specific file, the format is inferred from the file extension:

datasets:
- from: sftp://files.example.com/reports/quarterly_summary.parquet
name: quarterly_summary
params:
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

Connection with Timeout​

Configure a timeout for slow or unreliable connections:

datasets:
- from: sftp://remote-server.example.com/large-datasets/
name: large_dataset
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
client_timeout: 120s

Custom Port Configuration​

Connect to servers running on non-standard ports:

datasets:
- from: sftp://secure.example.com/data/
name: secure_data
params:
file_format: parquet
sftp_port: 2222
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

Hive Partitioning​

Enable Hive-style partitioning to automatically extract partition columns from the folder structure:

datasets:
- from: sftp://datalake.example.com/events/
name: events
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
hive_partitioning_enabled: true

Given a folder structure like:

/events/
year=2024/
month=01/
data.parquet
month=02/
data.parquet
year=2025/
month=01/
data.parquet

Queries can filter on partition columns:

SELECT * FROM events WHERE year = '2024' AND month = '01';

Multiple Datasets from One Server​

Load different datasets from the same SFTP server:

datasets:
- from: sftp://data.example.com/sales/
name: sales
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

- from: sftp://data.example.com/inventory/
name: inventory
params:
file_format: csv
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}

Accelerated Dataset​

Enable local acceleration for faster repeated queries:

datasets:
- from: sftp://archive.example.com/historical/
name: historical_data
params:
file_format: parquet
sftp_user: ${secrets:sftp_user}
sftp_pass: ${secrets:sftp_pass}
acceleration:
enabled: true
refresh_check_interval: 1h

Secrets​

Spice integrates with multiple secret stores for secure credential management. Store FTP/SFTP credentials in a secret store and reference them using the ${secrets:key} syntax.

datasets:
- from: sftp://files.example.com/data/
name: secure_data
params:
file_format: parquet
sftp_user: ${secrets:sftp_username}
sftp_pass: ${secrets:sftp_password}

For detailed information, refer to the secret stores documentation.

Troubleshooting​

Connection Timeouts​

If connections frequently timeout, increase the client_timeout value:

params:
client_timeout: 120s

Authentication Failures​

Verify credentials are correctly stored in your secret store and that the user has read access to the specified path on the server.

File Format Errors​

When connecting to a directory, ensure file_format is specified and matches the actual file types in the directory. Spice expects all files in a directory to have the same format.

Cookbook​

Refer to the FTP cookbook recipe for a complete working example.