Skip to main content
Enterprise Support Available
Spice.ai Enterprise provides full support for up to 3 years from the release date.
Version: v1.7

GraphQL Data Connector

The GraphQL Data Connector enables federated SQL queries on any GraphQL endpoint by specifying graphql as the selector in the from value for the dataset.

datasets:
- from: graphql:your-graphql-endpoint
name: my_dataset
params:
json_pointer: /data/some/nodes
graphql_query: |
{
some {
nodes {
field1
field2
}
}
}
Limitations
  • The GraphQL data connector does not support variables in the query.
  • Filter pushdown, with the exclusion of LIMIT, is not currently supported. Using a LIMIT will reduce the amount of data requested from the GraphQL server.

Configuration​

from​

The from field takes the form of graphql:your-graphql-endpoint.

name​

The dataset name. This will be used as the table name within Spice. The dataset name cannot be a reserved keyword.

params​

The GraphQL data connector can be configured by providing the following params. Use the secret replacement syntax to load the password from a secret store, e.g. ${secrets:my_graphql_auth_token}.

Parameter NameDescription
unnest_depthDepth level to automatically unnest objects to. By default, disabled if unspecified or 0.
graphql_auth_tokenThe authentication token to use to connect to the GraphQL server. Uses bearer authentication.
graphql_auth_userThe username to use for basic auth. E.g. graphql_auth_user: my_user
graphql_auth_passThe password to use for basic auth. E.g. graphql_auth_pass: ${secrets:my_graphql_auth_pass}
graphql_queryThe username to use for basic auth. See examples for a sample GraphQL query
json_pointerThe JSON pointer into the response body. When graphql_query is paginated, the json_pointer can be inferred.

GraphQL Query Example​

query: |
{
some {
nodes {
field1
field2
}
}
}

Examples​

Example using the GitHub GraphQL API and Bearer Auth. The following will use json_pointer to retrieve all of the nodes in starredRepositories:

from: graphql:https://api.github.com/graphql
name: stars
params:
graphql_auth_token: ${env:GITHUB_TOKEN}
graphql_auth_user: ${env:GRAPHQL_USER} ...
graphql_auth_pass: ${env:GRAPHQL_PASS}
json_pointer: /data/viewer/starredRepositories/nodes
graphql_query: |
{
viewer {
starredRepositories {
nodes {
name
stargazerCount
languages (first: 10) {
nodes {
name
}
}
}
}
}
}

Pagination​

The GraphQL Data Connector supports automatic pagination of the response for queries using cursor pagination.

The graphql_query must include the pageInfo field as per spec. The connector will parse the graphql_query, and when pageInfo is present, will retrieve data until pagination completes.

The query must have the correct pagination arguments in the associated paginated field.

Example​

Forward Pagination:

{
something_paginated(first: 100) {
nodes {
foo
bar
}
pageInfo {
endCursor
hasNextPage
}
}
}

Backward Pagination:

{
something_paginated(last: 100) {
nodes {
foo
bar
}
pageInfo {
startCursor
hasPreviousPage
}
}
}

Working with JSON Data​

Tips for working with JSON data. For more information see Datafusion Docs.

Accessing objects fields​

You can access the fields of the object using the square bracket notation. Arrays are indexed from 1.

Example for the stargazers query from pagination section:

sql> select node['login'] as login, node['name'] as name from stargazers limit 5;
+--------------+----------------------+
| login | name |
+--------------+----------------------+
| simsieg | Simon Siegert |
| davidmathers | David Mathers |
| ahmedtadde | Ahmed Tadde |
| lordhamlet | Shih-Fen Cheng |
| thinmy | Thinmy Patrick Alves |
+--------------+----------------------+

Piping array into rows​

You can use Datafusion unnest function to pipe values from array into rows. We'll be using countries GraphQL api as an example.

from: graphql:https://countries.trevorblades.com
name: countries
params:
json_pointer: /data/continents
graphql_query: |
{
continents {
name
countries {
name
capital
}
}
}

description: countries
acceleration:
enabled: true
refresh_mode: full
refresh_check_interval: 30m

Example query:

sql> select continent, country['name'] as country, country['capital'] as capital
from (select name as continent, unnest(countries) as country from countries)
where continent = 'North America' limit 5;
+---------------+---------------------+--------------+
| continent | country | capital |
+---------------+---------------------+--------------+
| North America | Antigua and Barbuda | Saint John's |
| North America | Anguilla | The Valley |
| North America | Aruba | Oranjestad |
| North America | Barbados | Bridgetown |
| North America | Saint BarthΓ©lemy | Gustavia |
+---------------+---------------------+--------------+

Unnesting object properties​

You can also use the unnest_depth parameter to control automatic unnesting of objects from GraphQL responses.

This examples uses the GitHub stargazers endpoint:

from: graphql:https://api.github.com/graphql
name: stargazers
params:
graphql_auth_token: ${env:GITHUB_TOKEN}
unnest_depth: 2
json_pointer: /data/repository/stargazers/edges
graphql_query: |
{
repository(name: "spiceai", owner: "spiceai") {
id
name
stargazers(first: 100) {
edges {
node {
id
name
login
}
}
pageInfo {
hasNextPage
endCursor
}
}

}
}

If unnest_depth is set to 0, or unspecified, object unnesting is disabled. When enabled, unnesting automatically moves nested fields to the parent level.

Without unnesting, stargazers data looks like this in a query:

sql> select node from stargazers limit 1;
+------------------------------------------------------------+
| node |
+------------------------------------------------------------+
| {id: MDQ6VXNlcjcwNzIw, login: ashtom, name: Thomas Dohmke} |
+------------------------------------------------------------+

With unnesting, these properties are automatically placed into their own columns:

sql> select node from stargazers limit 1;
+------------------+--------+---------------+
| id | login | name |
+------------------+--------+---------------+
| MDQ6VXNlcjcwNzIw | ashtom | Thomas Dohmke |
+------------------+--------+---------------+

Unnesting Duplicate Columns​

By default, the Spice Runtime will error when a duplicate column is detected during unnesting.

For example, this example spicepod.yml query would fail due to name fields:

from: graphql:https://localhost
name: stargazers
params:
unnest_depth: 2
json_pointer: /data/users
graphql_query: |
query {
users {
name
emergency_contact {
name
}
}
}

This example would fail with a runtime error:

WARN runtime: GraphQL Data Connector Error: Invalid object access. Column 'name' already exists in the object.

Avoid this error by using aliases in the query where possible. In the example above, a duplicate error was introduced from emergency_contact { name }.

The example below uses a GraphQL alias to rename emergency_contact.name as emergencyContactName.

from: graphql:https://localhost
name: stargazers
params:
unnest_depth: 2
json_pointer: /data/people
graphql_query: |
query {
users {
name
emergency_contact {
emergencyContactName: name
}
}
}

Cookbook​