Skip to main content

GraphQL Data Connector

The GraphQL Data Connector enables federated SQL queries on any GraphQL endpoint by specifying graphql as the selector in the from value for the dataset.

datasets:
- from: graphql:your-graphql-endpoint
name: my_dataset
params:
json_pointer: /data/some/nodes
graphql_query: |
{
some {
nodes {
field1
field2
}
}
}
Limitations
  • The GraphQL data connector does not support variables in the query.
  • Filter pushdown, with the exclusion of LIMIT, is not currently supported. Using a LIMIT will reduce the amount of data requested from the GraphQL server.

Configuration​

from​

The from field takes the form of graphql:your-graphql-endpoint.

name​

The dataset name. This will be used as the table name within Spice.

params​

The GraphQL data connector can be configured by providing the following params. Use the secret replacement syntax to load the password from a secret store, e.g. ${secrets:my_graphql_auth_token}.

Parameter NameDescription
unnest_depthDepth level to automatically unnest objects to. By default, disabled if unspecified or 0.
graphql_auth_tokenThe authentication token to use to connect to the GraphQL server. Uses bearer authentication.
graphql_auth_userThe username to use for basic auth. E.g. graphql_auth_user: my_user
graphql_auth_passThe password to use for basic auth. E.g. graphql_auth_pass: ${secrets:my_graphql_auth_pass}
graphql_queryThe username to use for basic auth. See examples for a sample GraphQL query
json_pointerThe JSON pointer into the response body. When graphql_query is paginated, the json_pointer can be inferred.

GraphQL Query Example​

query: |
{
some {
nodes {
field1
field2
}
}
}

Examples​

Example using the GitHub GraphQL API and Bearer Auth. The following will use json_pointer to retrieve all of the nodes in starredRepositories:

from: graphql:https://api.github.com/graphql
name: stars
params:
graphql_auth_token: ${env:GITHUB_TOKEN}
graphql_auth_user: ${env:GRAPHQL_USER} ...
graphql_auth_pass: ${env:GRAPHQL_PASS}
json_pointer: /data/viewer/starredRepositories/nodes
graphql_query: |
{
viewer {
starredRepositories {
nodes {
name
stargazerCount
languages (first: 10) {
nodes {
name
}
}
}
}
}
}

Pagination​

The GraphQL Data Connector supports automatic pagination of the response for queries using cursor pagination.

The graphql_query must include the pageInfo field as per spec. The connector will parse the graphql_query, and when pageInfo is present, will retrieve data until pagination completes.

The query must have the correct pagination arguments in the associated paginated field.

Example​

Forward Pagination:

{
something_paginated(first: 100) {
nodes {
foo
bar
}
pageInfo {
endCursor
hasNextPage
}
}
}

Backward Pagination:

{
something_paginated(last: 100) {
nodes {
foo
bar
}
pageInfo {
startCursor
hasPreviousPage
}
}
}

Working with JSON Data​

Tips for working with JSON data. For more information see Datafusion Docs.

Accessing objects fields​

You can access the fields of the object using the square bracket notation. Arrays are indexed from 1.

Example for the stargazers query from pagination section:

sql> select node['login'] as login, node['name'] as name from stargazers limit 5;
+--------------+----------------------+
| login | name |
+--------------+----------------------+
| simsieg | Simon Siegert |
| davidmathers | David Mathers |
| ahmedtadde | Ahmed Tadde |
| lordhamlet | Shih-Fen Cheng |
| thinmy | Thinmy Patrick Alves |
+--------------+----------------------+

Piping array into rows​

You can use Datafusion unnest function to pipe values from array into rows. We'll be using countries GraphQL api as an example.

from: graphql:https://countries.trevorblades.com
name: countries
params:
json_pointer: /data/continents
graphql_query: |
{
continents {
name
countries {
name
capital
}
}
}

description: countries
acceleration:
enabled: true
refresh_mode: full
refresh_check_interval: 30m

Example query:

sql> select continent, country['name'] as country, country['capital'] as capital
from (select name as continent, unnest(countries) as country from countries)
where continent = 'North America' limit 5;
+---------------+---------------------+--------------+
| continent | country | capital |
+---------------+---------------------+--------------+
| North America | Antigua and Barbuda | Saint John's |
| North America | Anguilla | The Valley |
| North America | Aruba | Oranjestad |
| North America | Barbados | Bridgetown |
| North America | Saint BarthΓ©lemy | Gustavia |
+---------------+---------------------+--------------+

Unnesting object properties​

You can also use the unnest_depth parameter to control automatic unnesting of objects from GraphQL responses.

This examples uses the GitHub stargazers endpoint:

from: graphql:https://api.github.com/graphql
name: stargazers
params:
graphql_auth_token: ${env:GITHUB_TOKEN}
unnest_depth: 2
json_pointer: /data/repository/stargazers/edges
graphql_query: |
{
repository(name: "spiceai", owner: "spiceai") {
id
name
stargazers(first: 100) {
edges {
node {
id
name
login
}
}
pageInfo {
hasNextPage
endCursor
}
}

}
}

If unnest_depth is set to 0, or unspecified, object unnesting is disabled. When enabled, unnesting automatically moves nested fields to the parent level.

Without unnesting, stargazers data looks like this in a query:

sql> select node from stargazers limit 1;
+------------------------------------------------------------+
| node |
+------------------------------------------------------------+
| {id: MDQ6VXNlcjcwNzIw, login: ashtom, name: Thomas Dohmke} |
+------------------------------------------------------------+

With unnesting, these properties are automatically placed into their own columns:

sql> select node from stargazers limit 1;
+------------------+--------+---------------+
| id | login | name |
+------------------+--------+---------------+
| MDQ6VXNlcjcwNzIw | ashtom | Thomas Dohmke |
+------------------+--------+---------------+

Unnesting Duplicate Columns​

By default, the Spice Runtime will error when a duplicate column is detected during unnesting.

For example, this example spicepod.yml query would fail due to name fields:

from: graphql:https://localhost
name: stargazers
params:
unnest_depth: 2
json_pointer: /data/users
graphql_query: |
query {
users {
name
emergency_contact {
name
}
}
}

This example would fail with a runtime error:

WARN runtime: GraphQL Data Connector Error: Invalid object access. Column 'name' already exists in the object.

Avoid this error by using aliases in the query where possible. In the example above, a duplicate error was introduced from emergency_contact { name }.

The example below uses a GraphQL alias to rename emergency_contact.name as emergencyContactName.

from: graphql:https://localhost
name: stargazers
params:
unnest_depth: 2
json_pointer: /data/people
graphql_query: |
query {
users {
name
emergency_contact {
emergencyContactName: name
}
}
}

Cookbook​