Spice is built on Apache DataFusion and uses the PostgreSQL dialect, even when querying datasources with different SQL dialects.
Subqueries
A subquery, also known as an inner query or nested query, is a query inside another query. Subqueries can appear in the SELECT
, FROM
, WHERE
, and HAVING
clauses. The examples below reference these sample tables:
The examples below are based on the following tables.
SELECT * FROM x;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1 | 2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
SELECT * FROM y;
+--------+--------+
| number | string |
+--------+--------+
| 1 | one |
+--------+--------+
| 2 | two |
+--------+--------+
| 3 | three |
+--------+--------+
| 4 | four |
+--------+--------+
Subquery operators​
[ NOT ] EXISTS​
The EXISTS
operator returns rows for which a correlated subquery produces one or more matches. The NOT EXISTS
operator returns rows for which the correlated subquery produces zero matches. Only correlated subquery are supported.
[NOT] EXISTS (subquery)
[ NOT ] IN​
The IN operator returns rows that match any value produced by a correlated subquery or listed values. The NOT IN operator returns rows that do not match any of these values.
expression [NOT] IN (subquery|list-literal)
Examples​
SELECT * FROM x WHERE column_1 IN (1,3);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1 | 2 |
+----------+----------+
SELECT * FROM x WHERE column_1 NOT IN (1,3);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
SELECT clause subqueries​
SELECT
clause subqueries use values returned from the inner query as part of the outer query's SELECT
list.
The SELECT
clause only supports scalar subqueries that return a single value per execution of the inner query. The returned value can be unique per row.
SELECT [expression1[, expression2, ..., expressionN],] (<subquery>)
Note: SELECT
clause subqueries can be used as an alternative to JOIN
operations.
Example​
SELECT
column_1,
(
SELECT
first_value(string)
FROM
y
WHERE
number = x.column_1
) AS "numeric string"
FROM
x;
+----------+----------------+
| column_1 | numeric string |
+----------+----------------+
| 1 | one |
| 2 | two |
+----------+----------------+
FROM clause subqueries​
A subquery in the FROM
clause produces a result set that is then referenced by the outer query.
SELECT expression1[, expression2, ..., expressionN] FROM (<subquery>)
Example​
The following query returns the average of maximum values per room. The inner query returns the maximum value for each field from each room. The outer query uses the results of the inner query and returns the average maximum value for each field.
SELECT
column_2
FROM
(
SELECT
*
FROM
x
WHERE
column_1 > 1
);
+----------+
| column_2 |
+----------+
| 4 |
+----------+
WHERE clause subqueries​
A subquery in the WHERE
clause compares an expression to the subquery result, returning true or false. Rows that evaluate to false or NULL are filtered from the final result. Both correlated and non-correlated subqueries are supported in WHERE
clause subqueries, as well as scalar and non-scalar subqueries (depending on the operator).
SELECT
expression1[, expression2, ..., expressionN]
FROM
<measurement>
WHERE
expression operator (<subquery>)
Note: WHERE
clause subqueries can be used as an alternative to JOIN
operations.
Examples​
WHERE
clause with scalar subquery​
The following query returns all rows with column_2
values above the average
of all number
values in y
.
SELECT
*
FROM
x
WHERE
column_2 > (
SELECT
AVG(number)
FROM
y
);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
WHERE
clause with non-scalar subquery​
Non-scalar subqueries must use the [NOT] IN
or [NOT] EXISTS
operators and
can only return a single column.
The values in the returned column are evaluated as a list.
The following query returns all rows with column_2
values in table x
that
are in the list of numbers with string lengths greater than three from table
y
.
SELECT
*
FROM
x
WHERE
column_2 IN (
SELECT
number
FROM
y
WHERE
length(string) > 3
);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
WHERE
clause with correlated subquery​
The following query returns rows with column_2
values from table x
greater
than the average string
value length from table y
.
The subquery in the WHERE
clause uses the column_1
value from the outer
query to return the average string
value length for that specific value.
SELECT
*
FROM
x
WHERE
column_2 > (
SELECT
AVG(length(string))
FROM
y
WHERE
number = x.column_1
);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
HAVING clause subqueries​
A subquery in the HAVING
clause compares an expression using aggregate functions to the subquery result and returns true or false. Rows that evaluate to false are excluded. Both correlated and non-correlated subqueries are possible, as well as scalar and non-scalar subqueries (depending on the operator).
SELECT
aggregate_expression1[, aggregate_expression2, ..., aggregate_expressionN]
FROM
<measurement>
WHERE
<conditional_expression>
GROUP BY
column_expression1[, column_expression2, ..., column_expressionN]
HAVING
expression operator (<subquery>)
Examples​
The following query calculates the averages of even and odd numbers in table y
and returns the averages that are equal to the maximum value of column_1
in table x
.
HAVING
clause with a scalar subquery​
SELECT
AVG(number) AS avg,
(number % 2 = 0) AS even
FROM
y
GROUP BY
even
HAVING
avg = (
SELECT
MAX(column_1)
FROM
x
);
+-------+--------+
| avg | even |
+-------+--------+
| 2 | false |
+-------+--------+
HAVING
clause with a non-scalar subquery​
Non-scalar subqueries must use the [NOT] IN
or [NOT] EXISTS
operators and
can only return a single column.
The values in the returned column are evaluated as a list.
The following query calculates the averages of even and odd numbers in table y
and returns the averages that are in column_1
of table x
.
SELECT
AVG(number) AS avg,
(number % 2 = 0) AS even
FROM
y
GROUP BY
even
HAVING
avg IN (
SELECT
column_1
FROM
x
);
+-------+--------+
| avg | even |
+-------+--------+
| 2 | false |
+-------+--------+
Subquery categories​
Subqueries can be categorized as one or more of the following based on the behavior of the subquery:
Correlated subqueries​
A correlated subquery depends on the values of the current row processed by the outer query. Spice uses DataFusion execution engine, which rewrites correlated subqueries into joins to improve performance. Correlated subqueries are typically less performant than non-correlated subqueries.
Non-correlated subqueries​
A non-correlated subquery does not depend on the outer query. The inner query runs first and passes its result to the outer query.
Scalar subqueries​
A scalar subquery returns exactly one value (one column of one row). If no rows match, the subquery returns NULL.
Non-scalar subqueries​
A non-scalar subquery can return 0, 1, or more rows, each potentially containing one or more columns. If no rows qualify, it returns zero rows. If there are no values for a particular column, it returns NULL for that column.