Add support for PIVOT #29404

Open
martint wants to merge 3 commits into trinodb:master from martint:pivot

@martint martint commented May 10, 2026

Summary

Adds first-class PIVOT syntax to Trino, eliminating the need to hand-write FILTER aggregation SQL to rotate row values into columns.

SELECT *
FROM sales PIVOT (
    sum(amount) AS total
    FOR month IN (1 AS jan, 2 AS feb, 3 AS mar)
    GROUP BY region
)
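
For comparison, the hand-written FILTER-aggregation form that this syntax replaces can be sketched as below. This is a minimal illustration run against SQLite through Python's standard `sqlite3` module (assumed to bundle SQLite 3.30 or later, which added `FILTER` on aggregates), not against Trino. The sample rows and the output column names `jan_total`, `feb_total`, and `mar_total` are hypothetical.

```python
import sqlite3


def filter_aggregation_pivot(rows):
    """Rotate month values into columns with one FILTER aggregate per IN value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # One filtered aggregate per pivot value; the column names are assumptions.
    query = """
        SELECT region,
               sum(amount) FILTER (WHERE month = 1) AS jan_total,
               sum(amount) FILTER (WHERE month = 2) AS feb_total,
               sum(amount) FILTER (WHERE month = 3) AS mar_total
        FROM sales
        GROUP BY region
        ORDER BY region
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data: (region, month, amount).
SAMPLE_ROWS = [
    ("east", 1, 10),
    ("east", 1, 5),
    ("east", 2, 7),
    ("west", 3, 4),
    ("west", 4, 100),  # month 4 is not listed, so no output column picks it up
]
pivoted = filter_aggregation_pivot(SAMPLE_ROWS)
```

Each additional pivot value or aggregation adds another near-identical `FILTER` line to the hand-written query, which is the repetition the PIVOT form removes.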

PIVOT attaches to the relation tree at the same level as MATCH_RECOGNIZE and supports:

  • multiple aggregations per slot, requiring an alias on each;
  • single or multi-column pivot keys, with tuple values for the multi-column form;
  • arbitrary aggregating expressions in the aggregation slot (e.g. sum(x) - sum(y)), validated by the existing AggregationAnalyzer;
  • explicit GROUP BY including GROUPING SETS, CUBE, ROLLUP, and GROUP BY ();
  • alias and column-alias on the PIVOT output (PIVOT (...) AS p (c1, c2)).

Two semantic choices align PIVOT with the rest of Trino rather than with other engines' shortcuts:

  • Missing GROUP BY implies GROUP BY (): the result is a single row whose only columns are the pivot output columns. To preserve a dimension, name it explicitly in GROUP BY.
  • Standard NULL semantics: IN (NULL) produces a column whose = predicate always evaluates to UNKNOWN and therefore never matches, so the column always carries the empty-input aggregation result.
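
Both choices can be modelled in a few lines of plain Python. This is a sketch of the semantics described above, not of Trino's implementation: `None` stands in for SQL NULL, and the helper names (`sql_equals`, `sql_sum`, `sql_count`, `pivot_without_group_by`) are hypothetical.

```python
def sql_equals(left, right):
    """SQL '=': the result is UNKNOWN (None) when either operand is NULL."""
    if left is None or right is None:
        return None
    return left == right


def sql_sum(values):
    """SQL sum(): NULL inputs are skipped and empty input yields NULL."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def sql_count(values):
    """SQL count(x): NULL inputs are skipped and empty input yields 0."""
    return sum(1 for v in values if v is not None)


def pivot_without_group_by(rows, in_values, aggregate=sql_sum):
    """No GROUP BY means GROUP BY (): one row, one column per IN value."""
    return tuple(
        # A row feeds the aggregate only when the predicate is TRUE,
        # so FALSE and UNKNOWN are both rejected.
        aggregate(
            amount for month, amount in rows if sql_equals(month, value) is True
        )
        for value in in_values
    )


# Hypothetical rows of (month, amount); one row has a NULL month.
rows = [(1, 10), (1, 5), (None, 99), (2, 7)]
sum_row = pivot_without_group_by(rows, [1, 2, None])
count_row = pivot_without_group_by(rows, [1, 2, None], aggregate=sql_count)
```

The row with a NULL month is not picked up by the `IN (NULL)` column, whose value is whatever the aggregate returns for empty input: NULL for `sum`, 0 for `count`.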

PIVOT is added as a non-reserved keyword.

Architecture

  • The analyzer reads the user-written Pivot AST, calling analyzeExpression on the pivot columns, IN values, GROUP BY expressions, and slot expressions, and runs AggregationAnalyzer.verifySourceAggregations on the slot expressions. It records Analysis.PivotAnalysis: the GroupingSetAnalysis, the GROUP BY DISTINCT flag, the per-output-column descriptors (name + type), and the list of aggregate FunctionCalls (collected via the existing ExpressionTreeUtils.extractAggregateFunctions). It does not synthesize any AST.
  • The planner builds the desugared plan in QueryPlanner.planPivot. It projects aggregation inputs (aggregate args, complex grouping expressions, pivot column references, pivot value expressions), coerces, then constructs each value group's FILTER predicate as IR (Comparison + Logical, with a Cast when value/column types differ) and projects it as a boolean symbol. After planGroupingSets, it builds one AggregationNode.Aggregation per (value group, aggregate call) pair with filter set to the matching predicate symbol, and a single AggregationNode containing all of them. The output projection translates each user-written slot expression through a per-group TranslationMap overlay so the same sum(x) resolves to N different output Symbols for N value groups.
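
The shape of that desugaring can be sketched with a small string-level model. It is only an illustration of the bookkeeping described above, assuming hypothetical symbol names (`filter_*`, `agg_*`) and an assumed `group_alias` output naming scheme; it does not use Trino's planner classes.

```python
from itertools import product


def plan_pivot(value_groups, aggregate_calls, slots):
    """Model the plan: filters, one aggregation per pair, then output projection.

    value_groups:    value-group aliases, e.g. ["jan", "feb"]
    aggregate_calls: aggregate calls found in the slots, e.g. ["sum(x)", "sum(y)"]
    slots:           slot alias -> function rendering the slot expression from
                     an overlay that maps each aggregate call to a symbol
    """
    # One boolean filter symbol per value group.
    filters = {group: f"filter_{group}" for group in value_groups}

    # One aggregation per (value group, aggregate call) pair, all of which
    # would live in a single aggregation node.
    aggregations = {}
    for group, call in product(value_groups, aggregate_calls):
        aggregations[(group, call)] = {
            "symbol": f"agg_{len(aggregations)}",
            "call": call,
            "filter": filters[group],
        }

    # Output projection: the same slot expression is translated once per
    # group through that group's overlay, yielding different symbols.
    outputs = {}
    for group in value_groups:
        overlay = {
            call: aggregations[(group, call)]["symbol"] for call in aggregate_calls
        }
        for alias, render in slots.items():
            outputs[f"{group}_{alias}"] = render(overlay)
    return aggregations, outputs


# The slot expression sum(x) - sum(y), written against the overlay.
slots = {"net": lambda overlay: f"{overlay['sum(x)']} - {overlay['sum(y)']}"}
aggregations, outputs = plan_pivot(["jan", "feb"], ["sum(x)", "sum(y)"], slots)
```

With two value groups and two aggregate calls the model allocates four aggregations, and the single slot expression resolves to a different pair of symbols for each group.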

PIVOT reuses every existing aggregation, GROUP BY, FILTER, and coercion plan-time rule without introducing a new operator.

The first two commits are pure refactors that decouple analyzeGroupBy and planGroupingSets from QuerySpecification so PIVOT can reuse them. Neither changes behavior on the QuerySpecification path.

Test plan

  • New TestPivot (19 cases): with and without GROUP BY, multi-aggregation, multi-column keys, expression aggregations, expression IN values, GROUPING SETS / CUBE / ROLLUP / GROUP BY (), NULL semantics, pivot-of-pivot, relation alias with column aliases, plus error paths (arity mismatch, missing alias, duplicate output column, non-aggregating slot).
  • New TestSqlParser cases (8) covering each grammar shape including pivot used as an identifier.
  • TestSqlKeywords and TestSqlParserErrorHandling updated to include the new PIVOT keyword.
  • Existing TestAnalyzer, TestSqlParser, MATCH_RECOGNIZE and aggregation tests continue to pass.
  • mvnd verify for the touched modules passes (compile, checkstyle, modernizer, error-prone).

Documentation

  • New docs/src/main/sphinx/sql/pivot.md reference page modelled after match-recognize.md.
  • Toctree entry in sql.md, cross-reference from select.md, and listing in language/sql-support.md.

martint added 3 commits May 12, 2026 21:29
Users currently hand-write FILTER-aggregation SQL to rotate row values
into columns. The result is verbose and error-prone, especially for
multi-aggregation or multi-key forms. PIVOT is supported by many other
major SQL engines and reduces this friction substantially.

PIVOT attaches to the relation tree at the same level as MATCH_RECOGNIZE:
it consumes a relation, transforms it by rotating distinct values of one
or more pivot columns into output columns, and accepts an optional
output alias. Multiple aggregations, multi-column pivot keys, and an
explicit GROUP BY (including GROUPING SETS, CUBE, ROLLUP, and GROUP BY
()) are all supported. Aggregation slots accept any expression
containing aggregates; the existing AggregationAnalyzer enforces
validity. Output columns are named by concatenating the value-alias and
aggregation-alias, so the multi-aggregation form requires aliases on
each aggregation.
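
The naming rule explains why aliases become mandatory with several aggregations. A rough sketch follows, in which the `_` separator, the value-major column order, and the use of the bare value alias for a single unaliased aggregation are all assumptions rather than details stated here; `pivot_output_names` is a hypothetical helper.

```python
def pivot_output_names(value_aliases, aggregation_aliases, separator="_"):
    """Cross value aliases with aggregation aliases to name output columns."""
    # With several aggregations, an unaliased one could not be told apart.
    if len(aggregation_aliases) > 1 and any(a is None for a in aggregation_aliases):
        raise ValueError("each aggregation needs an alias when there are several")

    names = []
    for value_alias in value_aliases:
        for aggregation_alias in aggregation_aliases:
            if aggregation_alias is None:
                name = value_alias  # assumed naming for a single unaliased slot
            else:
                name = f"{value_alias}{separator}{aggregation_alias}"
            if name in names:
                raise ValueError(f"duplicate output column: {name}")
            names.append(name)
    return names


names = pivot_output_names(["jan", "feb"], ["total", "cnt"])
```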

Two semantic choices align PIVOT with the rest of Trino rather than with
other engines' PIVOT-specific shortcuts:

- When GROUP BY is omitted, no implicit grouping is performed. The query
  collapses to a single row unless the user adds dimensions explicitly,
  matching Trino's existing rule for aggregating queries.
- NULL values in the IN clause follow standard `=` semantics: they
  never match, so the corresponding output column is the empty-input
  aggregation result.

The analyzer does not synthesize any AST: it only reads the user-written
Pivot tree, calling analyzeExpression on the pivot columns, IN values,
GROUP BY expressions, and slot expressions, and runs
AggregationAnalyzer.verifySourceAggregations on the slot expressions to
enforce that non-aggregate sub-expressions reference only grouping
columns. Pivot-specific metadata is stored in Analysis.PivotAnalysis:
the GroupingSetAnalysis, the GROUP BY DISTINCT flag, and one
PivotOutputColumn (name + type) per output column.

The planner converts Pivot into plan nodes. It projects aggregation
inputs (aggregate arguments, complex grouping expressions, pivot column
references, pivot value expressions), coerces, then constructs a
boolean FILTER-predicate symbol per value group as IR (Comparison and
Logical, with a Cast on the value when its type differs from the pivot
column's type). After planGroupingSets and a single AggregationNode
that allocates one Symbol per (value group, aggregate function call),
the planner projects each output column by translating the user's slot
expression through a per-group TranslationMap that maps each aggregate
FunctionCall to its group-specific aggregation Symbol. PIVOT therefore
reuses every existing aggregation, GROUP BY, FILTER, and coercion
plan-time rule without introducing a new operator and without ever
synthesizing AST.
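
The per-group predicate for a multi-column key can be modelled as a conjunction of equality comparisons under SQL's three-valued logic. The sketch below uses `None` for NULL and UNKNOWN; the function names and the `casts` parameter are hypothetical stand-ins for the Comparison, Logical, and Cast IR nodes mentioned above.

```python
def sql_equals(left, right):
    """SQL '=': the result is UNKNOWN (None) when either operand is NULL."""
    if left is None or right is None:
        return None
    return left == right


def sql_and(operands):
    """SQL AND: FALSE dominates, then UNKNOWN, otherwise TRUE."""
    operands = list(operands)
    if any(operand is False for operand in operands):
        return False
    if any(operand is None for operand in operands):
        return None
    return True


def group_predicate(pivot_values, group_values, casts=None):
    """Evaluate one value group's filter against one row's pivot columns."""
    casts = casts or [None] * len(group_values)
    comparisons = []
    for column_value, group_value, cast in zip(pivot_values, group_values, casts):
        # The cast is applied to the IN value, not to the pivot column.
        if cast is not None and group_value is not None:
            group_value = cast(group_value)
        comparisons.append(sql_equals(column_value, group_value))
    return sql_and(comparisons)


# Hypothetical (year, month) key compared against the tuple value (2024, 1).
match = group_predicate((2024, 1), (2024, 1))
no_match = group_predicate((2024, 2), (2024, 1))
unknown = group_predicate((2024, None), (2024, 1))
```

Only a TRUE predicate lets a row contribute to that group's aggregations, so a NULL in any key column keeps the row out unless another comparison is already FALSE.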