Add support for PIVOT by martint · Pull Request #29404 · trinodb/trino

martint · 2026-05-10T20:16:22Z

Summary

Adds first-class PIVOT syntax to Trino, eliminating the need to hand-write FILTER aggregation SQL to rotate row values into columns.

SELECT *
FROM sales PIVOT (
    sum(amount) AS total
    FOR month IN (1 AS jan, 2 AS feb, 3 AS mar)
    GROUP BY region
)
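For comparison, the hand-written form that the query above replaces can be sketched with a small Python helper that emits the FILTER-aggregation SQL. This is an illustration only, not code from this PR: the helper `desugar_pivot` is hypothetical, and both the `<value alias>_<aggregation alias>` output names and the column order are assumptions.

```python
def desugar_pivot(table, aggregations, pivot_column, values, group_by):
    """Emit the FILTER-aggregation SQL that a single-key PIVOT replaces.

    aggregations: list of (aggregate_sql, alias), e.g. [("sum(amount)", "total")]
    values:       list of (value_sql, alias), e.g. [("1", "jan")]
    group_by:     list of grouping column names (may be empty)
    """
    select_items = list(group_by)
    for value_sql, value_alias in values:
        for aggregate_sql, aggregate_alias in aggregations:
            # Assumed naming scheme: <value alias>_<aggregation alias>.
            column_name = f"{value_alias}_{aggregate_alias}"
            select_items.append(
                f"{aggregate_sql} FILTER (WHERE {pivot_column} = {value_sql}) AS {column_name}"
            )
    sql = "SELECT\n    " + ",\n    ".join(select_items) + f"\nFROM {table}"
    if group_by:
        sql += "\nGROUP BY " + ", ".join(group_by)
    return sql


print(desugar_pivot(
    table="sales",
    aggregations=[("sum(amount)", "total")],
    pivot_column="month",
    values=[("1", "jan"), ("2", "feb"), ("3", "mar")],
    group_by=["region"],
))
```

The helper only handles slots that are a single aggregate call. A slot such as sum(x) - sum(y) would need the FILTER clause attached to each aggregate inside the expression, which is part of what makes the hand-written form error-prone.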

PIVOT attaches to the relation tree at the same level as MATCH_RECOGNIZE and supports:

multiple aggregations per slot, requiring an alias on each;
single or multi-column pivot keys, with tuple values for the multi-column form;
arbitrary aggregating expressions in the aggregation slot (e.g. sum(x) - sum(y)), validated by the existing AggregationAnalyzer;
explicit GROUP BY including GROUPING SETS, CUBE, ROLLUP, and GROUP BY ();
alias and column-alias on the PIVOT output (PIVOT (...) AS p (c1, c2)).

Two semantic choices align PIVOT with the rest of Trino rather than with other engines' shortcuts:

Missing GROUP BY implies GROUP BY (): the result is a single row whose only columns are the pivot output columns. To preserve a dimension, name it explicitly in GROUP BY.
Standard NULL semantics: IN (NULL) produces a column whose = predicate always evaluates to UNKNOWN and therefore never matches, so the column always carries the empty-input aggregation result.
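These two rules can be illustrated with a toy Python model. It is not the Trino implementation: `None` stands for SQL NULL, `sql_equals` models three-valued `=`, `filtered_sum` mimics `sum(...) FILTER (WHERE ...)`, and all names and sample rows are hypothetical.

```python
def sql_equals(left, right):
    """Three-valued SQL '=': returns True, False, or None (UNKNOWN)."""
    if left is None or right is None:
        return None
    return left == right


def filtered_sum(rows, column, pivot_column, pivot_value):
    """Model of sum(column) FILTER (WHERE pivot_column = pivot_value).

    Only rows whose predicate is exactly True pass the filter. A sum over
    no rows is NULL (None), as for any empty-input sum in SQL.
    """
    matched = [
        row[column]
        for row in rows
        if sql_equals(row[pivot_column], pivot_value) is True
        and row[column] is not None
    ]
    return sum(matched) if matched else None


rows = [
    {"region": "east", "month": 1, "amount": 10},
    {"region": "west", "month": 1, "amount": 5},
    {"region": "east", "month": 2, "amount": 7},
    {"region": "west", "month": None, "amount": 99},
]

# No GROUP BY: one global group, so exactly one output row
# whose only columns are the pivot output columns.
single_row = {
    "jan": filtered_sum(rows, "amount", "month", 1),
    "feb": filtered_sum(rows, "amount", "month", 2),
    # Models IN (NULL AS unknown): the predicate is never True.
    "unknown": filtered_sum(rows, "amount", "month", None),
}
print(single_row)  # {'jan': 15, 'feb': 7, 'unknown': None}
```

Note that the row with a NULL month does not land in the `unknown` column: NULL = NULL is UNKNOWN, so that row matches no value group at all.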

PIVOT is added as a non-reserved keyword.

Architecture

The analyzer reads the user-written Pivot AST, calling analyzeExpression on the pivot columns, IN values, GROUP BY expressions, and slot expressions, and runs AggregationAnalyzer.verifySourceAggregations on the slot expressions. It records Analysis.PivotAnalysis: the GroupingSetAnalysis, the GROUP BY DISTINCT flag, the per-output-column descriptors (name + type), and the list of aggregate FunctionCalls (collected via the existing ExpressionTreeUtils.extractAggregateFunctions). It does not synthesize any AST.
The planner builds the desugared plan in QueryPlanner.planPivot. It projects aggregation inputs (aggregate args, complex grouping expressions, pivot column references, pivot value expressions), coerces, then constructs each value group's FILTER predicate as IR (Comparison + Logical, with a Cast when value/column types differ) and projects it as a boolean symbol. After planGroupingSets, it builds one AggregationNode.Aggregation per (value group, aggregate call) pair with filter set to the matching predicate symbol, and a single AggregationNode containing all of them. The output projection translates each user-written slot expression through a per-group TranslationMap overlay so the same sum(x) resolves to N different output Symbols for N value groups.

PIVOT reuses every existing aggregation, GROUP BY, FILTER, and coercion plan-time rule without introducing a new operator.
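The symbol allocation and per-group translation described above can be sketched as a toy Python model. This is not Trino code: expressions are plain tuples, and `extract_aggregates`, `translate`, `plan_pivot`, and the symbol names are all hypothetical stand-ins for the planner concepts named in the text.

```python
from itertools import count


def extract_aggregates(expression):
    """Collect the distinct aggregate-call nodes of a tuple-based expression tree."""
    if not isinstance(expression, tuple):
        return []
    if expression[0] == "agg":
        return [expression]
    found = []
    for child in expression[1:]:
        for aggregate in extract_aggregates(child):
            if aggregate not in found:
                found.append(aggregate)
    return found


def translate(expression, overlay):
    """Rewrite the expression, replacing each aggregate call with its overlay symbol."""
    if not isinstance(expression, tuple):
        return expression
    if expression[0] == "agg":
        return overlay[expression]
    return (expression[0],) + tuple(translate(child, overlay) for child in expression[1:])


def plan_pivot(slot_expression, value_groups):
    """Return (aggregations, outputs) for one slot expression.

    aggregations: symbol -> (aggregate call, filter symbol), one entry per
                  (value group, aggregate call) pair.
    outputs:      value group alias -> slot expression rewritten over that
                  group's aggregation symbols.
    """
    ids = count()
    aggregations, outputs = {}, {}
    for alias in value_groups:
        filter_symbol = f"filter_{alias}"  # stands in for the boolean predicate symbol
        overlay = {}
        for aggregate in extract_aggregates(slot_expression):
            symbol = f"{aggregate[1]}_{next(ids)}"
            aggregations[symbol] = (aggregate, filter_symbol)
            overlay[aggregate] = symbol
        outputs[alias] = translate(slot_expression, overlay)
    return aggregations, outputs


# sum(x) - sum(y), pivoted over two value groups
slot = ("sub", ("agg", "sum", "x"), ("agg", "sum", "y"))
aggregations, outputs = plan_pivot(slot, ["jan", "feb"])
print(outputs["jan"])  # ('sub', 'sum_0', 'sum_1')
print(outputs["feb"])  # ('sub', 'sum_2', 'sum_3')
```

The same user-written sum(x) node resolves to a different symbol in each value group because the overlay is rebuilt per group, while all four aggregations would sit in a single aggregation step, each tied to its group's filter symbol.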

The first two commits are pure refactors that decouple analyzeGroupBy and planGroupingSets from QuerySpecification so PIVOT can reuse them. Neither changes behavior on the QuerySpecification path.

Test plan

New TestPivot (19 cases): with and without GROUP BY, multi-aggregation, multi-column keys, expression aggregations, expression IN values, GROUPING SETS / CUBE / ROLLUP / GROUP BY (), NULL semantics, pivot-of-pivot, relation alias with column aliases, plus error paths (arity mismatch, missing alias, duplicate output column, non-aggregating slot).
New TestSqlParser cases (8) covering each grammar shape including pivot used as an identifier.
TestSqlKeywords and TestSqlParserErrorHandling updated to include the new PIVOT keyword.
Existing TestAnalyzer, TestSqlParser, MATCH_RECOGNIZE and aggregation tests continue to pass.
mvnd verify for the touched modules passes (compile, checkstyle, modernizer, error-prone).

Documentation

New docs/src/main/sphinx/sql/pivot.md reference page modelled after match-recognize.md.
Toctree entry in sql.md, cross-reference from select.md, and listing in language/sql-support.md.

Users currently hand-write FILTER-aggregation SQL to rotate row values into columns. The result is verbose and error-prone, especially for multi-aggregation or multi-key forms. PIVOT is supported by many other major SQL engines and reduces this friction substantially.

PIVOT attaches to the relation tree at the same level as MATCH_RECOGNIZE: it consumes a relation, transforms it by rotating distinct values of one or more pivot columns into output columns, and accepts an optional output alias. Multiple aggregations, multi-column pivot keys, and an explicit GROUP BY (including GROUPING SETS, CUBE, ROLLUP, and GROUP BY ()) are all supported. Aggregation slots accept any expression containing aggregates; the existing AggregationAnalyzer enforces validity. Output columns are named by concatenating the value-alias and aggregation-alias, so the multi-aggregation form requires aliases on each aggregation.

Two semantic choices align PIVOT with the rest of Trino rather than with other engines' PIVOT-specific shortcuts:

- When GROUP BY is omitted, no implicit grouping is performed. The query collapses to a single row unless the user adds dimensions explicitly, matching Trino's existing rule for aggregating queries.
- NULL values in the IN clause follow standard `=` semantics: they never match, so the corresponding output column is the empty-input aggregation result.

The analyzer does not synthesize any AST: it only reads the user-written Pivot tree, calling analyzeExpression on the pivot columns, IN values, GROUP BY expressions, and slot expressions, and runs AggregationAnalyzer.verifySourceAggregations on the slot expressions to enforce that non-aggregate sub-expressions reference only grouping columns. Pivot-specific metadata is stored in Analysis.PivotAnalysis: the GroupingSetAnalysis, the GROUP BY DISTINCT flag, and one PivotOutputColumn (name + type) per output column.

The planner converts Pivot into plan nodes. It projects aggregation inputs (aggregate arguments, complex grouping expressions, pivot column references, pivot value expressions), coerces, then constructs a boolean FILTER-predicate symbol per value group as IR (Comparison and Logical, with a Cast on the value when its type differs from the pivot column's type). After planGroupingSets and a single AggregationNode that allocates one Symbol per (value group, aggregate function call), the planner projects each output column by translating the user's slot expression through a per-group TranslationMap that maps each aggregate FunctionCall to its group-specific aggregation Symbol. PIVOT therefore reuses every existing aggregation, GROUP BY, FILTER, and coercion plan-time rule without introducing a new operator and without ever synthesizing AST.

github-actions Bot added docs cla-signed labels May 10, 2026

martint requested review from Praveen2112 and kasiafi May 11, 2026 17:36

martint added 3 commits May 12, 2026 21:29

Decouple analyzeGroupBy from QuerySpecification

1dac32b

Decouple planGroupingSets from QuerySpecification

c66a19f

martint force-pushed the pivot branch from 88aa68f to 74dff17 Compare May 12, 2026 21:36

martint wants to merge 3 commits into trinodb:master from martint:pivot
