Add support for PIVOT #29404
Open
martint wants to merge 3 commits into
Users currently hand-write FILTER-aggregation SQL to rotate row values into columns. The result is verbose and error-prone, especially for multi-aggregation or multi-key forms. PIVOT is supported by many other major SQL engines and reduces this friction substantially.

PIVOT attaches to the relation tree at the same level as MATCH_RECOGNIZE: it consumes a relation, transforms it by rotating distinct values of one or more pivot columns into output columns, and accepts an optional output alias. Multiple aggregations, multi-column pivot keys, and an explicit GROUP BY (including GROUPING SETS, CUBE, ROLLUP, and GROUP BY ()) are all supported. Aggregation slots accept any expression containing aggregates; the existing AggregationAnalyzer enforces validity. Output columns are named by concatenating the value-alias and aggregation-alias, so the multi-aggregation form requires aliases on each aggregation.

Two semantic choices align PIVOT with the rest of Trino rather than with other engines' PIVOT-specific shortcuts:

- When GROUP BY is omitted, no implicit grouping is performed. The query collapses to a single row unless the user adds dimensions explicitly, matching Trino's existing rule for aggregating queries.
- NULL values in the IN clause follow standard `=` semantics: they never match, so the corresponding output column is the empty-input aggregation result.

The analyzer does not synthesize any AST. It only reads the user-written Pivot tree, calling analyzeExpression on the pivot columns, IN values, GROUP BY expressions, and slot expressions, and runs AggregationAnalyzer.verifySourceAggregations on the slot expressions to enforce that non-aggregate sub-expressions reference only grouping columns. Pivot-specific metadata is stored in Analysis.PivotAnalysis: the GroupingSetAnalysis, the GROUP BY DISTINCT flag, and one PivotOutputColumn (name + type) per output column.

The planner converts Pivot into plan nodes. It projects aggregation inputs (aggregate arguments, complex grouping expressions, pivot column references, pivot value expressions), coerces them, then constructs a boolean FILTER-predicate symbol per value group as IR (Comparison and Logical, with a Cast on the value when its type differs from the pivot column's type). After planGroupingSets and a single AggregationNode that allocates one Symbol per (value group, aggregate function call), the planner projects each output column by translating the user's slot expression through a per-group TranslationMap that maps each aggregate FunctionCall to its group-specific aggregation Symbol. PIVOT therefore reuses every existing aggregation, GROUP BY, FILTER, and coercion plan-time rule without introducing a new operator and without ever synthesizing AST.
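The semantics described above (one FILTER predicate per value group, `=` never matching NULL, and a single output row when GROUP BY is omitted) can be modelled in a few lines of plain Java. This is a minimal sketch, not Trino code: `PivotSketch`, `Row`, `ValueGroup`, the `quarter` and `amount` columns, the aggregation aliases `total` and `cnt`, and the underscore used to join the value-alias and aggregation-alias are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of PIVOT semantics; none of these names exist in Trino.
final class PivotSketch {
    // One input row: the pivot column value and the value being aggregated.
    record Row(String quarter, Long amount) {}

    // One entry of the IN clause: the output alias and the value to match (may be null).
    record ValueGroup(String alias, String inValue) {}

    // Models `quarter = inValue` used as a FILTER predicate: a NULL on either
    // side yields UNKNOWN, which FILTER treats as "row not included".
    static boolean filterMatches(String pivotValue, String inValue) {
        return pivotValue != null && inValue != null && pivotValue.equals(inValue);
    }

    // Models PIVOT with no GROUP BY: the whole input collapses to one output row.
    // Per value group it evaluates sum(amount) AS total and count(amount) AS cnt,
    // each restricted by that group's filter predicate.
    static Map<String, Long> pivotWithoutGroupBy(List<Row> rows, List<ValueGroup> groups) {
        Map<String, Long> output = new LinkedHashMap<>();
        for (ValueGroup group : groups) {
            Long total = null; // sum over zero rows is NULL
            long count = 0;    // count over zero rows is 0
            for (Row row : rows) {
                if (filterMatches(row.quarter(), group.inValue()) && row.amount() != null) {
                    total = (total == null ? 0L : total) + row.amount();
                    count++;
                }
            }
            // Assumed naming: value-alias and aggregation-alias joined by "_".
            output.put(group.alias() + "_total", total);
            output.put(group.alias() + "_cnt", count);
        }
        return output;
    }
}
```

Calling `pivotWithoutGroupBy` with value groups `("q1", "Q1")`, `("q2", "Q2")`, and `("missing", null)` yields a single map with six entries. The `missing_*` entries hold the empty-input results (`null` for the sum, `0` for the count) even when the input contains rows whose `quarter` is NULL, because the comparison never evaluates to true.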
Summary
Adds first-class `PIVOT` syntax to Trino, eliminating the need to hand-write `FILTER` aggregation SQL to rotate row values into columns. `PIVOT` attaches to the relation tree at the same level as `MATCH_RECOGNIZE` and supports:

- multiple aggregations and multi-column pivot keys;
- expression aggregations (e.g. `sum(x) - sum(y)`), validated by the existing `AggregationAnalyzer`;
- explicit `GROUP BY` including `GROUPING SETS`, `CUBE`, `ROLLUP`, and `GROUP BY ()`;
- a relation alias with column aliases on the `PIVOT` output (`PIVOT (...) AS p (c1, c2)`).

Two semantic choices align `PIVOT` with the rest of Trino rather than with other engines' shortcuts:

- Omitting `GROUP BY` is equivalent to `GROUP BY ()`: the result is a single row whose only columns are the pivot output columns. To preserve a dimension, name it explicitly in `GROUP BY`.
- `IN (NULL)` produces a column whose `=` predicate is `UNKNOWN` and never matches, so the column always carries the empty-input aggregation result.

`PIVOT` is added as a non-reserved keyword.

Architecture
- Analyzer: reads the user-written `Pivot` AST, calling `analyzeExpression` on the pivot columns, `IN` values, `GROUP BY` expressions, and slot expressions, and runs `AggregationAnalyzer.verifySourceAggregations` on the slot expressions. It records `Analysis.PivotAnalysis`: the `GroupingSetAnalysis`, the `GROUP BY DISTINCT` flag, the per-output-column descriptors (name + type), and the list of aggregate `FunctionCall`s (collected via the existing `ExpressionTreeUtils.extractAggregateFunctions`). It does not synthesize any AST.
- Planner: `QueryPlanner.planPivot`. It projects aggregation inputs (aggregate args, complex grouping expressions, pivot column references, pivot value expressions), coerces them, then constructs each value group's FILTER predicate as IR (`Comparison` + `Logical`, with a `Cast` when value/column types differ) and projects it as a boolean symbol. After `planGroupingSets`, it builds one `AggregationNode.Aggregation` per `(value group, aggregate call)` pair with `filter` set to the matching predicate symbol, and a single `AggregationNode` containing all of them. The output projection translates each user-written slot expression through a per-group `TranslationMap` overlay so the same `sum(x)` resolves to N different output Symbols for N value groups. `PIVOT` reuses every existing aggregation, GROUP BY, FILTER, and coercion plan-time rule without introducing a new operator.

The first two commits are pure refactors that decouple `analyzeGroupBy` and `planGroupingSets` from `QuerySpecification` so `PIVOT` can reuse them. Neither changes behavior for the `QuerySpecification` path.

Test plan
- `TestPivot` (19 cases): no/with `GROUP BY`, multi-aggregation, multi-column keys, expression aggregations, expression IN values, `GROUPING SETS`/`CUBE`/`ROLLUP`/`GROUP BY ()`, NULL semantics, pivot-of-pivot, relation alias with column aliases, plus error paths (arity mismatch, missing alias, duplicate output column, non-aggregating slot).
- `TestSqlParser` cases (8) covering each grammar shape, including `pivot` used as an identifier.
- `TestSqlKeywords` and `TestSqlParserErrorHandling` updated to include the new `PIVOT` keyword.
- `TestAnalyzer`, `TestSqlParser`, `MATCH_RECOGNIZE` and aggregation tests continue to pass.
- `mvnd verify` for the touched modules passes (compile, checkstyle, modernizer, error-prone).

Documentation

- `docs/src/main/sphinx/sql/pivot.md` reference page modelled after `match-recognize.md`.
- Entry in `sql.md`, cross-reference from `select.md`, and listing in `language/sql-support.md`.