Implement ASOF join #27641
Closed
jirislav wants to merge 1 commit into
Member
Thanks @jirislav! I’ll take a look soon. Note, however, that we haven’t yet decided on a syntax for this functionality, so expect some changes. I have some thoughts that I’ll post later.
Contributor
Cross-posting #27703 (comment)
Author
Closing this PR in favor of a more mature solution in #27703
Description
This implementation adds basic ASOF join support to Trino (requested in #21759). ASOF joins are useful for time-series data, where each row is matched by equality on some columns and by the "closest" value on a time or ordering column.
Supported Syntax
Trino now supports two ASOF join syntaxes:
1. Explicit ON clause with inequality
2. USING clause (last column treated as inequality)
In the USING form, all columns except the last are treated as equality criteria, and the last column is treated as an inequality (right.column <= left.column).
Implementation Details
The planner rewrites ASOF joins into a combination of standard SQL operations:
row_number() over:
Example rewrite:
Original query:
Rewritten as (conceptually):
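The LEFT JOIN plus row_number() idea can be sketched outside Trino. The following is a minimal illustration, assuming hypothetical `trades` and `quotes` tables and using SQLite through Python's `sqlite3` module as a stand-in engine; it is not the exact plan the Trino planner produces, and the `id` column used to identify left rows is an assumption made for this example.

```python
import sqlite3

# Hypothetical sample data: trades is the left side, quotes is the right side.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT, ts INTEGER);
CREATE TABLE quotes (symbol TEXT, ts INTEGER, price REAL);
INSERT INTO trades VALUES (1, 'A', 10), (2, 'A', 20), (3, 'B', 5);
INSERT INTO quotes VALUES ('A', 5, 100.0), ('A', 15, 101.0),
                          ('A', 25, 102.0), ('B', 7, 50.0);
""")

# Conceptual rewrite: LEFT JOIN on the equality key plus the inequality,
# rank the candidate right rows per left row, and keep only the closest one.
REWRITTEN = """
SELECT id, symbol, ts, price
FROM (
    SELECT t.id AS id, t.symbol AS symbol, t.ts AS ts, q.price AS price,
           row_number() OVER (PARTITION BY t.id ORDER BY q.ts DESC) AS rn
    FROM trades t
    LEFT JOIN quotes q
      ON t.symbol = q.symbol AND q.ts <= t.ts
)
WHERE rn = 1
ORDER BY id
"""

rows = conn.execute(REWRITTEN).fetchall()
for row in rows:
    print(row)
```

A left row with no earlier quote (trade 3 here) survives the LEFT JOIN with a NULL price, so it is still numbered 1 and kept by the filter.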
Semantics
Equality criteria (e.g. a.key = b.key)
Inequality operators: <=, <, >=, >
Limitations
Performance: The current implementation is not optimized for large tables on the right side because:
No connector pushdown: ASOF joins are not pushed down to connectors; they are always executed in the Trino engine
Type requirements: The inequality column must have an orderable type (e.g., DATE, TIMESTAMP, numeric types)
Single inequality: Only one inequality condition is supported; multiple inequality predicates will fail with an error
No pure temporal joins: At least one equi-join criterion is required; pure ASOF joins on time alone are not supported
Future Optimization Opportunities
To improve performance, the following optimizations could be implemented:
Sort-merge ASOF join operator: Instead of LEFT JOIN + window function, implement a dedicated operator that:
O(n log n + m log m) instead of O(n × m) in worst case
Index-based lookups: For connectors supporting indexed seeks:
O(log m) per left row, O(n log m) overall
Predicate pushdown: Push down equi-criteria and timestamp range filters to connectors to reduce data volume
Partitioned execution: When equi-keys are already partitioned/bucketed, process partitions independently in parallel
Connector-specific ASOF join: Some databases (e.g., ClickHouse, QuestDB) have native ASOF join support; push down when possible
Bloom filters / dynamic filtering: Use left-side timestamp ranges to filter right-side data earlier
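The sort-merge idea from the list above can be sketched in a few lines. This is an illustrative in-memory version under stated assumptions, not a Trino operator: rows are plain `(key, ts, value)` tuples, the function name `asof_join` is hypothetical, and only the `right.ts <= left.ts` direction is handled.

```python
from collections import defaultdict


def asof_join(left, right):
    """For each left (key, ts, value) row, attach the value of the latest
    right row with the same key and right.ts <= left.ts, or None."""
    # Partition the right side by key and sort each partition by timestamp.
    right_by_key = defaultdict(list)
    for key, ts, value in right:
        right_by_key[key].append((ts, value))
    for candidates in right_by_key.values():
        candidates.sort(key=lambda r: r[0])

    # Visit left rows in (key, ts) order so one cursor per key only moves forward.
    order = sorted(range(len(left)), key=lambda i: (left[i][0], left[i][1]))
    result = [None] * len(left)
    current_key = object()  # sentinel that equals no real key
    candidates, cursor = [], 0
    for i in order:
        key, ts, value = left[i]
        if key != current_key:
            current_key, cursor = key, 0
            candidates = right_by_key.get(key, [])
        # Advance past every right row that is not later than this left row.
        while cursor < len(candidates) and candidates[cursor][0] <= ts:
            cursor += 1
        match = candidates[cursor - 1][1] if cursor > 0 else None
        result[i] = (key, ts, value, match)
    return result


trades = [("A", 10, "t1"), ("A", 20, "t2"), ("B", 5, "t3")]
quotes = [("A", 5, 100.0), ("A", 15, 101.0), ("A", 25, 102.0), ("B", 7, 50.0)]
joined = asof_join(trades, quotes)
print(joined)
```

After the two sorts, the merge step touches each left and right row at most once per key, which is where the O(n log n + m log m) bound comes from. Results are written back by original index, so the output keeps the left input order.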
Error Messages
ASOF join requires at least one equi-join criterion: No equality conditions found in ON clause
ASOF join requires a single supported inequality predicate: Either no inequality, multiple inequalities, or unsupported inequality form
ASOF join inequality requires an orderable type on the build side: The right-side timestamp column type cannot be ordered
Testing
The implementation includes:
TestAsofJoin
TestLogicalPlanner.testAsofJoin
TestLogicalPlanner.testAsofJoinOptimizedPlan
TestLogicalPlanner.testAsofJoinUsing*
TestAsofJoinQueries
Conclusion
While this implementation is intentionally simple and not optimized for production workloads with large datasets, it provides correct ASOF join semantics and establishes a foundation for future optimization. The rewrite-based approach leverages existing, well-tested operators (LEFT JOIN, window functions) and can be improved incrementally without changing the user-facing API.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: