
Implement ASOF join #27641

Closed
jirislav wants to merge 1 commit into trinodb:master from jirislav:jirislav/asof-join

jirislav commented Dec 13, 2025

Description

This implementation adds basic ASOF join support to Trino (requested in #21759). ASOF joins are useful for time-series data where you want to join rows based on equality on some columns plus finding the "closest" match on a time/ordering column.

Supported Syntax

Trino now supports two ASOF join syntaxes:

1. Explicit ON clause with inequality

SELECT a.time, a.key, b.value 
FROM table_expr1 a 
ASOF JOIN table_expr2 b 
  ON a.key = b.key AND a.time >= b.time

2. USING clause (last column treated as inequality)

SELECT time, key, value
FROM table_expr1 
ASOF JOIN table_expr2 USING (key, time)

In the USING form, all columns except the last are treated as equality criteria, and the last column is treated as an inequality (right.column <= left.column).
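To make the USING expansion concrete, here is a minimal Python sketch that turns a USING column list into the equivalent ON predicates. The helper names (`expand_asof_using`, `to_on_clause`), the aliases `l`/`r`, and the error text are hypothetical; this only models the rule stated above and is not the planner code from this PR.

```python
def expand_asof_using(columns, left="l", right="r"):
    """Split an ASOF USING column list into equality predicates and the
    single trailing inequality predicate (hypothetical helper)."""
    if len(columns) < 2:
        # At least one equality column plus the ordering column is required.
        raise ValueError("ASOF USING needs at least one equality column and one ordering column")
    *equality_cols, ordering_col = columns
    equalities = [f"{left}.{c} = {right}.{c}" for c in equality_cols]
    # The last column becomes right.column <= left.column.
    inequality = f"{right}.{ordering_col} <= {left}.{ordering_col}"
    return equalities, inequality


def to_on_clause(columns, left="l", right="r"):
    """Render the expansion as a single ON condition string."""
    equalities, inequality = expand_asof_using(columns, left, right)
    return " AND ".join(equalities + [inequality])


print(to_on_clause(["key", "time"]))
# l.key = r.key AND r.time <= l.time
```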

Implementation Details

The planner rewrites ASOF joins into a combination of standard SQL operations:

  1. AssignUniqueId on left side (for partitioning)
  2. AssignUniqueId on right side (for deterministic tie-breaking)
  3. LEFT JOIN with original equi-criteria and inequality filter
  4. Window function computing row_number() over:
    • PARTITION BY left_unique_id
    • ORDER BY right_timestamp DESC NULLS LAST, right_unique_id DESC NULLS LAST
  5. Filter where row_number = 1
  6. Project to return only the original output columns
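The six steps above can be emulated on in-memory rows to see how the rewrite yields ASOF results. This is a rough Python model, not Trino code: rows are dicts, list positions stand in for the AssignUniqueId values, the inequality is assumed to be right.ts <= left.ts, and `asof_join_via_rewrite` is a hypothetical name.

```python
def asof_join_via_rewrite(left_rows, right_rows, key, ts, right_cols):
    """Model of the rewrite: unique ids, LEFT JOIN, row_number(), rn = 1, projection.
    Rows are dicts; the inequality is assumed to be right[ts] <= left[ts]."""
    # Steps 1-2: AssignUniqueId on each side (list position stands in for the id).
    left = list(enumerate(left_rows))
    right = list(enumerate(right_rows))

    # Step 3: LEFT JOIN on the equi-key plus the inequality filter.
    joined = []
    for l_uid, l in left:
        matches = [(r_uid, r) for r_uid, r in right
                   if r[key] == l[key] and r[ts] <= l[ts]]
        # An unmatched left row is kept once with a NULL (None) right side.
        for r_uid, r in matches or [(None, None)]:
            joined.append((l_uid, l, r_uid, r))

    # Step 4: row_number() OVER (PARTITION BY left uid
    #   ORDER BY right ts DESC NULLS LAST, right uid DESC NULLS LAST).
    partitions = {}
    for l_uid, l, r_uid, r in joined:
        partitions.setdefault(l_uid, []).append((l, r_uid, r))

    def order_key(entry):
        _, r_uid, r = entry
        # (0,) sorts below any (1, ...), so with reverse=True NULL rows come last.
        return (0,) if r is None else (1, r[ts], r_uid)

    result = []
    for l_uid in sorted(partitions):
        ranked = sorted(partitions[l_uid], key=order_key, reverse=True)
        # Step 5: keep only the row with row_number = 1.
        l, _, r = ranked[0]
        # Step 6: project the left columns plus the requested right columns.
        out = dict(l)
        out.update({c: (r[c] if r is not None else None) for c in right_cols})
        result.append(out)
    return result


trades = [{"key": "A", "ts": 2}, {"key": "A", "ts": 5}, {"key": "B", "ts": 1}]
quotes = [
    {"key": "A", "ts": 1, "value": 10.0},
    {"key": "A", "ts": 3, "value": 11.0},
    {"key": "B", "ts": 2, "value": 20.0},
]
for row in asof_join_via_rewrite(trades, quotes, "key", "ts", ["value"]):
    print(row)
# {'key': 'A', 'ts': 2, 'value': 10.0}
# {'key': 'A', 'ts': 5, 'value': 11.0}
# {'key': 'B', 'ts': 1, 'value': None}
```

The B trade has no quote at or before ts 1, so it comes back with `value = None`, matching the LEFT JOIN-style outer behavior of the rewrite.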

Example rewrite:

Original query:

SELECT l.key, l.ts, r.value
FROM left_table l
ASOF JOIN right_table r
  ON l.key = r.key AND l.ts >= r.ts

Rewritten as (conceptually):

WITH left_with_uid AS (
  SELECT *, ROW_NUMBER() OVER () AS left_uid FROM left_table
),
right_with_uid AS (
  SELECT *, ROW_NUMBER() OVER () AS right_uid FROM right_table
),
joined AS (
  SELECT l.key AS l_key, l.ts AS l_ts, l.left_uid,
         r.ts AS r_ts, r.value AS r_value, r.right_uid
  FROM left_with_uid l
  LEFT JOIN right_with_uid r
    ON l.key = r.key AND r.ts <= l.ts
),
ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY left_uid
      ORDER BY r_ts DESC NULLS LAST, right_uid DESC NULLS LAST
    ) AS rn
  FROM joined
)
SELECT l_key AS key, l_ts AS ts, r_value AS value
FROM ranked
WHERE rn = 1

Semantics

  • Equi-criteria: At least one equality condition is required (e.g., a.key = b.key)
  • Inequality condition: Exactly one inequality comparing a column from each side is required
  • Supported inequalities: <=, <, >=, >
  • Match selection: For each left row, selects the right row with the greatest timestamp that satisfies the inequality
  • Outer semantics: If no matching right row exists, NULL values are returned for right columns (like LEFT JOIN)
  • Tie-breaking: When multiple right rows have the same timestamp, selection is deterministic but unspecified
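The selection rules above can also be stated without the join-and-rank detour. The following Python sketch is a reference model under assumed conditions (dict rows, a single equality key, inequality right.ts <= left.ts); it indexes the right side per key, binary-searches the timestamp, and on ties picks the right row that appears last in input order. That corresponds to `right_unique_id DESC` only if unique ids increase with input position, which is an assumption; the PR itself promises only deterministic but unspecified tie-breaking. `asof_lookup` is a hypothetical name.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_lookup(left_rows, right_rows, key, ts, right_cols):
    """Reference ASOF semantics: for each left row, the right row with the same
    key and the greatest ts <= left ts, or None for the right columns if none exists."""
    # Index right rows per equality key, sorted by timestamp. sorted() is stable,
    # so rows with equal timestamps keep their input order.
    by_key = defaultdict(list)
    for row in right_rows:
        by_key[row[key]].append(row)
    index = {}
    for k, rows in by_key.items():
        rows = sorted(rows, key=lambda r: r[ts])
        index[k] = ([r[ts] for r in rows], rows)

    result = []
    for l in left_rows:
        stamps, rows = index.get(l[key], ([], []))
        # bisect_right returns the first position whose ts is > l[ts]; the entry
        # just before it is the latest match (the last one among equal timestamps).
        pos = bisect_right(stamps, l[ts])
        match = rows[pos - 1] if pos > 0 else None
        out = dict(l)
        out.update({c: (match[c] if match is not None else None) for c in right_cols})
        result.append(out)
    return result


trades = [{"key": "A", "ts": 2}, {"key": "A", "ts": 5}, {"key": "B", "ts": 1}]
quotes = [
    {"key": "A", "ts": 1, "value": 10.0},
    {"key": "A", "ts": 3, "value": 11.0},
    {"key": "B", "ts": 2, "value": 20.0},
]
for row in asof_lookup(trades, quotes, "key", "ts", ["value"]):
    print(row)
# {'key': 'A', 'ts': 2, 'value': 10.0}
# {'key': 'A', 'ts': 5, 'value': 11.0}
# {'key': 'B', 'ts': 1, 'value': None}
```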

Limitations

  1. Performance: The current implementation is not optimized for large tables on the right side because:

    • All potentially matching rows from the right table are fetched via LEFT JOIN
    • A window function processes all these rows to find the top-1 per left row
    • No specialized ASOF join operator exists yet
  2. No connector pushdown: ASOF joins are not pushed down to connectors; they are always executed in the Trino engine

  3. Type requirements: The inequality column must have an orderable type (e.g., DATE, TIMESTAMP, numeric types)

  4. Single inequality: Only one inequality condition is supported; multiple inequality predicates will fail with an error

  5. No pure temporal joins: At least one equi-join criterion is required; pure ASOF joins on time alone are not supported
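To put a number on the performance limitation: the LEFT JOIN step emits one row per (left row, matching right row) pair, and an unmatched left row still emits one NULL-extended row, so the window function receives the sum over left rows of max(matches, 1). The small Python calculation below uses made-up illustrative data, and `left_join_row_count` is a hypothetical helper; it shows how quickly that grows when left rows sit late in a long right-side history for the same key.

```python
def left_join_row_count(left_rows, right_rows, key, ts):
    """Rows produced by LEFT JOIN ... ON l.key = r.key AND r.ts <= l.ts,
    i.e. the input to the row_number() window, before the rn = 1 filter."""
    total = 0
    for l in left_rows:
        matches = sum(1 for r in right_rows if r[key] == l[key] and r[ts] <= l[ts])
        # An unmatched left row still yields one NULL-extended row.
        total += max(matches, 1)
    return total


# Illustrative data: 1,000 right rows for one key, 10 left rows at the end of that history.
right = [{"key": "A", "ts": t} for t in range(1000)]
left = [{"key": "A", "ts": 999} for _ in range(10)]
print(left_join_row_count(left, right, "key", "ts"))  # 10000 rows feed the window for 10 output rows
```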

Future Optimization Opportunities

To improve performance, the following optimizations could be implemented:

  1. Sort-merge ASOF join operator: Instead of LEFT JOIN + window function, implement a dedicated operator that:

    • Sorts both inputs by equi-keys + timestamp
    • Performs a merge-join-like scan
    • For each left row, scans right rows until the inequality fails
    • Complexity: O(n log n + m log m) instead of O(n × m) in the worst case (n left rows, m right rows)
  2. Index-based lookups: For connectors supporting indexed seeks:

    • Use equi-keys to locate candidate rows
    • Binary search on timestamp to find the latest match
    • Complexity: O(log m) per left row, O(n log m) overall
  3. Predicate pushdown: Push down equi-criteria and timestamp range filters to connectors to reduce data volume

  4. Partitioned execution: When equi-keys are already partitioned/bucketed, process partitions independently in parallel

  5. Connector-specific ASOF join: Some databases (e.g., ClickHouse, QuestDB) have native ASOF join support; push down when possible

  6. Bloom filters / dynamic filtering: Use left-side timestamp ranges to filter right-side data earlier
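The first item, a sort-merge ASOF operator, can be sketched in a few lines of Python to show the general technique. This is an illustration only, under assumed conditions (dict rows, a single equality key whose values are mutually comparable, inequality right.ts <= left.ts); it is not a design for a Trino operator, and `asof_join_sort_merge` is a hypothetical name. After sorting, the right input is scanned once with a single forward pointer, so the cost is dominated by the two sorts.

```python
def asof_join_sort_merge(left_rows, right_rows, key, ts, right_cols):
    """Sort both inputs by (key, ts), then merge with one forward pointer into
    the right input. Output rows keep the original left input order."""
    # Remember each left row's original position so output order can be restored.
    left = sorted(enumerate(left_rows), key=lambda p: (p[1][key], p[1][ts]))
    # sorted() is stable: equal (key, ts) right rows keep input order, last one wins.
    right = sorted(right_rows, key=lambda r: (r[key], r[ts]))

    out = [None] * len(left_rows)
    j = 0
    current = None          # latest right row with the same key and ts <= left ts
    current_key = object()  # sentinel that never equals a real key
    for pos, l in left:
        if l[key] != current_key:
            # New equality key: forget the match carried over from the previous key.
            current_key, current = l[key], None
        # Skip right rows whose key is smaller than the current left key.
        while j < len(right) and right[j][key] < l[key]:
            j += 1
        # Advance while the inequality right.ts <= left.ts still holds.
        while j < len(right) and right[j][key] == l[key] and right[j][ts] <= l[ts]:
            current = right[j]
            j += 1
        row = dict(l)
        row.update({c: (current[c] if current is not None else None) for c in right_cols})
        out[pos] = row
    return out


trades = [{"key": "B", "ts": 1}, {"key": "A", "ts": 5}, {"key": "A", "ts": 2}]
quotes = [
    {"key": "A", "ts": 3, "value": 11.0},
    {"key": "B", "ts": 2, "value": 20.0},
    {"key": "A", "ts": 1, "value": 10.0},
]
for row in asof_join_sort_merge(trades, quotes, "key", "ts", ["value"]):
    print(row)
# {'key': 'B', 'ts': 1, 'value': None}
# {'key': 'A', 'ts': 5, 'value': 11.0}
# {'key': 'A', 'ts': 2, 'value': 10.0}
```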

Error Messages

  • ASOF join requires at least one equi-join criterion: No equality conditions found in ON clause
  • ASOF join requires a single supported inequality predicate: Either no inequality, multiple inequalities, or unsupported inequality form
  • ASOF join inequality requires an orderable type on the build side: The right-side timestamp column type cannot be ordered
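The three checks behind these messages can be modeled as a small validation function. In this Python sketch each ON-clause conjunct is assumed to be already resolved into a `(left_column, operator, right_column)` triple, and the set of orderable right-side columns is passed in; the triple format, the order of the checks, and the names `validate_asof_criteria` and `AsofJoinError` are all hypothetical, while the message texts are the ones listed above.

```python
SUPPORTED_INEQUALITIES = {"<", "<=", ">", ">="}


class AsofJoinError(Exception):
    """Hypothetical error type for ASOF join analysis failures."""


def validate_asof_criteria(predicates, orderable_right_columns):
    """predicates: (left_column, operator, right_column) triples read as
    'left operator right'. Returns (equalities, the single inequality)."""
    equalities = [p for p in predicates if p[1] == "="]
    inequalities = [p for p in predicates if p[1] in SUPPORTED_INEQUALITIES]
    unsupported = [p for p in predicates
                   if p[1] != "=" and p[1] not in SUPPORTED_INEQUALITIES]
    if not equalities:
        raise AsofJoinError("ASOF join requires at least one equi-join criterion")
    # Zero inequalities, several inequalities, or an unsupported form (e.g. <>).
    if len(inequalities) != 1 or unsupported:
        raise AsofJoinError("ASOF join requires a single supported inequality predicate")
    _, _, right_column = inequalities[0]
    if right_column not in orderable_right_columns:
        raise AsofJoinError("ASOF join inequality requires an orderable type on the build side")
    return equalities, inequalities[0]


# Models: ON a.key = b.key AND a.time >= b.time
print(validate_asof_criteria([("key", "=", "key"), ("time", ">=", "time")], {"time"}))
# ([('key', '=', 'key')], ('time', '>=', 'time'))
```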

Testing

The implementation includes:

  • Parser tests for both ON and USING syntax (TestAsofJoin)
  • Logical planner tests verifying ASOF join nodes are created (TestLogicalPlanner.testAsofJoin)
  • Optimized plan tests verifying rewrite to LEFT JOIN + window (TestLogicalPlanner.testAsofJoinOptimizedPlan)
  • USING clause tests (TestLogicalPlanner.testAsofJoinUsing*)
  • Negative tests for error conditions
  • Integration tests with actual execution (TestAsofJoinQueries)

Conclusion

While this implementation is intentionally simple and not optimized for production workloads with large datasets, it provides correct ASOF join semantics and establishes a foundation for future optimization. The rewrite-based approach leverages existing, well-tested operators (LEFT JOIN, window functions) and can be improved incrementally without changing the user-facing API.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:


cla-bot added the cla-signed label Dec 13, 2025
github-actions added the docs, iceberg (Iceberg connector), and delta-lake (Delta Lake connector) labels Dec 13, 2025
jirislav force-pushed the jirislav/asof-join branch 2 times, most recently from 982ba8e to 500d8a3, December 13, 2025
jirislav removed the iceberg (Iceberg connector) and delta-lake (Delta Lake connector) labels Dec 13, 2025
martint (Member) commented Dec 13, 2025

Thanks @jirislav! I’ll take a look soon.

Note, however, that we haven’t yet decided on a syntax for this functionality, so expect some changes. I have some thoughts that I’ll post later.

Pluies (Contributor) commented Dec 31, 2025

Cross-posting #27703 (comment)

jirislav (Author) commented Jan 2, 2026

Closing this PR in favor of a more mature solution in #27703.

jirislav closed this Jan 2, 2026