
Add NEAREST join support #28937

Merged
martint merged 2 commits into trinodb:master from martint:nearest Apr 14, 2026

martint (Member) commented Mar 31, 2026

Add support for NEAREST in CROSS JOIN and LEFT JOIN ... ON TRUE.

For example:

```sql
...
FROM trades
LEFT JOIN NEAREST (
    FROM quotes
    WHERE quotes.symbol = trades.symbol
    MATCH quotes.ts <= trades.ts
) ON TRUE
```

and

```sql
...
FROM trades
CROSS JOIN NEAREST (
    FROM quotes
    WHERE quotes.symbol = trades.symbol
    MATCH quotes.ts <= trades.ts
)
```
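As a rough reference for what the two queries above compute, the following Python sketch evaluates NEAREST row by row: for each trade it keeps the quotes that pass the WHERE and MATCH conditions and picks the one with the largest `ts`, which is the closest candidate for a `<=` match. The sample rows and the `bid` column are hypothetical, and the functions are illustrative helpers, not Trino APIs.

```python
from typing import Optional

# Hypothetical sample data; column names follow the SQL examples, "bid" is illustrative.
trades = [
    {"symbol": "A", "ts": 10},
    {"symbol": "A", "ts": 25},
    {"symbol": "B", "ts": 5},
]
quotes = [
    {"symbol": "A", "ts": 8, "bid": 100.0},
    {"symbol": "A", "ts": 20, "bid": 101.0},
    {"symbol": "B", "ts": 7, "bid": 50.0},
]


def nearest_quote(trade: dict) -> Optional[dict]:
    """Return the quote that satisfies WHERE and MATCH and is closest to the trade."""
    candidates = [
        q for q in quotes
        if q["symbol"] == trade["symbol"]  # WHERE quotes.symbol = trades.symbol
        and q["ts"] <= trade["ts"]         # MATCH quotes.ts <= trades.ts
    ]
    # For a <= match, the closest candidate is the one with the largest ts.
    return max(candidates, key=lambda q: q["ts"], default=None)


def cross_join_nearest() -> list:
    # CROSS JOIN NEAREST drops trades that have no qualifying quote.
    return [(t, q) for t in trades if (q := nearest_quote(t)) is not None]


def left_join_nearest() -> list:
    # LEFT JOIN NEAREST (...) ON TRUE keeps every trade, padding with None.
    return [(t, nearest_quote(t)) for t in trades]
```

`cross_join_nearest()` returns two pairs because trade B at ts 5 has no quote at or before that time; `left_join_nearest()` returns all three trades, with `None` in place of the missing quote.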

The current implementation produces a naive plan that leverages existing plan IR and operators. A more performant implementation will require custom plan nodes and a dedicated join-like operator.
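The PR does not describe the planned dedicated operator. Purely as a hypothetical illustration of the kind of work such an operator could avoid, this sketch indexes the quotes by symbol, sorts each group by `ts`, and answers each trade with a binary search instead of producing every qualifying pair and ranking them afterwards. `build_index` and `probe` are invented names, and the sketch handles only the `symbol` equality plus `<=` match from the examples.

```python
import bisect
from collections import defaultdict
from typing import Optional


def build_index(quotes: list) -> dict:
    """Hypothetical build side: group quotes by symbol, each group sorted by ts."""
    groups = defaultdict(list)
    for q in quotes:
        groups[q["symbol"]].append(q)
    for group in groups.values():
        group.sort(key=lambda q: q["ts"])
    # Keep a parallel list of timestamps per symbol for binary search.
    return {sym: ([q["ts"] for q in group], group) for sym, group in groups.items()}


def probe(index: dict, trade: dict) -> Optional[dict]:
    """Hypothetical probe side: the quote with the largest ts <= trade ts for the same symbol."""
    entry = index.get(trade["symbol"])
    if entry is None:
        return None
    timestamps, group = entry
    # bisect_right returns the insertion point after any equal timestamps,
    # so pos - 1 is the last quote whose ts is <= the trade's ts.
    pos = bisect.bisect_right(timestamps, trade["ts"])
    return group[pos - 1] if pos > 0 else None
```

Building the index costs O(m log m) for m quotes and each probe costs O(log m), whereas the naive plan's join can emit many candidates per trade before TopNRanking keeps one.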

CROSS JOIN plan shape:

```
- Project[left columns..., nearest columns...]
  - TopNRanking[ROW_NUMBER by (left_unique) order by (candidate_key)]
    - Join[INNER on (WHERE and MATCH)]
      - AssignUniqueId[left_unique]
        - left input
      - NEAREST source
```
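To make the CROSS JOIN plan shape concrete, this Python sketch mirrors each node on in-memory lists. The PR does not define `candidate_key`; the sketch assumes it is the negated quote timestamp so that the latest qualifying quote ranks first. The data and the `bid` column are hypothetical. Ranking by the synthetic `left_unique` id rather than by the left columns presumably keeps duplicate left rows from sharing one partition.

```python
from itertools import groupby

# Hypothetical sample data; "bid" is an illustrative column, not from the PR.
trades = [{"symbol": "A", "ts": 10}, {"symbol": "A", "ts": 25}, {"symbol": "B", "ts": 5}]
quotes = [
    {"symbol": "A", "ts": 8, "bid": 100.0},
    {"symbol": "A", "ts": 20, "bid": 101.0},
    {"symbol": "B", "ts": 7, "bid": 50.0},
]


def cross_join_nearest_plan() -> list:
    # AssignUniqueId[left_unique]: tag each left row with a synthetic id.
    left = list(enumerate(trades))
    # Join[INNER on (WHERE and MATCH)]: emit every qualifying (left, candidate) pair.
    joined = [
        (left_unique, t, q)
        for left_unique, t in left
        for q in quotes
        if q["symbol"] == t["symbol"] and q["ts"] <= t["ts"]
    ]
    # TopNRanking[ROW_NUMBER by (left_unique) order by (candidate_key)]:
    # sort within each partition so the closest candidate comes first.
    # Assumed candidate_key: negated quote ts (latest qualifying quote first).
    joined.sort(key=lambda row: (row[0], -row[2]["ts"]))
    result = []
    for _, partition in groupby(joined, key=lambda row: row[0]):
        _, t, q = next(partition)  # keep only ROW_NUMBER() = 1
        # Project[left columns..., nearest columns...]
        result.append({**t, "quote_ts": q["ts"], "bid": q["bid"]})
    return result
```

Trade B at ts 5 has no qualifying quote, so the inner join drops it before ranking and it does not appear in the result.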

LEFT JOIN ... ON TRUE plan shape:

```
- Project[left columns..., nearest columns...]
  - TopNRanking[ROW_NUMBER by (left_unique) order by (candidate_key)]
    - Join[LEFT on (WHERE and MATCH)]
      - AssignUniqueId[left_unique]
        - left input
      - NEAREST source
```
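The LEFT JOIN shape differs from the CROSS JOIN shape only in the join node. In this hedged Python sketch, with hypothetical sample data and an assumed `candidate_key` of negated quote timestamp (the PR does not define it), a trade with no qualifying quote survives the join as a single row padded with `None`, forms a one-row partition in the ranking step, and therefore comes out with NULL nearest columns.

```python
from itertools import groupby

# Hypothetical sample data; "bid" is an illustrative column, not from the PR.
trades = [{"symbol": "A", "ts": 10}, {"symbol": "A", "ts": 25}, {"symbol": "B", "ts": 5}]
quotes = [
    {"symbol": "A", "ts": 8, "bid": 100.0},
    {"symbol": "A", "ts": 20, "bid": 101.0},
    {"symbol": "B", "ts": 7, "bid": 50.0},
]


def matches(t: dict, q: dict) -> bool:
    # WHERE quotes.symbol = trades.symbol AND MATCH quotes.ts <= trades.ts
    return q["symbol"] == t["symbol"] and q["ts"] <= t["ts"]


def left_join_nearest_plan() -> list:
    # AssignUniqueId[left_unique]
    left = list(enumerate(trades))
    # Join[LEFT on (WHERE and MATCH)]: a left row without candidates is kept once, padded with None.
    joined = []
    for left_unique, t in left:
        candidates = [q for q in quotes if matches(t, q)]
        joined.extend((left_unique, t, q) for q in candidates or [None])
    # TopNRanking: a None row is alone in its partition, so its placeholder
    # sort value (0) never competes with a real candidate's key.
    joined.sort(key=lambda row: (row[0], -row[2]["ts"] if row[2] is not None else 0))
    result = []
    for _, partition in groupby(joined, key=lambda row: row[0]):
        _, t, q = next(partition)  # keep only ROW_NUMBER() = 1
        # Project: nearest columns are None (NULL) when nothing matched.
        result.append({
            **t,
            "quote_ts": q["ts"] if q is not None else None,
            "bid": q["bid"] if q is not None else None,
        })
    return result
```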

Release notes

(x) Release notes are required, with the following suggested text:

## General
* Add support for the `NEAREST` clause for approximate matches in joins. ({issue}`21759`)

cla-bot added the cla-signed label Mar 31, 2026
martint requested a review from kasiafi March 31, 2026 01:55
martint force-pushed the nearest branch 2 times, most recently from ff3e7b9 to 75fa72b March 31, 2026 03:52
findepi (Member) commented Apr 2, 2026

This is an awesome proof of concept! I am not sure the syntax is ideal from a user's perspective yet. I have posted syntax considerations to the issue.

martint (Member Author) commented Apr 10, 2026

@kasiafi, updated.

Also, added support for implicit ON TRUE.

findepi (Member) commented Apr 10, 2026

What is implicit ON TRUE? Is it only for NEAREST, or for all joins?

martint (Member Author) commented Apr 10, 2026

For all joins. It’s a similar idea to how the FROM clause is optional.

Comment on lines +484 to +487
```
| NEAREST '('
    FROM relation
    (WHERE where=booleanExpression)?
    MATCH match=booleanExpression ')' #nearest
```
Member commented:

Why not `NEAREST (relation) MATCH match`?

This would make for a simpler grammar that's more familiar to users. Otherwise, there will be questions about why I cannot SELECT before the FROM, or why I can have WHERE but not HAVING, etc.

Member Author commented:

You need a WHERE clause to link the left and right sides of the join. I don't think that's more familiar to users. NEAREST is nothing more than syntactic sugar for:

```sql
LATERAL (
    SELECT *
    FROM <relation>
    WHERE <condition> AND left.match_column <op> right.match_column
    ORDER BY <match column>
    LIMIT 1
)
```
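The desugaring above leaves the sort direction implicit. Here is a sketch under the assumption that "nearest" means the extreme candidate on the permitted side of the comparison. It follows the orientation of the MATCH examples, `right <op> left` (as in `quotes.ts <= trades.ts`): `<` and `<=` pick the largest right-side value, while `>` and `>=` pick the smallest. `lateral_nearest` and `ORDER_FOR_OP` are hypothetical names, not part of Trino.

```python
import operator
from typing import Callable, Optional

# Assumed mapping from the MATCH operator (right <op> left) to the comparison
# used for filtering and the aggregate standing in for ORDER BY ... LIMIT 1.
ORDER_FOR_OP = {
    "<": (operator.lt, max),
    "<=": (operator.le, max),
    ">": (operator.gt, min),
    ">=": (operator.ge, min),
}


def lateral_nearest(
    left_row: dict,
    relation: list,
    where: Callable[[dict, dict], bool],
    right_col: str,
    op: str,
    left_col: str,
) -> Optional[dict]:
    compare, pick = ORDER_FOR_OP[op]
    # WHERE <condition> AND <match comparison>
    candidates = [
        r for r in relation
        if where(left_row, r) and compare(r[right_col], left_row[left_col])
    ]
    # ORDER BY <match column> (direction implied by op) LIMIT 1
    return pick(candidates, key=lambda r: r[right_col], default=None)
```

For a trade at ts 25, a `<=` match selects the latest quote at or before 25, while a `>=` match selects the earliest quote at or after 25.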

If users want to do something more complicated, they can always use the explicit form.

Member commented:

> You need a WHERE clause to link the left and right sides of the join.

If we're thinking about NEAREST as a special case of a join, we could use a syntax that describes that.

If we're thinking about NEAREST MATCH as a way to find the best match, then the WHERE clause is not special.
The lateral subquery could be just VALUES, an aggregation, or something else. MATCH is special; WHERE is not. I don't see why the grammar should make WHERE special.

Member Author commented:

NEAREST is not a special kind of join. It is a table-like relation, similar in spirit to UNNEST or JSON_TABLE, whose job is to return the row from its input that is closest to some anchor expression from the left side. In that model, MATCH is special because it defines what "closest" means. WHERE is not special, but it's what's needed for the function to constrain the candidate set based on some row on the left side of the join.

kasiafi (Member) commented Apr 13, 2026

> Also, added support for implicit ON TRUE.

Is there an implicit ON TRUE? I think an implicit join takes no criteria.
Also, we could support INNER JOIN ON TRUE.

findepi (Member) commented Apr 13, 2026

> For all joins. It’s a similar idea to how the FROM clause is optional.

Snowflake's JOIN has an optional ON clause, and I remember running into grammar ambiguities (which JOIN does this ON pertain to?) when working on an ANTLR grammar for Snowflake. I don't remember the specifics, though.

martint (Member Author) commented Apr 13, 2026

> Is there implicit ON TRUE? I think that implicit join takes no criteria.
> Also, we could support INNER JOIN ON TRUE.

Right, by "implicit" I meant "implicit condition". I'll add support for implicit joins (i.e., `SELECT ... FROM t, NEAREST(...)`).

martint (Member Author) commented Apr 13, 2026

> ... some problems with grammar ambiguities ...

Let me look into that. In the meantime, I'll back out that extension. We can re-introduce it later to improve usability.

kasiafi (Member) left a comment

LGTM, modulo the LogicalPlanner test and the question about correlation in WHERE.

martint force-pushed the nearest branch 2 times, most recently from 012b1ca to 829fe5d April 14, 2026 16:46
martint added 2 commits April 14, 2026 09:59
Add support for NEAREST in CROSS JOIN, LEFT/INNER JOIN ... ON TRUE, and implicit joins.

The current implementation produces a naive plan that leverages existing plan IR 
and operators. A more performant implementation will require custom plan nodes and 
a dedicated join-like operator.

CROSS JOIN plan shape:

- Project[left columns..., nearest columns...]
   - TopNRanking[ROW_NUMBER by (left_unique) order by (candidate_key)]
      - Join[INNER on (WHERE and MATCH)]
         - AssignUniqueId[left_unique]
            - left input
         - NEAREST source

LEFT JOIN ... ON TRUE plan shape:

- Project[left columns..., nearest columns...]
   - TopNRanking[ROW_NUMBER by (left_unique) order by (candidate_key)]
      - Join[LEFT on (WHERE and MATCH)]
         - AssignUniqueId[left_unique]
            - left input
         - NEAREST source
martint merged commit f7a5b69 into trinodb:master Apr 14, 2026
192 of 194 checks passed
martint deleted the nearest branch April 14, 2026 22:16
github-actions added this to the 481 milestone Apr 14, 2026

Development

Successfully merging this pull request may close these issues.

Add support for ASOF join

4 participants