Add NEAREST join support #28937
Force-pushed from ff3e7b9 to 75fa72b.
This is an awesome proof of concept! I'm not sure the syntax is ideal from the user's perspective yet. I have posted syntax considerations to the issue:
@kasiafi, updated. Also, added support for implicit ON TRUE.
What is implicit ON TRUE? Is it only for NEAREST, or for all joins?
For all joins. It’s a similar idea to how the FROM clause is optional. |
    | NEAREST '('
        FROM relation
        (WHERE where=booleanExpression)?
        MATCH match=booleanExpression ')'    #nearest
Why not NEAREST (relation) MATCH match?
This would make for a simpler grammar that's more familiar to users. Otherwise, there will be questions about why I cannot put SELECT before the FROM, or why I can have WHERE but not HAVING, etc.
You need a WHERE clause to link the left and right sides of the join, so I don't think that form is more familiar to users. NEAREST is nothing more than syntactic sugar for:
    LATERAL (
        SELECT *
        FROM <relation>
        WHERE <condition> AND left.match_column <op> right.match_column
        ORDER BY <match column>
        LIMIT 1
    )
If users want to do something more complicated, they can always use the explicit form.
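As a rough illustration of what the LATERAL form above computes, here is a minimal Python sketch. It is not Trino code. The function name `nearest_lateral`, the sample `trades` and `quotes` rows, and the choice of `<=` as the match operator with a descending sort (so the candidate closest to the anchor comes first) are all assumptions made for the example; the thread itself leaves the operator and the sort direction open.

```python
def nearest_lateral(left_rows, candidates, where, anchor_key, candidate_key):
    """Emulate: LATERAL (SELECT * FROM candidates WHERE ... ORDER BY ... LIMIT 1).

    Assumption: the match operator is `<=` (candidate key at or before the
    anchor), and candidates are ordered by key descending so the closest wins.
    """
    results = []
    for left in left_rows:
        # WHERE <condition> AND right.match_column <= left.match_column
        eligible = [
            c for c in candidates
            if where(left, c) and candidate_key(c) <= anchor_key(left)
        ]
        # ORDER BY <match column> (descending here) ... LIMIT 1
        eligible.sort(key=candidate_key, reverse=True)
        # Left rows with no eligible candidate produce no output row,
        # as with CROSS JOIN LATERAL.
        results.extend((left, best) for best in eligible[:1])
    return results


# Hypothetical sample data: find the latest quote at or before each trade.
trades = [
    {"symbol": "A", "ts": 10},
    {"symbol": "A", "ts": 3},
    {"symbol": "B", "ts": 7},
]
quotes = [
    {"symbol": "A", "ts": 5, "price": 100},
    {"symbol": "A", "ts": 9, "price": 101},
    {"symbol": "B", "ts": 8, "price": 50},
]

matched = nearest_lateral(
    trades,
    quotes,
    where=lambda t, q: t["symbol"] == q["symbol"],
    anchor_key=lambda t: t["ts"],
    candidate_key=lambda q: q["ts"],
)
```

In this sketch only the first trade gets a match (the quote at `ts=9`); the other two trades have no quote at or before their timestamp and are dropped.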
> You need a WHERE clause to link the left and right sides of the join.
If we're thinking about NEAREST as a special case of a join, we could use a syntax that describes that.
If we're thinking about NEAREST MATCH as a way to find the best match, then the WHERE clause is not special.
The lateral subquery could be just values, some aggregation, or something else. The MATCH is special; the WHERE is not. I don't see why the grammar would make WHERE special.
NEAREST is not a special kind of join. It is a table-like relation, similar in spirit to UNNEST or JSON_TABLE, whose job is to return the row from its input that is closest to some anchor expression from the left side. In that model, MATCH is special because it defines what "closest" means. WHERE is not special, but it's what's needed for the function to constrain the candidate set based on some row on the left side of the join.
Is there an implicit ON TRUE? I think that an implicit join takes no criteria.
Snowflake's JOIN has an optional ON clause, and I remember having some problems with grammar ambiguities (which JOIN does this ON pertain to?) when working on the ANTLR grammar for Snowflake. I don't remember the specifics, though.
Right, what I meant by implicit was "implicit condition". I'll add support for implicit joins (i.e.,
Let me look into that. In the meantime, I'll back out that extension. We can re-introduce it later to improve usability.
kasiafi left a comment
LGTM % LogicalPlanner test and the question about correlation in WHERE.
Force-pushed from 012b1ca to 829fe5d.
Add support for NEAREST in CROSS JOIN, LEFT/INNER JOIN ... ON TRUE, and implicit joins.
The current implementation produces a naive plan that leverages existing plan IR
and operators. A more performant implementation will require custom plan nodes and
a dedicated join-like operator.
CROSS JOIN plan shape:
- Project[left columns..., nearest columns...]
  - TopNRanking[ROW_NUMBER by (left_unique) order by (candidate_key)]
    - Join[INNER on (WHERE and MATCH)]
      - AssignUniqueId[left_unique]
        - left input
      - NEAREST source
LEFT JOIN ... ON TRUE plan shape:
- Project[left columns..., nearest columns...]
  - TopNRanking[ROW_NUMBER by (left_unique) order by (candidate_key)]
    - Join[LEFT on (WHERE and MATCH)]
      - AssignUniqueId[left_unique]
        - left input
      - NEAREST source
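The two plan shapes above can be mimicked in a few lines of Python to show why each step is there. This is only a sketch of the data flow, not the planner's actual code. The function `nearest_plan`, the sample rows, and the descending order on the candidate key are assumptions for illustration; the commit message does not state the sort direction.

```python
def nearest_plan(left_rows, candidates, condition, order_key, left_join=False):
    """Walk the naive plan bottom-up: AssignUniqueId, Join, TopNRanking, Project."""
    # AssignUniqueId[left_unique]: tag each left row so duplicates stay distinct.
    tagged = list(enumerate(left_rows))

    # Join[INNER or LEFT on (WHERE and MATCH)]
    groups = {}
    for uid, left in tagged:
        matches = [c for c in candidates if condition(left, c)]
        if matches:
            groups[uid] = [(left, c) for c in matches]
        elif left_join:
            # LEFT join keeps unmatched left rows, padded with None.
            groups[uid] = [(left, None)]

    # TopNRanking[ROW_NUMBER by (left_unique) order by (candidate_key)]:
    # keep the rank-1 row per left_unique (assumed: largest candidate key first).
    output = []
    for uid, rows in groups.items():
        if rows[0][1] is None:
            output.append(rows[0])
        else:
            output.append(max(rows, key=lambda r: order_key(r[1])))

    # Project[left columns..., nearest columns...]: the unique id is dropped.
    return output


# Hypothetical sample data with a duplicated left row.
trades = [
    {"symbol": "A", "ts": 10},
    {"symbol": "A", "ts": 10},
    {"symbol": "B", "ts": 7},
]
quotes = [
    {"symbol": "A", "ts": 5},
    {"symbol": "A", "ts": 9},
    {"symbol": "B", "ts": 8},
]


def same_symbol_not_later(trade, quote):
    # WHERE (same symbol) and MATCH (quote at or before the trade)
    return trade["symbol"] == quote["symbol"] and quote["ts"] <= trade["ts"]


cross_rows = nearest_plan(trades, quotes, same_symbol_not_later, lambda q: q["ts"])
left_rows = nearest_plan(
    trades, quotes, same_symbol_not_later, lambda q: q["ts"], left_join=True
)
```

Without the unique id, the two identical trades would fall into the same ranking partition and only one of them would survive the top-1 filter. The LEFT variant differs only in keeping the unmatched `B` trade with a `None` candidate.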
Add support for NEAREST in CROSS JOIN and LEFT JOIN ... ON TRUE.
For example:
and
Release notes
(x) Release notes are required, with the following suggested text: