Remove predicate from Constraint #29289

Open
chenjian2664 wants to merge 12 commits into trinodb:master from chenjian2664:jack/constraint-refactor

Conversation

@chenjian2664
Contributor

@chenjian2664 chenjian2664 commented Apr 30, 2026

Description

Currently, Constraint may not fully capture the predicate information required for partition pruning when expressions cannot be translated into ConnectorExpression, and it could not be used on workers if we want that in the future.
This PR removes the predicate from Constraint and makes it fully serializable, so that Constraint remains the single source of truth for pruning-related information.

Design

The key idea is to introduce an engine-level abstraction for filter expressions that cannot be represented as ConnectorExpression, without exposing any connector-specific concepts:

  • Introduce ConstraintExpression to represent filter expressions not
    convertible to ConnectorExpression

    • Carries IR Expression and symbol-to-column assignments
    • Opaque to connectors; does not expose connector-specific concepts
  • Introduce ConstraintExpressionEvaluator

    • Uses engine-side IR optimizer for evaluation
    • Keeps evaluation logic in trino-main
  • Remove predicate from Constraint
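
The split described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ColumnHandle`, `NamedColumn`, `GreaterThan`, `EngineConstraintExpression`, and the `mayMatch` method are simplified, hypothetical stand-ins, and the real IR `Expression` is reduced to a single "symbol > threshold" record. Only the names `ConstraintExpression` and `ConstraintExpressionEvaluator` come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration; these are not the real Trino SPI or IR classes.
interface ColumnHandle {}

record NamedColumn(String name) implements ColumnHandle {}

// Stand-in for an engine IR expression: "symbol > threshold". Plain data, so it can be serialized.
record GreaterThan(String symbol, long threshold) {}

// SPI side: an opaque marker. A connector can hold and pass it along but cannot look inside.
interface ConstraintExpression {}

// SPI side: the connector asks the engine whether a set of column values can match.
// The method name mayMatch is an assumption made for this sketch.
interface ConstraintExpressionEvaluator {
    boolean mayMatch(ConstraintExpression expression, Map<ColumnHandle, Long> values);
}

// Engine side: carries the IR expression plus the symbol-to-column assignments.
record EngineConstraintExpression(GreaterThan expression, Map<String, ColumnHandle> assignments)
        implements ConstraintExpression {}

// Engine side: evaluation logic stays out of the connector.
final class EngineConstraintExpressionEvaluator implements ConstraintExpressionEvaluator {
    @Override
    public boolean mayMatch(ConstraintExpression expression, Map<ColumnHandle, Long> values) {
        if (!(expression instanceof EngineConstraintExpression)) {
            return true; // unknown expression: cannot prune, so keep the candidate
        }
        EngineConstraintExpression engine = (EngineConstraintExpression) expression;
        // Translate column-handle bindings into symbol bindings via the assignments.
        Map<String, Long> bySymbol = new HashMap<>();
        for (Map.Entry<String, ColumnHandle> entry : engine.assignments().entrySet()) {
            Long value = values.get(entry.getValue());
            if (value != null) {
                bySymbol.put(entry.getKey(), value);
            }
        }
        Long bound = bySymbol.get(engine.expression().symbol());
        // An unbound symbol cannot be decided, so the candidate is kept.
        return bound == null || bound > engine.expression().threshold();
    }
}
```

In this shape a connector would only ever see the `ConstraintExpression` marker and the evaluator interface; everything that touches the IR lives on the engine side.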

Notes

  • No change in pruning logic; refactoring only
  • Avoids exposing connector-internal concepts to the engine
  • Enables future use cases where Constraint may be evaluated on workers

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot Bot added the cla-signed label Apr 30, 2026
@github-actions github-actions Bot added iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector labels Apr 30, 2026
@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch from 5e0db2f to 7a942c8 Compare April 30, 2026 08:31
@chenjian2664 chenjian2664 marked this pull request as draft April 30, 2026 09:04
@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch 4 times, most recently from 71ea116 to 02fd4ff Compare April 30, 2026 11:08
@github-actions github-actions Bot added the postgresql PostgreSQL connector label Apr 30, 2026
@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch 2 times, most recently from 0b5d5ca to adf1267 Compare May 1, 2026 03:20
@chenjian2664 chenjian2664 marked this pull request as ready for review May 4, 2026 03:26
}
}
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "@type")
public interface PartitionExpression {}
Member

Partitions are an internal connector detail. We've intentionally avoided exposing such a concept to the engine.

Contributor Author

This class isn't meant to carry connector details back to the engine. It's designed to represent a filter expression that the connector can't recognize as a ConnectorExpression but that we still want to push down for evaluation. As I understand it, it's used mostly for partition pruning in one specific scenario, so I renamed it to ConstraintExpression to avoid confusion. Please suggest a better name if you have one.
We need the main module's ability to use the IR optimizer to evaluate values that come from the connector. Previously we used a predicate for this check; from the connector's perspective, it exposes exactly the same data to the engine either way (unless I'm misunderstanding something).

For the newly added PartitionExpression (now ConstraintExpression), I've also been thinking we could carry this information in a table handle. In any case, I think we should carry the context as an actual Expression. Since Expression is invisible to connectors, we need an abstract ConstraintExpression interface to carry it. What do you think?

@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch 4 times, most recently from 3160ebc to 6a60994 Compare May 5, 2026 03:20
@chenjian2664 chenjian2664 requested a review from martint May 7, 2026 01:35
@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch 3 times, most recently from c02645d to 4e669b3 Compare May 7, 2026 10:13
@chenjian2664
Contributor Author

@martint Please take a look when you are available, thanks

Comment thread core/trino-spi/src/main/java/io/trino/spi/connector/ConstraintExpression.java Outdated
@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch from 4e669b3 to 1a1d387 Compare May 11, 2026 10:23
@github-actions github-actions Bot added the docs label May 11, 2026
@github-actions github-actions Bot added ui Web UI jdbc Relates to Trino JDBC driver duckdb DuckDB connector redis Redis connector redshift Redshift connector sqlserver SQLServer connector labels May 11, 2026
@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch from 1a1d387 to e7d6e28 Compare May 11, 2026 10:24
@chenjian2664 chenjian2664 changed the title Make Constraint serializable Remove predicate from Constraint May 11, 2026
@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch from e7d6e28 to 70e9084 Compare May 11, 2026 10:45
chenjian2664 added a commit to chenjian2664/ByteQuay that referenced this pull request May 11, 2026
`useState<boolean>(thread.resolved === true)` only ran at mount, so a
thread that arrived with `resolved == null` on the initial REST fetch
stayed open even after the follow-up GraphQL fetch flipped it to true.
On long PRs (e.g. trinodb/trino#29289) this left a wall of resolved
threads expanded by default.

Derive `folded` from props each render, with a `foldOverride` that
pins the user's manual chevron click so subsequent refreshes don't
re-fold a thread they explicitly opened. Same fix in both surfaces —
the main PR detail page (ReviewThreadCard) and the inline-in-diff
view (DiffViewerScreen).

The optimistic-resolve test now asserts via the always-visible
resolved pill instead of the Unresolve button — the button now
auto-folds away on click (matching github.com), but the optimistic
local-state patch is still observable in the header pill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@martint
Member

martint commented May 11, 2026

The direction is right — removing the non-serializable Predicate and making Constraint serializable is the correct goal. I think the design can be simpler, though, by staying within ConnectorExpression rather than introducing a parallel type hierarchy.

Alternative: synthetic $engine_predicate function + general evaluator

Instead of a new opaque ConstraintExpression marker interface and a separate ConstraintEvaluator, represent the engine-evaluated predicate as a Call with a well-known synthetic function name inside the existing ConnectorExpression tree:

Call($engine_predicate, [
    Constant(serialized_ir_bytes, VARBINARY),  // serialized engine IR expression
    Variable("col_a"),                          // columns referenced by the expression
    Variable("col_b"),
    ...
])

The serialized IR expression is the same thing InternalConstraintExpression carries today. Referenced columns are explicit Variable arguments, so connectors can inspect them without a separate referencedColumns() dispatch through an opaque handle.

Add a general ConnectorExpressionEvaluator to ConnectorContext that can evaluate any ConnectorExpression. For standard nodes it evaluates directly; for $engine_predicate leaves it decodes the bytes and runs the IR optimizer. Connectors delegate whatever they can't handle natively to this evaluator.
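
As a rough illustration of the evaluator described above, the sketch below walks a small expression tree and treats `$engine_predicate` as a leaf that is handed to an engine-supplied hook. Everything here is an assumption made for the example: the `Sketch*` types are simplified stand-ins rather than the real `io.trino.spi.expression` classes, the `engineIr` hook stands in for "decode the bytes and run the IR optimizer", `$and` and `$equal` are used only as example standard functions, and SQL null semantics are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

// Simplified stand-ins for the SPI expression nodes; not the real Trino classes.
interface SketchExpression {}

record SketchVariable(String name) implements SketchExpression {}

record SketchConstant(Object value) implements SketchExpression {}

record SketchCall(String function, List<SketchExpression> arguments) implements SketchExpression {}

final class ConnectorExpressionEvaluatorSketch {
    static final String ENGINE_PREDICATE = "$engine_predicate";

    // Hypothetical hook: receives the serialized IR bytes and the referenced column values.
    private final BiFunction<byte[], List<Object>, Boolean> engineIr;

    ConnectorExpressionEvaluatorSketch(BiFunction<byte[], List<Object>, Boolean> engineIr) {
        this.engineIr = engineIr;
    }

    Object evaluate(SketchExpression expression, Map<String, Object> bindings) {
        if (expression instanceof SketchConstant) {
            return ((SketchConstant) expression).value();
        }
        if (expression instanceof SketchVariable) {
            return bindings.get(((SketchVariable) expression).name());
        }
        SketchCall call = (SketchCall) expression;
        List<SketchExpression> arguments = call.arguments();
        switch (call.function()) {
            case "$and":
                // Standard node: evaluated directly (null handling simplified).
                for (SketchExpression argument : arguments) {
                    if (!Boolean.TRUE.equals(evaluate(argument, bindings))) {
                        return false;
                    }
                }
                return true;
            case "$equal":
                return Objects.equals(evaluate(arguments.get(0), bindings), evaluate(arguments.get(1), bindings));
            case ENGINE_PREDICATE:
                // First argument: serialized IR. Remaining arguments: referenced columns.
                byte[] serializedIr = (byte[]) ((SketchConstant) arguments.get(0)).value();
                List<Object> columnValues = new ArrayList<>();
                for (SketchExpression argument : arguments.subList(1, arguments.size())) {
                    columnValues.add(evaluate(argument, bindings));
                }
                return engineIr.apply(serializedIr, columnValues);
            default:
                throw new IllegalArgumentException("Unsupported function: " + call.function());
        }
    }
}
```

A connector that cannot handle a conjunct natively would pass it to such an evaluator together with the candidate column values, without needing to understand the bytes inside the `$engine_predicate` call.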

Key consequence: Constraint becomes simpler

With a general evaluator, getExpression() and the "remainder" unify into a single ConnectorExpression. There's no need for a second expression field on Constraint — the connector-pushable conjuncts and the engine-evaluated conjuncts are all in one tree. The connector pushes down what it understands and delegates the rest to the evaluator.

Why this is preferable to the opaque approach

  • No new SPI types. ConstraintExpression and ConstraintEvaluator go away; everything stays in the existing ConnectorExpression / ConnectorContext surface.
  • Connectors can inspect the expression. Referenced columns are visible as Variable arguments, so Delta Lake and Hive don't need referencedColumns() dispatched through an opaque handle.
  • The evaluator is general-purpose. A ConnectorExpressionEvaluator that handles any ConnectorExpression is useful beyond partition pruning — connectors can use it for any filtering they'd otherwise implement themselves.
  • Backwards compatible. Connectors encountering an unknown function name already return Optional.empty() gracefully.

The ResolvedFunction serialization requirement (the @JsonProperty additions in this PR) remains regardless of approach — that part is correct.

@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch from 70e9084 to 738ba05 Compare May 12, 2026 07:49
@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch 3 times, most recently from 8fea433 to 2633357 Compare May 12, 2026 08:50
@chenjian2664
Contributor Author

@martint Thanks for the suggestion to put $engine_predicate directly inside expression. I agree with the overall direction, but I found that we need to add an additional full-assignments map alongside the existing assignments.

When $engine_predicate is AND-ed into expression, the assignments map needs to cover two distinct sets of variables:

  • Connector-expression variables - the column handles for the connector-translatable conjuncts (e.g. a LIKE or equality on a partition column). These are what connectors interpret and push down.
  • Engine-predicate variables - the column handles for symbols appearing in the IR predicate encoded inside $engine_predicate. These are engine-internal and not meaningful to connectors, but InternalConnectorExpressionEvaluator needs them to resolve variable names to column handles when evaluating the predicate against partition values.

With a single assignments map covering both sets, connectors cannot distinguish which handles belong to which part. That distinction is required in at least these places:

  1. IcebergMetadata#applyFilter uses assignments to determine constraintColumns (the set of columns that are actually referenced in the predicate, persisted in the table handle). Including engine-predicate variables here would incorrectly widen the set to all scan columns.
  2. DeltaLakeMetadata#applyFilter follows the same constraintColumns pattern; on master this was served by the narrow getPredicateColumns().
  3. HivePartitionManager does the same: it builds constraintColumns from the predicate column handles.
  4. InformationSchemaMetadata.calculatePrefixesWithSchemaName/TableName and ColumnJdbcTable pass assignments to connectorExpressionEvaluator.evaluate(). For these to correctly evaluate $engine_predicate (and filter information-schema tables by the LIKE predicate), the evaluator needs the full assignments so it can map each Variable name in $engine_predicate to a ColumnHandle, then look up that handle's value in the partition bindings. If only connector-expression variables are present, the $engine_predicate variables go unresolved and the evaluator falls back to true, defeating the filter.

I want to keep assignments narrow (only variables in the connector-translatable conjuncts of expression, restoring the semantics that getAssignments() had before the $engine_predicate work) and add a second field, evaluationAssignments, covering all scan column variables, or at least all the engine-predicate columns. Connectors continue to call getAssignments().values() for constraint-column tracking. Evaluators (engine-internal or connector-owned) call getEvaluationAssignments() when they need to resolve $engine_predicate variable bindings.
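
To make the two-map proposal concrete, here is a minimal sketch of how the narrow and full maps would be consumed. It is only an assumption about the shape: `ConstraintSketch`, `constraintColumns`, and `resolve` are hypothetical names, and column handles are plain strings to keep the example self-contained.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical shape of the proposal; not the actual Constraint class.
record ConstraintSketch(
        Map<String, String> assignments,           // variables in connector-translatable conjuncts only
        Map<String, String> evaluationAssignments) // also covers $engine_predicate variables
{
    // What a connector's applyFilter would persist as constraintColumns:
    // derived from the narrow map, so engine-predicate columns do not widen it.
    Set<String> constraintColumns() {
        return new HashSet<>(assignments.values());
    }

    // What an evaluator uses to map a Variable name to a column handle.
    // An empty result corresponds to the "unresolved, fall back to true" case.
    Optional<String> resolve(String variableName) {
        return Optional.ofNullable(evaluationAssignments.get(variableName));
    }
}
```

With `assignments = {ds -> ds_handle}` and `evaluationAssignments = {ds -> ds_handle, region -> region_handle}`, the constraint columns stay limited to `ds_handle`, while an evaluator can still resolve `region` when it appears inside `$engine_predicate`.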

@chenjian2664 chenjian2664 force-pushed the jack/constraint-refactor branch from 2633357 to fbabd0b Compare May 12, 2026 17:56
Labels

cla-signed delta-lake Delta Lake connector docs duckdb DuckDB connector hive Hive connector iceberg Iceberg connector jdbc Relates to Trino JDBC driver postgresql PostgreSQL connector redis Redis connector redshift Redshift connector sqlserver SQLServer connector ui Web UI

3 participants