Skip to content

Pass dynamic filter predicate to ConnectorSplitSource#getNextBatch#29206

Open
chenjian2664 wants to merge 11 commits into
trinodb:masterfrom
chenjian2664:jack/add-domain-to-page-source
Open

Pass dynamic filter predicate to ConnectorSplitSource#getNextBatch#29206
chenjian2664 wants to merge 11 commits into
trinodb:masterfrom
chenjian2664:jack/add-domain-to-page-source

Conversation

@chenjian2664
Copy link
Copy Markdown
Contributor

@chenjian2664 chenjian2664 commented Apr 22, 2026

Description

Move dynamic filter waiting logic from individual connectors into the engine layer.

SPI changes:

  • Add ConnectorDynamicFilter(TupleDomain<ColumnHandle> currentPredicate, boolean isComplete) record as a per-batch snapshot of the engine's DF state.
  • Add ConnectorSplitSource#getNextBatch(int, ConnectorDynamicFilter) as the new primary entry point (default delegates to the deprecated getNextBatch(int)). The engine waits up to ConnectorSplitSource#getRequestedDynamicFilterWaitTimeoutMillis() before the first call; subsequent calls pass the current predicate without additional waiting.
  • Add ConnectorSplitSource#getRequestedDynamicFilterWaitTimeoutMillis() (default 0) so each connector continues to control its own DF wait timeout — no new engine-level config needed.
  • Update ConnectorSplitManager#getSplits to accept Set<ColumnHandle> dynamicFilterColumns (static scan-wide info) instead of a live DynamicFilter. The deprecated DynamicFilter overload remains for backward compatibility.

Engine changes:

  • ConnectorAwareSplitSource now owns the DF wait loop: reads getRequestedDynamicFilterWaitTimeoutMillis() once at construction, waits on DynamicFilter.isBlocked() if awaitable and time remains, then passes new ConnectorDynamicFilter(predicate, isComplete) on every getNextBatch call.
  • SplitManager calls the new getSplits(..., dynamicFilter.getColumnsCovered(), constraint) overload.

Connector changes:

  • Iceberg: IcebergSplitSource overrides getNextBatch(int, ConnectorDynamicFilter) and getRequestedDynamicFilterWaitTimeoutMillis(); removes the internal DynamicFilter/Stopwatch fields and wait loop; removes iceberg.dynamic-filtering.wait-timeout config property (@DefunctConfig) and the dynamic_filtering_wait_timeout connector session property.
  • Delta Lake: Same pattern as Iceberg.
  • Hive: HiveSplitSource overrides the new getNextBatch and getRequestedDynamicFilterWaitTimeoutMillis(). The partition loader receives the covered-columns set at construction; the per-batch predicate snapshot is shared via a CompletableFuture updated on each getNextBatch call.
  • JDBC: JdbcDynamicFilteringSplitSource / DynamicFilteringJdbcSplitSource override the new method and timeout.
  • MongoDB, Hudi, LakeHouse: Implement ConnectorSplitManager#getSplits with Set<ColumnHandle>.
  • Connectors that still override the deprecated getNextBatch(int) continue to work via the default bridge — no changes required.

Additional context and related issues

@cla-bot cla-bot Bot added the cla-signed label Apr 22, 2026
@github-actions github-actions Bot added docs iceberg Iceberg connector labels Apr 22, 2026
@chenjian2664 chenjian2664 marked this pull request as draft April 22, 2026 07:00
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from 6805608 to da416b1 Compare April 22, 2026 07:48
@github-actions github-actions Bot added bigquery BigQuery connector mongodb MongoDB connector cassandra Cassandra connector pinot Pinot connector prometheus Prometheus connector labels Apr 22, 2026
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from da416b1 to 4c7cf8a Compare April 22, 2026 08:12
@github-actions github-actions Bot added the hudi Hudi connector label Apr 22, 2026
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch 4 times, most recently from 6f38f96 to 0648ae0 Compare April 23, 2026 03:43
@chenjian2664 chenjian2664 marked this pull request as ready for review April 23, 2026 09:47
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from 07fc49b to 0648ae0 Compare April 24, 2026 07:34
@wendigo wendigo requested a review from raunaqmorarka April 24, 2026 15:19
@chenjian2664 chenjian2664 requested a review from ebyhr April 27, 2026 06:44
@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Apr 27, 2026

Could you rebase on master to resolve conflicts?

@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from 0648ae0 to 8c6f835 Compare April 27, 2026 07:56
Copy link
Copy Markdown
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope @raunaqmorarka chimes in this PR.

Comment thread core/trino-main/src/main/java/io/trino/execution/DynamicFilterConfig.java Outdated
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch 5 times, most recently from 5a075fe to 642d95f Compare May 4, 2026 06:13
Comment thread core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorSplitSource.java Outdated
Comment thread core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorSplitSource.java Outdated
Comment thread core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorSplitSource.java Outdated
Comment thread core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorSplitSource.java Outdated
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from cb8f353 to cf05e58 Compare May 5, 2026 07:21
@chenjian2664
Copy link
Copy Markdown
Contributor Author

Updated based on the review feedback:

  • ConnectorSplitSource: Renamed getDynamicFilterWaitTimeoutgetRequestedDynamicFilterWaitTimeoutMillis(), returning long (not OptionalLong) with 0 meaning "no waiting". Each connector continues to provide its own timeout — no engine-level config added. Added @since 481 and @Deprecated(forRemoval = true, since = "481") javadoc as suggested.
  • ConnectorSplitManager: New getSplits(..., Set<ColumnHandle> dynamicFilterColumns, ...) overload replaces the DynamicFilter overload for static scan-wide info. Old DynamicFilter overload deprecated for removal. Engine (SplitManager) calls the new overload.
  • Connectors migrated to new getSplits API: Iceberg, Delta Lake, Hive, JDBC, MongoDB, Hudi, LakeHouse.
  • PR description updated to reflect the current design.

CI running now. The previous run's TestDistributedFaultTolerantEngineOnlyQueries timeout was a pre-existing resource-deadlock flakiness under CI memory pressure (stage 3 held memory waiting for stage 4 output, while stage 4 couldn't get a node because stage 3 held all available memory) — unrelated to this PR. TestFaultTolerantExecutionDynamicFiltering (the DF-specific FTE test) passed 18/18.

@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from d5266ab to 33e75bb Compare May 6, 2026 13:20
@github-actions github-actions Bot added the memory Memory connector label May 6, 2026
@chenjian2664 chenjian2664 requested a review from losipiuk May 6, 2026 13:25
@chenjian2664
Copy link
Copy Markdown
Contributor Author

@losipiuk Could you please take a look at Update engine and classloader wrappers to use new split APIs this one touches FTE relate code paths , I'd appreciate having your eyes on it

@losipiuk
Copy link
Copy Markdown
Member

losipiuk commented May 7, 2026

@losipiuk Could you please take a look at Update engine and classloader wrappers to use new split APIs this one touches FTE relate code paths , I'd appreciate having your eyes on it

This looks fine, but it does not seem very much related to FTE, at least not directly - did I miss sth obvious?

@chenjian2664
Copy link
Copy Markdown
Contributor Author

@losipiuk Thanks for reviewing. I noticed the FTE tests were already failing when I started adding changes to this commit — mainly the tests related to AbstractTestCoordinatorDynamicFiltering and the Memory connector refactoring. I assumed this might also be affecting the FTE tests. :)

@losipiuk
Copy link
Copy Markdown
Member

losipiuk commented May 7, 2026

There may be some timing issues. Can you point me to example failures?

@chenjian2664
Copy link
Copy Markdown
Contributor Author

chenjian2664 commented May 7, 2026

Comment thread core/trino-main/src/main/java/io/trino/split/ConnectorAwareSplitSource.java Outdated
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch 5 times, most recently from 7b64305 to 6eb0808 Compare May 11, 2026 10:26
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from 83923f9 to 5e632e5 Compare May 12, 2026 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bigquery BigQuery connector cassandra Cassandra connector cla-signed delta-lake Delta Lake connector docs hive Hive connector hudi Hudi connector iceberg Iceberg connector kafka Kafka connector lakehouse loki Loki connector memory Memory connector mongodb MongoDB connector opensearch OpenSearch connector pinot Pinot connector prometheus Prometheus connector redis Redis connector

Development

Successfully merging this pull request may close these issues.

4 participants