
Replace PartitionsTable with PartitionsView#28997

Open
tbaeg wants to merge 3 commits into trinodb:master from tbaeg:system-views

Conversation

@tbaeg
Member

@tbaeg tbaeg commented Apr 5, 2026

Description

Replace PartitionsTable with PartitionsView.

Problem

Currently, Trino's SystemTable SPI is designed strictly for flat, scan-level operations. It lacks a mechanism to natively trigger a distributed shuffle (Exchange) for aggregations or post-processing. This presents two hurdles:

  1. Coordinator Bottlenecks: While 1:1 metadata projections like Iceberg's $files can be scanned in a distributed fashion, $partitions requires a global GROUP BY. Because the current API cannot distribute this reduction, the aggregation is forced into memory on a single coordinator node.

  2. Code Duplication: Many Iceberg metadata tables analyze the same underlying manifest files. For example, the data required for $partitions can be entirely derived by aggregating the columns already exposed by $files.

Solution

Leverage views on top of SystemTables. This allows connectors to define system "tables" as standard SQL queries, delegating the projection and aggregation logic back to Trino’s execution engine.

By leveraging view(s), connectors can enable engine-native, distributed operations for metadata. Instead of writing code for every SystemTable where the information is determined by the same source but projected differently, the connector can simply layer a view on top of its existing distributed system table implementations.

Continuing with the Iceberg example, the connector can parallelize $partitions by defining it as a view over $files:

SELECT 
    partition, 
    COUNT(*) AS file_count,
    SUM(record_count) AS record_count,
    SUM(file_size_in_bytes) AS total_size
    ...
FROM "table$files" 
GROUP BY 1

Additional context and related issues

As more connectors adopt this pattern, a formal addition to the SPI (i.e. - SystemView) may be beneficial.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot Bot added the cla-signed label Apr 5, 2026
@github-actions github-actions Bot added the iceberg Iceberg connector label Apr 5, 2026
@tbaeg tbaeg changed the title Draft: Add SystemVIew to SPI Draft: Add SystemView to SPI Apr 5, 2026
@electrum
Member

electrum commented Apr 6, 2026

Thanks for proposing this. Using a view for the Iceberg $partitions makes sense, and doing it as a normal distributed query is an improvement. From an implementation perspective, I don't think we need to add anything new to the SPI. Instead, IcebergMetadata.getView() could return the view for the system table, before checking for a normal catalog view.

From the security side, I think it only makes sense to use INVOKER security here, as there is no identity (or view "owner") to use for DEFINER security.

@tbaeg
Member Author

tbaeg commented Apr 6, 2026

Thanks for proposing this. Using a view for the Iceberg $partitions makes sense, and doing it as a normal distributed query is an improvement. From an implementation perspective, I don't think we need to add anything new to the SPI. Instead, IcebergMetadata.getView() could return the view for the system table, before checking for a normal catalog view.

From the security side, I think it only makes sense to use INVOKER security here, as there is no identity (or view "owner") to use for DEFINER security.

Thanks for the reply! Using ConnectorMetadata.getView() is possible, but I think having a separate SystemView provides:

  1. SPI clarity and intent by reducing code in the relevant methods to expose the system view (i.e. - listTables/listViews/etc).
  2. Related to clarity/intent, but specifically tracing the execution path. For example, how a system view lands in information_schema becomes muddied when it comes from something like ConnectorMetadata.listTables(). Identifying problems to, and through, SystemTablesMetadata is much easier since it's pared down to the "system" resources.
  3. Ability to enforce certain rules for all system views (i.e. - requiring INVOKER security in all system views).
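Point 3 could look like the following sketch: a hypothetical SystemView record (this type does not exist in the SPI today) whose canonical constructor centrally rejects anything other than INVOKER security.

```java
// Hypothetical SystemView carrier for the proposed SPI (illustrative only):
// the canonical constructor enforces INVOKER security for every system view,
// so no individual connector can accidentally register a DEFINER view.
record SystemView(String name, String sql, boolean runAsInvoker)
{
    SystemView
    {
        if (!runAsInvoker) {
            throw new IllegalArgumentException("system views must use INVOKER security");
        }
    }
}
```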

@electrum Thoughts?

@raunaqmorarka
Member

Having $partitions defined as a view on top of $files makes sense; we already have a situation in #28911 where having this would avoid maintaining redundant code.

@electrum
Member

electrum commented Apr 6, 2026

We shouldn't need to touch any of the listing code, as we don't list these hidden system tables today.

Can you try to implement this using only getView for Iceberg and see if there are any issues?

@tbaeg
Member Author

tbaeg commented Apr 6, 2026

We shouldn't need to touch any of the listing code, as we don't list these hidden system tables today.

Can you try to implement this using only getView for Iceberg and see if there are any issues?

Yes, I can try doing it for the dynamic system views.

Is this true for static system views? I believe you'd need to inject listing code if it wasn't part of the SystemTablesMetadata.

@tbaeg
Member Author

tbaeg commented Apr 6, 2026

I added a static and dynamic view to getView. It works, but the static view does not show up in information_schema without modifying the listX method calls.

@electrum

@tbaeg
Member Author

tbaeg commented Apr 6, 2026

Having $partitions defined as a view on top of $files makes sense, we already have a situation in #28911 where having this would avoid maintaining redundant code

This same principle can be applied to other metadata tables (i.e. - $entries/$all_entries) and trivializes adding new system tables like $all_files/$delete_files/$data_files/$all_delete_files/$all_data_files/etc.

I also believe other connectors like deltalake have similar issues with $partitions, although I'm not too familiar with deltalake.

@raunaqmorarka Did you have any thoughts on the SPI changes?

@tbaeg tbaeg force-pushed the system-views branch 6 times, most recently from 3eb0782 to 554543e Compare April 14, 2026 05:45
@github-actions github-actions Bot added the jdbc Relates to Trino JDBC driver label Apr 14, 2026
@tbaeg
Member Author

tbaeg commented Apr 16, 2026

@electrum @raunaqmorarka Gentle reminder.

@wendigo
Contributor

wendigo commented Apr 30, 2026

@martint what do you think?

Member

@raunaqmorarka raunaqmorarka left a comment

Six concerns from a deeper read, posted as inline comments. The overall direction (view-over-$files to reduce maintenance burden) is sound — the comments are about strengthening $files so views layered on it can stay simple, and tightening the SPI plumbing (handles, version, name-collision) so this scales beyond $partitions.

  1. PartitionsView — strengthen $files to expose typed lower/upper bounds; root cause of the type-dispatch fragility, the timestamp-without-tz session-timezone bug, and the null-predicate parity divergence vs. PartitionsTable.
  2. SystemTablesMetadata.getTableHandle — return null for view-only matches; the current behavior produces an unusable handle.
  3. SystemTablesViewsProvider — enforce the unfixed TODO with checkArgument.
  4. MetadataManager — plumb startVersion/endVersion through isView/getView so $partitions FOR VERSION AS OF X errors via the same path as $files FOR VERSION instead of silently expanding at HEAD.
  5. PartitionsView — use IcebergUtil.quotedName for FROM-clause identifiers.
  6. IcebergUtil — split the unrelated Variant-type early-return into its own commit.

{
String viewSql = """
SELECT %s SUM(record_count) AS record_count, COUNT(*) AS file_count, SUM(file_size_in_bytes) AS total_size%s
FROM "%s"."%s"."%s$files"%s
Member

Catalog/schema/table names are interpolated raw into stored SQL that gets re-parsed on every view use; any " in a name produces broken syntax. Same concern applies to the column name in buildColumnRowType (line 165).

The iceberg plugin already has IcebergUtil.quotedName for this: it skips quoting for simple names and otherwise escapes embedded " characters by doubling them. One-line replacement per call site.

(Mostly moot once the view body is collapsed via richer $files columns — the FROM clause is the only identifier interpolation that remains.)
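The quoting behavior described above can be sketched as follows (a standalone approximation; the real IcebergUtil.quotedName may differ in which names it treats as simple):

```java
final class Identifiers
{
    private Identifiers() {}

    // Skip quoting for simple lowercase identifiers; otherwise wrap the name
    // in double quotes and escape embedded quotes by doubling them, so the
    // result re-parses correctly when interpolated into stored view SQL.
    static String quotedName(String name)
    {
        if (name.matches("[a-z_][a-z0-9_]*")) {
            return name;
        }
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }
}
```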

Comment thread core/trino-main/src/main/java/io/trino/connector/system/SystemTablesMetadata.java Outdated
Optional<CatalogMetadata> catalog = getOptionalCatalogMetadata(session, viewName.catalogName());
if (catalog.isPresent()) {
CatalogMetadata catalogMetadata = catalog.get();
// TODO should this always throw when start/end is set?
Member

Resolving this TODO is a real correctness fix, not just a cleanup.

Today: metadata.isView (line 1614) and getViewInternal both pass Optional.empty(), Optional.empty() to getCatalogHandle. Version is dropped at catalog-handle dispatch, so the existing NOT_SUPPORTED throw in SystemTablesMetadata.getTableHandle never fires for view names. After this PR, SELECT * FROM "tbl$partitions" FOR VERSION AS OF X silently expands at HEAD instead of erroring like $files FOR VERSION AS OF does.

Suggested fix:

  • Plumb startVersion/endVersion through Metadata.isView and Metadata.getView.
  • Forward them into catalogMetadata.getCatalogHandle(session, name, startVersion, endVersion) here and in isView.
  • The existing throw in SystemTablesMetadata.getTableHandle then fires; $partitions FOR VERSION and $files FOR VERSION produce the same error via the same path. Resolves both this TODO and the matching one in getMaterializedViewInternal (line 1908).

{
// TODO https://github.com/trinodb/trino/issues/12920
assertQueryFails("SELECT * FROM \"test_iceberg_read_versioned_table$partitions\" FOR VERSION AS OF " + v1SnapshotId,
assertQueryFails("SELECT * FROM \"test_iceberg_read_versioned_table$files\" FOR VERSION AS OF " + v1SnapshotId,
Member

Switching from $partitions to $files keeps the assertion green but stops covering the actual subject of issue #12920: $partitions FOR VERSION is now silently expanded at HEAD instead of throwing (see comment on MetadataManager.getViewInternal).

Once the version plumbing is fixed, restore the original $partitions assertion. Both should error via the same path.

Member Author

Should using 'FOR VERSION' with a view always result in an exception?

Member

Yes, for generic user-defined views FOR VERSION should reject. There's no defined way to push a time-travel clause through an arbitrary view body. For $partitions specifically we'd ideally want FOR VERSION to work by translating to $files FOR VERSION (the point of #12920), but until that's wired up, an error is the safe behavior.

Member Author

Ended up adding a separate test since the system view and table are different. I also added a check in the StatementAnalyzer that blanket-fails versioning on views.

Does that seem right, or should it be in the MetadataManager?

@tbaeg
Member Author

tbaeg commented May 6, 2026

@raunaqmorarka @electrum Just for clarification, are we in favor of the SPI addition? I think there is a consensus for introducing the view-based system tables but not necessarily the interface for them.

Would appreciate thoughts from @findinpath and @1fanwang.

@raunaqmorarka
Member

I'd prefer the getView-only route. The fact that we have many such metadata tables in iceberg ($entries, $data_files, $delete_files, $all_files, ...) is a good argument for a registry, but the registry can live inside the iceberg module. IcebergMetadata.getView/isView would consult a suffix map before falling through to user views. INVOKER security and naming conventions get enforced once in a shared builder, same as a SystemView SPI would do.

The SPI route adds parallel SystemTable/SystemView dispatch, which already shows two issues in this PR. SystemTablesMetadata.getTableHandle returns a SystemTableHandle for view-only matches (masked today by analyzer ordering), and SystemTablesViewsProvider has an unenforced TODO for table/view name collisions. Both scale with the number of system views.

For information_schema visibility, per-table suffix views shouldn't be enumerated, since that would be combinatorial across every table. With getView, the connector simply doesn't list them in getViews(SchemaTablePrefix). With a SystemView SPI, the engine layer has to actively decide what not to list.

I'd revisit a SystemView SPI when a second connector (delta?) needs the same pattern and we see what engine-side coordination is actually missing.

The independent issues from my inline review (typed $files bounds, FOR VERSION plumbing, identifier escaping, Variant short-circuit split) apply either way.
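The connector-local registry suggested above might be sketched like this (class and method names are hypothetical, not existing Iceberg code; the view SQL mirrors the $partitions-over-$files example from the PR description):

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical suffix registry: maps a "$suffix" to a function producing the
// view SQL for a given base table. getView/isView would consult this before
// falling through to user-defined catalog views.
class SystemViewRegistry
{
    private final Map<String, Function<String, String>> viewsBySuffix = Map.of(
            "partitions", table -> """
                    SELECT partition, COUNT(*) AS file_count,
                           SUM(record_count) AS record_count,
                           SUM(file_size_in_bytes) AS total_size
                    FROM "%s$files"
                    GROUP BY 1""".formatted(table));

    Optional<String> viewSql(String tableName)
    {
        int dollar = tableName.lastIndexOf('$');
        if (dollar <= 0) {
            return Optional.empty();
        }
        String base = tableName.substring(0, dollar);
        String suffix = tableName.substring(dollar + 1);
        return Optional.ofNullable(viewsBySuffix.get(suffix))
                .map(sql -> sql.apply(base));
    }
}
```

INVOKER security and naming conventions would then be enforced once, wherever the registry's SQL is wrapped into a view definition, rather than per metadata table.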

@tbaeg
Member Author

tbaeg commented May 8, 2026

@raunaqmorarka Thanks for your feedback!

Given the response(s), I will work toward a getView implementation. Below are some responses to your feedback for a possible follow-up discussion around SystemViews.

I'd prefer the getView-only route. The fact that we have many such metadata tables in iceberg ($entries, $data_files, $delete_files, $all_files, ...) is a good argument for a registry, but the registry can live inside the iceberg module. IcebergMetadata.getView/isView would consult a suffix map before falling through to user views. INVOKER security and naming conventions get enforced once in a shared builder, same as a SystemView SPI would do.

While it's possible to have it in its own registry, as you mentioned, delta is yet another candidate for something very similar.

The SPI route adds parallel SystemTable/SystemView dispatch, which already shows two issues in this PR. SystemTablesMetadata.getTableHandle returns a SystemTableHandle for view-only matches (masked today by analyzer ordering), and SystemTablesViewsProvider has an unenforced TODO for table/view name collisions. Both scale with the number of system views.

Yes, a SystemTableHandle is returned for view-only matches, and there are name collisions (hence POC, heh). That said, I think there is a viable path to address this that includes the SystemView.

For information_schema visibility, per-table suffix views shouldn't be enumerated, since that would be combinatorial across every table. With getView, the connector simply doesn't list them in getViews(SchemaTablePrefix). With a SystemView SPI, the engine layer has to actively decide what not to list.

There was never an intention to expose dynamic tables to information_schema. Also, we already actively decide what not to list for SystemTables (i.e. - dynamic vs static system tables via the SystemTablesProvider).

I'd revisit a SystemView SPI when a second connector (delta?) needs the same pattern and we see what engine-side coordination is actually missing.

Historically, iceberg has been the most demanding (as far as SystemTables go), so it is the best example of what another connector may need.

The independent issues from my inline review (typed $files bounds, FOR VERSION plumbing, identifier escaping, Variant short-circuit split) apply either way.

Agreed. This was very much a POC, so definitely not comprehensive.

@tbaeg tbaeg changed the title Draft: Add SystemView to SPI Replace PartitionsSyste May 10, 2026
@tbaeg tbaeg changed the title Replace PartitionsSyste Replace PartitionsTable with PartitionsView May 10, 2026
@tbaeg tbaeg force-pushed the system-views branch 4 times, most recently from c9f2e98 to bf29c18 Compare May 11, 2026 05:46
@tbaeg tbaeg force-pushed the system-views branch 2 times, most recently from 68b070a to b7639a8 Compare May 12, 2026 06:27
@tbaeg tbaeg force-pushed the system-views branch 2 times, most recently from d22a357 to fe4276b Compare May 12, 2026 13:12
"(?<table>[^$@]+)" +
"(?:\\$(?<type>(?i:" + referencableTableTypes + ")))?");

SYSTEM_VIEW_PATTERN = Pattern.compile("" +
Member Author

There is likely some cleanup to be done around this regex and the corresponding TABLE_PATTERN, since both are used in similar places but cause failures when PARTITIONS is filtered out of TABLE_PATTERN.

@Override
public Optional<ConnectorViewDefinition> getView(ConnectorSession session, SchemaTableName viewName)
{
if (isIcebergSystemViewName(viewName.getTableName())) {
Member Author

While this pattern works, in the longer term it feels better if there were a distinct SystemView concept. It would have been nice to encapsulate some of this into something like PartitionsSystemTableProvider to avoid this sort of branching.


Labels

cla-signed iceberg Iceberg connector jdbc Relates to Trino JDBC driver lakehouse

Development

Successfully merging this pull request may close these issues.

4 participants