Replace PartitionsTable with PartitionsView#28997
Conversation
|
Thanks for proposing this. Using a view for the Iceberg From the security side, I think it only makes sense to use |
Thanks for the reply! Using
@electrum Thoughts? |
|
Having |
|
We shouldn't need to touch any of the listing code, as we don't list these hidden system tables today. Can you try to implement this using only |
Yes, I can try doing it for the dynamic system views. Is this true for static system views? I believe you'd need to inject listing code if it wasn't part of the |
|
I added a static and dynamic view to |
This same principle can be applied to other metadata tables (i.e. - I also believe other connectors like @raunaqmorarka Did you have any thoughts on the SPI changes? |
3eb0782 to
554543e
Compare
|
@electrum @raunaqmorarka Gentle reminder. |
|
@martint what do you think? |
There was a problem hiding this comment.
Six concerns from a deeper read, posted as inline comments. The overall direction (view-over-$files to reduce maintenance burden) is sound — the comments are about strengthening $files so views layered on it can stay simple, and tightening the SPI plumbing (handles, version, name-collision) so this scales beyond $partitions.
- PartitionsView — strengthen
$filesto expose typed lower/upper bounds; root cause of the type-dispatch fragility, the timestamp-without-tz session-timezone bug, and the null-predicate parity divergence vs.PartitionsTable. - SystemTablesMetadata.getTableHandle — return null for view-only matches; the current behavior produces an unusable handle.
- SystemTablesViewsProvider — enforce the unfixed TODO with
checkArgument. - MetadataManager — plumb
startVersion/endVersionthroughisView/getViewso$partitions FOR VERSION AS OF Xerrors via the same path as$files FOR VERSIONinstead of silently expanding at HEAD. - PartitionsView — use
IcebergUtil.quotedNamefor FROM-clause identifiers. - IcebergUtil — split the unrelated Variant-type early-return into its own commit.
| { | ||
| String viewSql = """ | ||
| SELECT %s SUM(record_count) AS record_count, COUNT(*) AS file_count, SUM(file_size_in_bytes) AS total_size%s | ||
| FROM "%s"."%s"."%s$files"%s |
There was a problem hiding this comment.
Catalog/schema/table names are interpolated raw into stored SQL that gets re-parsed on every view use; any " in a name produces broken syntax. Same concern applies to the column name in buildColumnRowType (line 165).
The iceberg plugin already has IcebergUtil.quotedName for this — it skips quoting for simple names and escapes " → "" otherwise. One-line replacement per call site.
(Mostly moot once the view body is collapsed via richer $files columns — the FROM clause is the only identifier interpolation that remains.)
| Optional<CatalogMetadata> catalog = getOptionalCatalogMetadata(session, viewName.catalogName()); | ||
| if (catalog.isPresent()) { | ||
| CatalogMetadata catalogMetadata = catalog.get(); | ||
| // TODO should this always throw when start/end is set? |
There was a problem hiding this comment.
Resolving this TODO is a real correctness fix, not just a cleanup.
Today: metadata.isView (line 1614) and getViewInternal both pass Optional.empty(), Optional.empty() to getCatalogHandle. Version is dropped at catalog-handle dispatch, so the existing NOT_SUPPORTED throw in SystemTablesMetadata.getTableHandle never fires for view names. After this PR, SELECT * FROM "tbl$partitions" FOR VERSION AS OF X silently expands at HEAD instead of erroring like $files FOR VERSION AS OF does.
Suggested fix:
- Plumb
startVersion/endVersionthroughMetadata.isViewandMetadata.getView. - Forward them into
catalogMetadata.getCatalogHandle(session, name, startVersion, endVersion)here and inisView. - The existing throw in
SystemTablesMetadata.getTableHandlethen fires;$partitions FOR VERSIONand$files FOR VERSIONproduce the same error via the same path. Resolves both this TODO and the matching one ingetMaterializedViewInternal(line 1908).
| { | ||
| // TODO https://github.com/trinodb/trino/issues/12920 | ||
| assertQueryFails("SELECT * FROM \"test_iceberg_read_versioned_table$partitions\" FOR VERSION AS OF " + v1SnapshotId, | ||
| assertQueryFails("SELECT * FROM \"test_iceberg_read_versioned_table$files\" FOR VERSION AS OF " + v1SnapshotId, |
There was a problem hiding this comment.
Switching from $partitions to $files keeps the assertion green but stops covering the actual subject of issue #12920 — $partitions FOR VERSION is now silently expanded at HEAD instead of throwing (see comment on MetadataManager.getViewInternal).
Once the version plumbing is fixed, restore the original $partitions assertion. Both should error via the same path.
There was a problem hiding this comment.
Should using 'FOR VERSION' with a view always result in an exception?
There was a problem hiding this comment.
Yes, for generic user-defined views FOR VERSION should reject. There's no defined way to push a time-travel clause through an arbitrary view body. For $partitions specifically we'd ideally want FOR VERSION to work by translating to $files FOR VERSION (the point of #12920), but until that's wired up, an error is the safe behavior.
There was a problem hiding this comment.
Ended up adding a separate test since the system view and table are different. I also added the check in the StatementAnalyzer that blanket fails versioning on views.
Does seem right or should it be in the MetadataManager?
|
@raunaqmorarka @electrum Just for clarification, are we in favor of the SPI addition? I think there is a consensus for introducing the view based system tables but not necessarily the interface for them. Would appreciate thoughts from @findinpath and @1fanwang. |
|
I'd prefer the The SPI route adds parallel For I'd revisit a SystemView SPI when a second connector (delta?) needs the same pattern and we see what engine-side coordination is actually missing. The independent issues from my inline review (typed |
|
@raunaqmorarka Thanks for your feedback! Given the response(s), I will work toward a
While it's possible to have it in it's own registry, as you mentioned,
Yes, the
There was never an intention to expose dynamic tables to
Historically, iceberg has been the most demanding (as far as
Agreed. This was very much a POC, so definitely not comprehensive. |
c9f2e98 to
bf29c18
Compare
68b070a to
b7639a8
Compare
d22a357 to
fe4276b
Compare
| "(?<table>[^$@]+)" + | ||
| "(?:\\$(?<type>(?i:" + referencableTableTypes + ")))?"); | ||
|
|
||
| SYSTEM_VIEW_PATTERN = Pattern.compile("" + |
There was a problem hiding this comment.
Likely some clean up around this regex and the corresponding TABLE_PATTERN to be done. Since both are used in similar places but cause failures when PARTITIONS is filtered out from TABLE_PATTERN.
| @Override | ||
| public Optional<ConnectorViewDefinition> getView(ConnectorSession session, SchemaTableName viewName) | ||
| { | ||
| if (isIcebergSystemViewName(viewName.getTableName())) { |
There was a problem hiding this comment.
While this pattern works, in the longer term, feels better if there was a distinction for a SystemView. It would have been nice to encapsulate some of this into something like PartitionsSystemTableProvider to avoid this sort of branching.
Description
Replace
PartitionsTablewithPartitionsView.Problem
Currently, Trino's
SystemTableSPI is designed strictly for flat, scan-level operations. It lacks a mechanism to natively trigger a distributed shuffle (Exchange) for aggregations or post-processing. This presents two hurdles:Coordinator Bottlenecks: While 1:1 metadata projections like Iceberg's
$filescan be scanned in a distributed fashion,$partitionsrequires a global GROUP BY. Because the current API cannot distribute this reduction, the aggregation is forced into memory on a single coordinator node.Code Duplication: Many Iceberg metadata tables analyze the same underlying manifest files. For example, the data required for
$partitionscan be entirely derived by aggregating the columns already exposed by$files.Solution
Leverage view(s) on top of
SystemTables. That allows connectors to define system "tables" using standard SQL queries. This delegates the projection and aggregation logic back to Trino’s execution engine.By leveraging view(s), connectors can enable engine-native, distributed operations for metadata. Instead of writing code for every
SystemTablewhere the information is determined by the same source but projected differently, the connector can simply layer a view on top of its existing distributed system table implementations.Continuing with the Iceberg example, the connector can parallelize
$partitionsby defining it as a view over$files:Additional context and related issues
As more connectors adapt this pattern, a formal additional to the SPI (i.e. -
SystemView) may be beneficial.Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: