
Add added_snapshot_id column to Iceberg $files table#28911

Merged
chenjian2664 merged 1 commit into trinodb:master from kaveti:partitions-delete-observability on May 1, 2026

@kaveti
Contributor

@kaveti kaveti commented Mar 29, 2026

Summary

  • Add added_snapshot_id to the Iceberg $files system table.
  • Populate added_snapshot_id from live manifest entry snapshot metadata.
  • Preserve upstream $files schema additions introduced after this branch diverged.
  • Update tests and docs to match the merged schema.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fixes https://github.com/trinodb/trino/issues/28910

Fixes #28910

@cla-bot cla-bot Bot added the cla-signed label Mar 29, 2026
@github-actions github-actions Bot added the iceberg Iceberg connector label Mar 29, 2026
@kaveti kaveti requested review from Math-ias, martint and wendigo March 29, 2026 06:59
@ebyhr ebyhr requested review from chenjian2664, Copilot, ebyhr and findinpath and removed request for Math-ias, martint and wendigo March 29, 2026 11:33

Copilot AI left a comment


Pull request overview

Adds partition-level delete-file metrics to Trino’s Iceberg $partitions system table to make it easy to identify partitions accumulating position/equality deletes without scanning $files (Fixes #28910).

Changes:

  • Extend IcebergStatistics to track position/equality delete file and record counts.
  • Update PartitionsTable to populate the new metrics by iterating FileScanTask.deletes() (with deduplication).
  • Update and add tests to validate the new $partitions columns and shifted column indexes.
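The aggregation described in this overview can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `DeleteFile`, `DeleteKind`, `Key`, `Counters`, and `aggregate` are hypothetical stand-ins rather than Trino or Iceberg types, and the sketch assumes a delete file is identified by its partition and path. The counter names mirror the `$partitions` columns that appear in the test failure output later in this thread.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartitionDeleteMetrics
{
    public enum DeleteKind { POSITION, EQUALITY }

    // Hypothetical stand-in for a delete file attached to a scan task.
    public record DeleteFile(String partition, String path, DeleteKind kind, long recordCount) {}

    // Deduplication key (assumption: partition plus path identifies a delete file).
    private record Key(String partition, String path) {}

    // Mutable per-partition counters.
    public static final class Counters
    {
        public long positionDeleteFileCount;
        public long positionDeleteRecordCount;
        public long equalityDeleteFileCount;
        public long equalityDeleteRecordCount;
    }

    // Each inner list models the delete files attached to one scan task.
    public static Map<String, Counters> aggregate(List<List<DeleteFile>> deletesPerTask)
    {
        Map<String, Counters> byPartition = new LinkedHashMap<>();
        Set<Key> seen = new HashSet<>();
        for (List<DeleteFile> taskDeletes : deletesPerTask) {
            for (DeleteFile delete : taskDeletes) {
                // The same delete file can be attached to several tasks; count it once.
                if (!seen.add(new Key(delete.partition(), delete.path()))) {
                    continue;
                }
                Counters counters = byPartition.computeIfAbsent(delete.partition(), key -> new Counters());
                if (delete.kind() == DeleteKind.POSITION) {
                    counters.positionDeleteFileCount++;
                    counters.positionDeleteRecordCount += delete.recordCount();
                }
                else {
                    counters.equalityDeleteFileCount++;
                    counters.equalityDeleteRecordCount += delete.recordCount();
                }
            }
        }
        return byPartition;
    }
}
```

Without the `seen` set, a delete file that applies to several data files in the same partition would inflate both the file count and the record count.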

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergStatistics.java | Adds delete-file counters + builder ingestion method. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/PartitionsTable.java | Adds new $partitions columns and aggregates delete metrics from scan tasks. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java | Adds coverage for new delete metric columns (position + equality + OPTIMIZE reset). |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergSystemTables.java | Updates $partitions schema assertions and field indexes. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java | Updates column list expectation for $partitions. |


Comment thread plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java Outdated
@kaveti kaveti force-pushed the partitions-delete-observability branch from 63810f9 to aa9a341 Compare March 29, 2026 15:48
Member

@ebyhr ebyhr left a comment


> Iceberg: Add delete file metrics to $partitions metadata table

We don't use the "Iceberg: " prefix in commit titles. Please follow https://trino.io/development/process#pull-request-and-commit-guidelines

The "Changes: ..." section in the commit body isn't very helpful. You can remove it.

Comment thread plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java Outdated
@kaveti kaveti changed the title Iceberg: Add delete file metrics to $partitions metadata table Add delete file metrics to $partitions metadata table Mar 30, 2026
@kaveti kaveti force-pushed the partitions-delete-observability branch 2 times, most recently from 5f3a572 to f02f75a Compare March 30, 2026 05:12
@kaveti
Contributor Author

kaveti commented Mar 30, 2026

@ebyhr I have addressed the review comments. Could you please review again?

@findinpath
Contributor

@kaveti is the failure related to your changes?

https://github.com/trinodb/trino/actions/runs/23729005130/job/69119458673?pr=28911

 2026-03-30 11:12:45 SEVERE: Failure cause:
tests               | java.lang.AssertionError: Expected row count to be <5>, but was <9>; rows=[[partition, row("a" varchar, "b" varchar)], [record_count, bigint], [file_count, bigint], [total_size, bigint], [position_delete_file_count, bigint], [position_delete_record_count, bigint], [equality_delete_file_count, bigint], [equality_delete_record_count, bigint], [data, row("a" row("min" varchar, "max" varchar, "null_count" bigint, "nan_count" bigint), "c" row("min" varchar, "max" varchar, "null_count" bigint, "nan_count" bigint))]]
tests               | 	at io.trino.tests.product.iceberg.TestIcebergPartitionEvolution.testDroppedPartitionField(TestIcebergPartitionEvolution.java:78)
tests               | 	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
tests               | 	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
tests               | 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
tests               | 	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
tests               | 	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
tests               | 	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
tests               | 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
tests               | 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
tests               | 	at java.base/java.lang.Thread.run(Thread.java:1474)
tests               | 

@kaveti kaveti force-pushed the partitions-delete-observability branch from f02f75a to 46cbb07 Compare March 30, 2026 17:19
@github-actions github-actions Bot added the docs label Mar 30, 2026
kaveti added a commit to kaveti/trino that referenced this pull request Mar 30, 2026
Add testFilesTableDeleteFileDeduplication to BaseIcebergSystemTables
that verifies the $files table shows each delete file exactly once,
with no duplicate entries from FileScanTask expansion.

Follow-up to trinodb#28911 as requested by findinpath.
kaveti added a commit to kaveti/trino that referenced this pull request Mar 30, 2026
Deduplicate files by path in FilesTablePageSource since the same
delete file can appear multiple times. Add a HashSet to track seen
file paths and skip duplicates.

Add testFilesTableDeleteFileDeduplication to BaseIcebergSystemTables
that verifies the $files table shows each delete file exactly once.

Follow-up to trinodb#28911 as requested by findinpath.
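The path-based deduplication described in this commit message could look roughly like the sketch below. It is only an illustration of the HashSet technique and not the actual `FilesTablePageSource` change: `FileEntry` and `deduplicateByPath` are hypothetical names introduced here, and the sketch assumes the first occurrence of each path is the one to keep.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteFileDedup
{
    // Hypothetical stand-in for a row of the $files table; not a Trino type.
    public record FileEntry(String path, long recordCount) {}

    // Keeps the first occurrence of each path and preserves encounter order.
    public static List<FileEntry> deduplicateByPath(List<FileEntry> entries)
    {
        Set<String> seenPaths = new HashSet<>();
        List<FileEntry> unique = new ArrayList<>();
        for (FileEntry entry : entries) {
            // Set.add returns false when the path was already recorded.
            if (seenPaths.add(entry.path())) {
                unique.add(entry);
            }
        }
        return unique;
    }
}
```

A later commit message in this thread notes that v3 deletion vectors can share one Puffin `file_path`, so deduplicating purely by path would collapse distinct deletion vectors stored in the same file.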
kaveti added a commit to kaveti/trino that referenced this pull request Apr 16, 2026
Add testFilesTableDeleteFileDeduplication to BaseIcebergSystemTables
that verifies the $files table shows each delete file exactly once,
with no duplicate entries (v2 position + equality deletes).

Add testFilesTableDeletionVectors that verifies v3 deletion vector
behavior: multiple DV entries share the same Puffin file_path in the
$files table. Currently there are no content_offset/content_size_in_bytes
columns to distinguish individual DVs within the shared Puffin file.

Follow-up to trinodb#28911 as requested by findinpath.
@kaveti kaveti force-pushed the partitions-delete-observability branch from 547d08e to 199d45a Compare April 17, 2026 07:05
@github-actions github-actions Bot added ui Web UI jdbc Relates to Trino JDBC driver hudi Hudi connector delta-lake Delta Lake connector hive Hive connector bigquery BigQuery connector mongodb MongoDB connector exasol Exasol connector faker Faker connector google-sheets Google Sheets connector pinot Pinot connector postgresql PostgreSQL connector redis Redis connector redshift Redshift connector lakehouse labels Apr 17, 2026
@kaveti kaveti force-pushed the partitions-delete-observability branch 3 times, most recently from 39d4fb8 to acd4f2d Compare April 17, 2026 11:19
Comment thread docs/src/main/sphinx/connector/iceberg.md Outdated
@kaveti kaveti force-pushed the partitions-delete-observability branch from acd4f2d to 1b2094c Compare April 24, 2026 07:31
@kaveti
Contributor Author

kaveti commented Apr 24, 2026

@chenjian2664 I have addressed your comments. Thank you.

@kaveti kaveti force-pushed the partitions-delete-observability branch from 1b2094c to 3bee45e Compare April 25, 2026 10:29
@kaveti kaveti force-pushed the partitions-delete-observability branch from 3bee45e to 93827a1 Compare April 29, 2026 09:50
@chenjian2664 chenjian2664 merged commit fb8abfc into trinodb:master May 1, 2026
46 checks passed
@github-actions github-actions Bot added this to the 481 milestone May 1, 2026
@ebyhr
Member

ebyhr commented May 7, 2026

## Section
* Fixes https://github.com/trinodb/trino/issues/28910

Please follow https://trino.io/development/process#release-note-guidelines

@ebyhr ebyhr mentioned this pull request May 10, 2026

Labels

bigquery BigQuery connector cla-signed delta-lake Delta Lake connector docs exasol Exasol connector faker Faker connector google-sheets Google Sheets connector hive Hive connector hudi Hudi connector iceberg Iceberg connector jdbc Relates to Trino JDBC driver lakehouse mongodb MongoDB connector pinot Pinot connector postgresql PostgreSQL connector redis Redis connector redshift Redshift connector ui Web UI

Development

Successfully merging this pull request may close these issues.

Iceberg: $partitions metadata table missing delete file metrics

6 participants