Read metadata and protocol information from Delta checksum files #28381

Merged
raunaqmorarka merged 3 commits into trinodb:master from adam-richardson-openai:dev/as3richa/delta-checksum-upstream
Apr 30, 2026

adam-richardson-openai (Contributor) commented Feb 20, 2026

Description

Read metadata and protocol information from Delta checksum files, when configured and where available

Compliant writers of Delta tables may optionally write "checksum" files alongside each commit. These checksum files contain a variety of optional but useful information, including the Delta table metadata and protocol information. See https://github.com/delta-io/delta/blob/488c916931ca9d210f4cadd2d5520e0274d26b04/PROTOCOL.md#version-checksum-file for the full checksum file spec.

Trino needs to load the table metadata and protocol information at planning time. Today, this is done by identifying and loading the latest table checkpoint, then replaying all intervening commits up to the latest. This can be extremely slow and expensive, as checkpoints can be enormous and there may be many intervening commits.

Instead, we can determine the latest commit in the table, load the corresponding checksum file (if it exists), and parse the metadata and protocol information (if available in the checksum file). This takes only a single listing operation and a single load of a small JSON file, whereas the Delta-log-based approach may load many files, some of which can be extremely large depending on the size and configuration of the table.

The listing starts from _last_checkpoint (when present) using the listFilesStartingFrom filesystem primitive, so the scan only covers commits since the last checkpoint rather than the entire _delta_log.
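
A minimal sketch of the tail scan, assuming the 20-digit zero-padded file names that appear in the stack trace further down (for example `00000000000000000020.checkpoint.parquet`). `LatestCommitSketch`, `startKey`, and `latestCommitVersion` are hypothetical names, and the in-memory list stands in for a lexicographically ordered `_delta_log` listing; this is not the signature of `listFilesStartingFrom` or the code in `TransactionLogParser`.

```java
import java.util.List;
import java.util.Locale;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LatestCommitSketch
{
    // Commit files look like <20-digit version>.json; checkpoints and .crc files do not match.
    private static final Pattern COMMIT_FILE = Pattern.compile("(\\d{20})\\.json");

    private LatestCommitSketch() {}

    // Zero-padded start key, so that lexicographic order equals numeric version order.
    static String startKey(long checkpointVersion)
    {
        return String.format(Locale.ROOT, "%020d", checkpointVersion);
    }

    // 'sortedFileNames' is an assumption: a stand-in for the ordered directory listing.
    static OptionalLong latestCommitVersion(List<String> sortedFileNames, OptionalLong lastCheckpointVersion)
    {
        String start = startKey(lastCheckpointVersion.orElse(0L));
        long latest = -1;
        for (String name : sortedFileNames) {
            if (name.compareTo(start) < 0) {
                continue; // older than the checkpoint; a real tail listing would not return these
            }
            Matcher matcher = COMMIT_FILE.matcher(name);
            if (matcher.matches()) {
                latest = Math.max(latest, Long.parseLong(matcher.group(1)));
            }
        }
        return latest < 0 ? OptionalLong.empty() : OptionalLong.of(latest);
    }

    public static void main(String[] args)
    {
        List<String> listing = List.of(
                "00000000000000000019.json",
                "00000000000000000020.checkpoint.parquet",
                "00000000000000000020.json",
                "00000000000000000021.json");
        System.out.println(latestCommitVersion(listing, OptionalLong.of(20)));
    }
}
```

With no `_last_checkpoint`, the sketch starts from version 0, which corresponds to scanning the whole log.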

If there is no checksum file for the latest eligible commit in the table, or if the checksum file doesn't capture both the metadata and the protocol information for the table, we fall back to the existing approach of scanning the Delta log. (Checksum files are optional under the Delta spec, as are all fields therein.)
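
The fallback rule can be sketched roughly as follows. The types `VersionChecksum` and `Descriptor`, the `loadDescriptor` signature, and the string payloads are all hypothetical stand-ins for illustration; the actual logic lives in `DeltaLakeMetadata` and is not reproduced here.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class ChecksumFallbackSketch
{
    // Hypothetical stand-ins: a parsed checksum file with optional entries, and the result.
    record VersionChecksum(Optional<String> metadata, Optional<String> protocol) {}

    record Descriptor(long version, String metadata, String protocol, String source) {}

    private ChecksumFallbackSketch() {}

    static Descriptor loadDescriptor(
            boolean loadFromChecksumEnabled,
            long latestVersion,
            Supplier<Optional<VersionChecksum>> checksumReader,
            Supplier<Descriptor> transactionLogLoader)
    {
        if (loadFromChecksumEnabled) {
            Optional<Descriptor> fromChecksum = checksumReader.get()
                    // Both entries are optional in the spec, so require both before using the file.
                    .filter(checksum -> checksum.metadata().isPresent() && checksum.protocol().isPresent())
                    .map(checksum -> new Descriptor(
                            latestVersion,
                            checksum.metadata().get(),
                            checksum.protocol().get(),
                            "checksum"));
            if (fromChecksum.isPresent()) {
                return fromChecksum.get();
            }
        }
        // Missing file, incomplete file, or feature disabled: use the existing log-scanning path.
        return transactionLogLoader.get();
    }

    public static void main(String[] args)
    {
        Descriptor fromLog = new Descriptor(7, "metadata-from-log", "protocol-from-log", "transaction-log");
        Optional<VersionChecksum> incomplete = Optional.of(new VersionChecksum(Optional.of("metadata"), Optional.empty()));
        System.out.println(loadDescriptor(true, 7, () -> incomplete, () -> fromLog).source());
    }
}
```

Passing the transaction-log loader as a `Supplier` keeps the expensive path lazy, so it only runs when the checksum path yields nothing.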

This new behavior is gated behind a session property, load_metadata_from_checksum_file, which in turn defaults to the value of the delta.load-metadata-from-checksum-file configuration property. The config value itself defaults to true, since we expect this change to be a straightforward performance optimization in the overwhelming majority of cases.

Repeated queries against the same table version reuse the parsed descriptor via a cross-query cache on TransactionLogAccess keyed by (schema.table, location, version), so the .crc parse is skipped on subsequent calls. The cache is bounded to 1000 entries since each parsed descriptor is small. This matches the hot-cache behaviour of the transaction-log path, which already benefits from the TableSnapshot cache.
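
As an illustration only, a bounded cache with this key shape could look like the sketch below. It uses a plain access-ordered `LinkedHashMap` from the JDK, which is an assumption made to keep the example dependency-free; the cache on `TransactionLogAccess` may well be built differently. `DescriptorCacheSketch` and `Key` are hypothetical names. Values are stored as `Optional` so that an absent result is remembered too.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

final class DescriptorCacheSketch<V>
{
    // Key mirroring the (schema.table, location, version) triple from the description.
    record Key(String schemaTableName, String location, long version) {}

    private final Map<Key, Optional<V>> entries;

    DescriptorCacheSketch(int maxEntries)
    {
        // Access-ordered map that drops the least recently used entry once the bound is exceeded.
        this.entries = new LinkedHashMap<>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, Optional<V>> eldest)
            {
                return size() > maxEntries;
            }
        };
    }

    // Simplification: the loader runs while holding the lock.
    synchronized Optional<V> get(Key key, Supplier<Optional<V>> loader)
    {
        Optional<V> cached = entries.get(key);
        if (cached == null) {
            cached = loader.get(); // an empty Optional is cached as well
            entries.put(key, cached);
        }
        return cached;
    }

    synchronized void invalidateAll()
    {
        entries.clear();
    }

    public static void main(String[] args)
    {
        DescriptorCacheSketch<String> cache = new DescriptorCacheSketch<>(1000);
        Key key = new Key("schema.table", "s3://bucket/table", 42);
        System.out.println(cache.get(key, () -> Optional.of("parsed descriptor")));
        System.out.println(cache.get(key, () -> Optional.of("not loaded again")));
    }
}
```

Including the version in the key means a new commit produces a new key, so stale descriptors are never served for a newer table version.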

This optimization is particularly effective for tables using the v1 checkpoint spec, since v1 checkpoint files may be very large and expensive to load.

We ran internal performance tests using queries like

SELECT 1 FROM <table> LIMIT 1

where <table> is a large table using the v1 checkpoint spec. We observed that time spent in analysis fell from 10s on average to well under 500ms.

Additional context and related issues

Builds on #28549, which added the listFilesStartingFrom primitive to TrinoFileSystem and is now in master. That primitive lets the checksum lookup scan only the tail of _delta_log after the most recent checkpoint, which is what makes this optimization worthwhile on tables with large logs.

The prep commit Tolerate path normalization in EmulatedListFilesStartingFromIterator was added after observing the following failure when running the Databricks-credentialed ADLS test suite (TestDeltaLakeAdlsStorage.testQuery):

java.lang.IllegalStateException: Expected listed file to start with directory path 'tpch-tiny-<uuid>//customer/_delta_log/': abfs://trino-ci@devcicdhierarchical.dfs.core.windows.net/tpch-tiny-<uuid>/customer/_delta_log/00000000000000000020.checkpoint.parquet
	at com.google.common.base.Preconditions.checkState(Preconditions.java:888)
	at io.trino.filesystem.EmulatedListFilesStartingFromIterator.loadNextEntry(EmulatedListFilesStartingFromIterator.java:72)
	at io.trino.filesystem.EmulatedListFilesStartingFromIterator.hasNext(EmulatedListFilesStartingFromIterator.java:44)
	at io.trino.plugin.deltalake.transactionlog.TransactionLogParser.findLatestCommitVersion(TransactionLogParser.java:345)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.resolveLatestCommitVersion(DeltaLakeMetadata.java:844)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.lambda$loadDescriptorFromChecksum$0(DeltaLakeMetadata.java:812)
	at java.base/java.util.Optional.orElseGet(Optional.java:364)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.loadDescriptorFromChecksum(DeltaLakeMetadata.java:812)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.loadDescriptor(DeltaLakeMetadata.java:796)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.getTableHandle(DeltaLakeMetadata.java:742)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.getTableHandle(DeltaLakeMetadata.java:385)

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Delta Lake
  * Improve query planning latency by reading metadata and protocol information from Delta checksum files. Can be disabled via the `delta.load-metadata-from-checksum-file` configuration property or
  `load_metadata_from_checksum_file` session property. ({issue}`28381`)

cla-bot Bot commented Feb 20, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

github-actions Bot added the delta-lake (Delta Lake connector) label Feb 20, 2026

adam-richardson-openai (Contributor, Author) commented:

> Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

I emailed my signed CLA to cla@trino.io moments ago

adam-richardson-openai force-pushed the dev/as3richa/delta-checksum-upstream branch from 08bb9c5 to fd9787c on February 20, 2026 02:56
adam-richardson-openai force-pushed the dev/as3richa/delta-checksum-upstream branch from fd9787c to de2bcd7 on February 20, 2026 06:10

Copilot AI left a comment

Pull request overview

This pull request adds support for reading Delta table metadata and protocol information from checksum files (.crc files) when available, providing a significant performance optimization for tables with large v1 checkpoints. The feature is controlled by a new configuration property delta.load_metadata_from_checksum_file (defaulting to true) and corresponding session property load_metadata_from_checksum_file.

Changes:

  • Added support for reading metadata and protocol information from Delta checksum files, falling back gracefully to transaction log scanning when checksum files are unavailable or incomplete
  • Introduced configuration and session properties to control the new checksum file loading behavior
  • Enhanced test coverage with comprehensive unit and integration tests for checksum file parsing, fallback behavior, and error handling

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| DeltaLakeConfig.java | Added new configuration property load_metadata_from_checksum_file (defaults to true) |
| DeltaLakeSessionProperties.java | Added corresponding session property for checksum metadata loading |
| DeltaLakeVersionChecksum.java | New class representing the structure of Delta checksum files with metadata and protocol entries |
| TransactionLogParser.java | Added methods getLatestCommitVersion and readVersionChecksumFile to support checksum file operations |
| DeltaLakeMetadata.java | Refactored getTableHandle to attempt loading from checksum files first before falling back to transaction log |
| DeltaLakeTableMetadataScheduler.java | Refactored isSameTransactionVersion method to accept version directly, supporting both snapshot and version checks |
| TestTransactionLogParser.java | Added comprehensive tests for checksum file reading and parsing edge cases |
| TestDeltaLakeMetadata.java | Added integration tests for checksum loading, fallback behavior, and error handling scenarios |
| TestDeltaLakeConfig.java | Updated test to validate default value of new configuration property |
| TestDeltaLakeFileOperations.java | Updated file operation tracking to account for checksum file reads |
| TestDeltaLakeBasic.java | Updated error message assertions to accommodate new error messages from checksum loading path |
| TestDeltaLakeAlluxio*.java | Updated Alluxio cache operation tests to include checksum file interactions |
| TestDeltaLakeActiveFilesCache.java | Updated to disable checksum loading for reproducing specific cache staleness issues |

adam-richardson-openai (Contributor, Author) commented:

Just for clarity/posterity -- I force-pushed this branch a couple times with additional changes to address test failures, to avoid trashing the commit history and since there had been no ongoing review. Now that reviewers are engaged, I'll put subsequent fixes in their own commits!

adam-richardson-openai (Contributor, Author) commented:

Based on Copilot's feedback, I went from snake_case to kebab-case for the configuration property. I have updated the PR description to reflect this change, but have not yet updated the original commit to avoid thrashing the history. The commit message must be updated prior to merge

findinpath (Contributor) left a comment

Great observation @adam-richardson-openai

Looking forward to you addressing the comments

cla-bot Bot added the cla-signed label Feb 21, 2026
github-actions Bot added the docs label Feb 22, 2026
adam-richardson-openai force-pushed the dev/as3richa/delta-checksum-upstream branch 2 times, most recently from 5e4421e to aab2cf1 on February 23, 2026 01:03
adam-richardson-openai changed the title from "Read metadata and protocol information from Delta checksum files, when configured and where available" to "Read metadata and protocol information from Delta checksum files" on Feb 23, 2026

adam-richardson-openai (Contributor, Author) commented:

I substantially reworked the new tests in aab2cf1. Summary:

  • I eliminated all cases of mocking or writing of synthetic files, in favor of new fixtures generated using Spark
  • I added several new tests relating to fallback logic to TestDeltaLakeFileOperations. These mostly replace old tests in TestDeltaLakeMetadata that were excessively mock-heavy and that have since been deleted
  • I preserved a few basic smoketests/sanity-check shaped tests in TestDeltaLakeMetadata, using fixtures rather than munging the table metadata in-band. While I think these tests are useful, I want to flag that they required some additional complexity to support referencing fixture tables in the context of the existing suite, so I'm also okay to remove them if preferred

raunaqmorarka (Member) commented:

@coderabbitai review

coderabbitai Bot commented Apr 28, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

}

@Test
void testListFilesStartingFromHierarchicalLocationNormalization()
Contributor commented:

The test also succeeds without the production code changes to EmulatedListFilesStartingFromIterator.java.
I would have assumed that it was supposed to fail.

Member replied:

Verified by reverting the iterator change locally: the test fails with IllegalStateException: Expected listed file to start with directory path 'dir//sub/_delta_log/': abfs://container@account.dfs.core.windows.net/dir/sub/_delta_log/00000000000000000000.json.

raunaqmorarka (Member) commented:

/test-with-secrets sha=a3512a0497e6f4e2745d80e36eba30a7d8ba3727

raunaqmorarka (Member) commented:

/test-with-secrets sha=a3512a0497e94828bb318e6880b1d0ab34ef5c3b

github-actions Bot commented:

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/25065483517

raunaqmorarka (Member) commented:

@electrum @wendigo can you please take a look at the first commit here?
There was a test failure from the usage of listFilesStartingFrom on ADLS in this PR, so we need to address that as part of landing this PR.

raunaqmorarka (Member) commented:

/test-with-secrets sha=15a408446e015f4394da0776d8f6517c5904143f

github-actions Bot commented Apr 28, 2026

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/25074846482

findinpath (Contributor) commented:

Error:  Failures: 
Error:    TestDeltaLakeSharedGlueMetastoreWithTableRedirections>BaseDeltaLakeSharedMetastoreWithTableRedirectionsTest.testShowTables:72 
Multiple Failures (1 failure)
-- failure 1 --
[Rows for query [SHOW TABLES FROM hive_with_redirections.test_shared_schema_tww1oaqyih]] 
Expecting actual:
  (delta_table), (hive_table), (unsupported_types)
to contain exactly in any order:
  [(hive_table), (delta_table)]
but the following elements were unexpected:
  (unsupported_types)
at QueryAssertions$ResultAssert.lambda$matches$0(QueryAssertions$ResultAssert.java:741)
[INFO] 
Error:  Tests run: 340, Failures: 1, Errors: 0, Skipped: 2

https://github.com/trinodb/trino/actions/runs/25074846482/job/73464777568

is this related to your changes?

raunaqmorarka (Member) commented:

> is this related to your changes?

Should be unrelated, I'll trigger re-run to confirm

raunaqmorarka (Member) commented:

/test-with-secrets sha=15a408446e015f4394da0776d8f6517c5904143f

github-actions Bot commented:

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/25130600688

raunaqmorarka (Member) commented:

/test-with-secrets sha=b23837e2232f5e815bdd30b113508049b53b15af

github-actions Bot commented:

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/25153395021

findinpath (Contributor) left a comment

Great effort @adam-richardson-openai & @raunaqmorarka 🎉

Thank you for providing Trino with this exciting new capability. ❤️

raunaqmorarka and others added 3 commits April 30, 2026 15:39
The iterator's strict `entryPath.startsWith(locationPath)` invariant
breaks when the underlying file system canonicalizes runs of slashes:
listing a directory location ending in `//` returns entries with a
single slash and the check fires with `IllegalStateException`. ADLS
Gen2 (hierarchical), Java NIO's `LocalFileSystem`, and
`AlluxioFileSystem` canonicalize; Hadoop's `HdfsFileSystem` and
`S3FileSystem` preserve `//` as a distinct path component.

Try the original prefix first (preserves blob-store keys with literal
`//` components), fall back to the slash-collapsed form, and compute
`entryTail` from whichever matched.

Surfaced by `TestDeltaLakeAdlsStorage.testQuery`.
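
A rough sketch of the two-step prefix match described in this commit message, under the assumption that only runs of slashes in the path part need collapsing. `PrefixMatchSketch`, `collapseSlashes`, and this `entryTail` helper are hypothetical names for illustration and are not the code in `EmulatedListFilesStartingFromIterator`.

```java
import java.util.Optional;

final class PrefixMatchSketch
{
    private PrefixMatchSketch() {}

    // Collapse runs of slashes in the path part, leaving any "scheme://" separator intact.
    static String collapseSlashes(String location)
    {
        int schemeEnd = location.indexOf("://");
        int pathStart = schemeEnd < 0 ? 0 : schemeEnd + 3;
        return location.substring(0, pathStart) + location.substring(pathStart).replaceAll("/{2,}", "/");
    }

    // Returns the entry's tail relative to the directory, or empty if neither prefix form matches.
    static Optional<String> entryTail(String directory, String entry)
    {
        // Try the original prefix first, so keys with a literal "//" component still match.
        if (entry.startsWith(directory)) {
            return Optional.of(entry.substring(directory.length()));
        }
        // Fall back to the slash-collapsed form for file systems that canonicalize paths.
        String collapsed = collapseSlashes(directory);
        if (entry.startsWith(collapsed)) {
            return Optional.of(entry.substring(collapsed.length()));
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        System.out.println(entryTail("dir//sub/_delta_log/", "dir/sub/_delta_log/00000000000000000000.json"));
    }
}
```

Checking the unmodified prefix before the collapsed one is what keeps the behavior unchanged on stores that treat `//` as a distinct path component.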
Compliant Delta writers may emit optional checksum files alongside
commits containing metadata and protocol information. Instead of
loading the latest checkpoint and replaying intervening commits (which
can be expensive, especially for large v1 checkpoints), Trino can read
the latest commit's checksum file to obtain this information with a
single listing and small JSON read. Ref.
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#version-checksum-file

If the checksum file is missing or does not contain both metadata and
protocol, we fall back to the existing Delta log scanning approach.

Behavior is gated by session property load_metadata_from_checksum_file
(defaulting to config delta.load-metadata-from-checksum-file, which
defaults to true). Internal testing reduced analysis time for large
v1-checkpoint tables from ~10s to <500ms.

Within a transaction, the resolved commit version and _last_checkpoint
contents are reused across loadDescriptor and getSnapshot calls so the
descriptor and snapshot paths don't each re-read _last_checkpoint.

Co-authored-by: Eric Hwang <eh@openai.com>
Co-authored-by: Fred Liu <fredliu@openai.com>
The checksum fast path in getTableHandle bypasses the
TableSnapshot cache and therefore re-parses the .crc file on
every query for an unchanged table. Add a cross-query cache on
TransactionLogAccess keyed by (schema.table, location, version),
populated by the checksum loader, so repeated queries reuse the
parsed metadata and protocol.

Cache Optional<DeltaLakeTableDescriptor> so a missing or
malformed checksum is remembered too; subsequent calls fall
through to the transaction-log path without re-reading the .crc.
The cache is bounded to 1000 entries (descriptors are small) and
invalidated alongside tableSnapshots in flushCache and
invalidateCache.

wendigo (Contributor) commented Apr 30, 2026

First commit LGTM

wendigo (Contributor) commented Apr 30, 2026

This is an exciting improvement! Thanks @adam-richardson-openai

ebyhr mentioned this pull request Apr 27, 2026

Labels: cla-signed, delta-lake, docs, stale-ignore

7 participants