Core: ensure snapshot timestamp is monotonically increasing for V4 tables#16293

Open
stevenzwu wants to merge 2 commits into apache:main from stevenzwu:snapshot-timestamp-monotonic

@stevenzwu
Contributor

Summary

For format version 4 and above, this change makes SnapshotProducer produce snapshot timestamps that are strictly greater than the parent snapshot's timestamp-ms, using a Lamport-clock style fast-forward when the wall clock has drifted backward.

  • Adds TableMetadata.MIN_FORMAT_VERSION_MONOTONIC_TIMESTAMPS = 4.
  • In SnapshotProducer.commit(), replaces System.currentTimeMillis() with snapshotTimestampMillis(parentSnapshot). The new helper returns clock.millis() for V1–V3 (and for V4 root snapshots with no parent), and max(clock.millis(), parentSnapshot.timestampMillis() + 1) for V4+ with a parent on the target branch. This preserves wall-clock timestamps in the steady state and only fast-forwards by the minimum amount needed when the clock has drifted backward.
  • Adds an @VisibleForTesting setClock(Clock) hook so the Lamport behavior can be exercised deterministically.
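
The timestamp rule described in the bullets above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class `MonotonicTimestampSketch`, the method signature, the `fixedAt` helper, and the use of `OptionalLong` to model "no parent" are all hypothetical. Only the constant name and the `max(clock.millis(), parent + 1)` rule are taken from the description.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.OptionalLong;

// Hypothetical standalone sketch; the real logic lives in SnapshotProducer.
final class MonotonicTimestampSketch {
  // Mirrors the constant named in the PR description.
  static final int MIN_FORMAT_VERSION_MONOTONIC_TIMESTAMPS = 4;

  private MonotonicTimestampSketch() {}

  // Returns the wall clock for pre-V4 tables and for root snapshots;
  // otherwise fast-forwards to parent + 1 if the clock is not already past it.
  static long snapshotTimestampMillis(
      Clock clock, int formatVersion, OptionalLong parentTimestampMillis) {
    long now = clock.millis();
    if (formatVersion < MIN_FORMAT_VERSION_MONOTONIC_TIMESTAMPS
        || !parentTimestampMillis.isPresent()) {
      return now;
    }
    return Math.max(now, parentTimestampMillis.getAsLong() + 1);
  }

  // Convenience helper (assumption for this example): a clock frozen at a given instant.
  static Clock fixedAt(long epochMillis) {
    return Clock.fixed(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
  }
}
```

With a clock frozen at 1000 ms and a parent at 5000 ms, the sketch yields 5001 for a V4 table and 1000 for a V3 table. With a healthy clock ahead of the parent, the wall-clock value is returned unchanged.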

This is a behavior change for V4 only. V1–V3 commits continue to use the wall clock unchanged.

Why

The format-v4 row-timestamp feature exposes each manifest entry's commit_timestamp_ms as the _last_updated_timestamp_ms metadata column. Time-travel and "latest update" queries on that column become incoherent if a backward jump in the writer's wall clock produces snapshots whose timestamp-ms does not strictly increase with sequence number. Enforcing monotonicity at write time keeps the per-row "last updated" timestamp consistent with snapshot order.

The change is also defensive against clock skew between writers in distributed environments: a stale clock that briefly reports a past time no longer rewrites history.

Test plan

  • ./gradlew :iceberg-core:test --tests org.apache.iceberg.TestSnapshotProducer (29 tests, 6 skipped for pre-V4 by assumeThat, 0 failures)
  • ./gradlew :iceberg-core:spotlessCheck
  • CI

New tests in TestSnapshotProducer:

  • testSnapshotTimestampsAreMonotonicallyIncreasing: three consecutive V4 appends produce strictly increasing timestamp-ms.
  • testV4LamportClockFastForwardsDriftedClock: with the producer's Clock pinned 10s in the past, the second V4 snapshot's timestamp-ms equals firstTs + 1, confirming the fast-forward branch is exercised.
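
The drifted-clock scenario in the second test can be approximated without any Iceberg classes. The sketch below is an assumption-laden stand-in, not the actual TestSnapshotProducer code: `DriftedClockSketch`, `nextTimestampMillis`, and `commitTwiceWithDrift` are hypothetical names, and the producer is reduced to the single max rule from the summary.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical sketch of the drifted-clock scenario; not the actual test code.
final class DriftedClockSketch {
  private DriftedClockSketch() {}

  // Same rule as in the summary, for a V4 commit that has a parent.
  static long nextTimestampMillis(Clock clock, long parentTimestampMillis) {
    return Math.max(clock.millis(), parentTimestampMillis + 1);
  }

  // Returns {firstTs, secondTs}, where the second commit sees a clock
  // pinned 10 seconds before the first snapshot's timestamp.
  static long[] commitTwiceWithDrift(long firstTs) {
    Clock drifted =
        Clock.fixed(
            Instant.ofEpochMilli(firstTs).minus(Duration.ofSeconds(10)), ZoneOffset.UTC);
    long secondTs = nextTimestampMillis(drifted, firstTs);
    return new long[] {firstTs, secondTs};
  }
}
```

Pinning the clock with `Clock.fixed` makes the outcome deterministic: the second timestamp is always exactly one millisecond after the first, regardless of when the test runs.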

Related

A separate PR will propose the corresponding table spec change documenting the monotonic-timestamp-ms requirement for V4.

Made with Cursor

… tables.

Made-with: Cursor
Model: claude-4.6-opus-high-thinking
Co-authored-by: Cursor <cursoragent@cursor.com>
…arget branch

Pins that SnapshotProducer.snapshotTimestampMillis uses the parent on
the target branch (via SnapshotUtil.latestSnapshot(base, targetBranch))
rather than the table's globally newest snapshot.

The test commits to a separate branch with a clock pinned far in the
future, then commits to main with a clock pinned just slightly past
main's head. Main's new timestamp must equal the pinned wall-clock
value (no fast-forward) and remain well below the sibling branch's
head timestamp.

Without this test, a regression that read base.currentSnapshot() (or
any max-over-branches) for the parent would silently pass the existing
single-branch monotonicity tests.

Made-with: Cursor
Model: claude-4.7-opus
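
The branch-scoped parent lookup that the commit message above pins down can be illustrated with a toy model. This is a sketch under assumptions, not Iceberg code: `BranchParentSketch` and its map of per-branch head timestamps are hypothetical stand-ins for table metadata and SnapshotUtil.latestSnapshot(base, targetBranch), and the branch name "audit" is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: tracks only the head timestamp of each branch.
final class BranchParentSketch {
  private final Map<String, Long> branchHeadTimestamps = new HashMap<>();

  // Commits to the given branch using the supplied wall-clock reading.
  // The parent is the head of the *target* branch, never the newest
  // snapshot across all branches.
  long commit(String branch, long wallClockMillis) {
    Long parentTs = branchHeadTimestamps.get(branch);
    long ts = parentTs == null ? wallClockMillis : Math.max(wallClockMillis, parentTs + 1);
    branchHeadTimestamps.put(branch, ts);
    return ts;
  }
}
```

If main's head is at 1000 ms, a sibling branch commits at 1,000,000 ms, and main then commits with the clock at 1005 ms, the new main timestamp in this model is 1005. A variant that took the maximum over all branches would instead return 1,000,001, which is the regression the test is meant to catch.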
@amogh-jahagirdar amogh-jahagirdar self-requested a review May 12, 2026 15:42