rawdb: add freezer safety margin to prevent data loss on corruption #637
Draft
joshuacolvin0 wants to merge 1 commit into
0d92bfa to 561c093
5dcf2b0 to 62ab752
…hutdown

After an unclean shutdown, repair() may truncate the freezer head to restore cross-table consistency. Previously, blocks were deleted from the key-value store immediately after freezing, so truncated blocks could end up missing from both stores, making the node unable to start (especially for L2 nodes that cannot re-sync pruned blocks from peers).

Introduce a safety margin (freezerCleanupMargin = freezerBatchLimit) that retains the most recently frozen blocks in the key-value store. Since freezeRange reads via nofreezedb (which bypasses the ancient store), retained blocks can be re-frozen after repair() truncation.

Key changes:
- Add cleanupMargin field on chainFreezer with persisted cleanup tail (freezerCleanupTailKey) so progress resumes across restarts
- Replace immediate post-freeze deletion with incremental cleanup over [cleanupStart, cleanupLimit) using Has()+Get() to distinguish missing keys from I/O errors, with backoff on failure
- Add startup validation in Open(): detect unrecoverable data gaps where the freezer has been truncated below the cleanup tail
- Handle upgrade path (skip-ahead when no tail but frozen > FullImmutabilityThreshold) and fresh installs (clean from block 1)
- Cap per-cycle cleanup to freezerBatchLimit to prevent stalling
- Bound dangling side chain chase to freezerBatchLimit iterations
- Add ReadFreezerCleanupTail/WriteFreezerCleanupTail accessors and a strict variant for startup/runtime error propagation
- Surface cleanup tail in ReadChainMetadata diagnostics
- Add comprehensive test suite (21 tests) covering margin behavior, crash recovery, side chain cleanup, boundary conditions, corruption detection, upgrade path, and regression guard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
62ab752 to 07c22fc
@@ -220,14 +246,90 @@ func (f *chainFreezer) freeze(db ethdb.KeyValueStore) {
if err := f.SyncAncient(); err != nil {
Contributor
I can totally understand the rationale for adding this margin.
Something I don't understand is:
// After an
// unclean shutdown, repair() may truncate the freezer head to restore
// cross-table consistency.
If a chain segment has been moved to the freezer, the freezer is explicitly synced before the corresponding items are deleted from the key-value store. Specifically, once a chain segment is migrated, one of two conditions applies:
- It has been fully synced, with all tables aligned via f.SyncAncient(), or
- It has not yet been properly flushed to the freezer, in which case it can be reverted on the next startup due to an unclean shutdown.
In either scenario, it is guaranteed that the chain segment exists in at least one location, either in the freezer or in the key-value store.
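The question in this thread is whether repair() can ever truncate below a head that was already synced and whose key-value copies were already deleted. The reviewer's ordering argument covers data that was never synced. The toy model below (hypothetical types, not go-ethereum code) only shows what happens if such a truncation does occur, for example through on-disk corruption as the PR title suggests, and how far a margin reaches.

```go
package main

import "fmt"

// toyStores is a hypothetical model: the freezer holds blocks [0, frozen)
// and kv holds whichever key-value copies have not been deleted.
type toyStores struct {
	frozen uint64
	kv     map[uint64]bool
}

func newToyStores(head uint64) *toyStores {
	s := &toyStores{kv: make(map[uint64]bool)}
	for n := uint64(0); n < head; n++ {
		s.kv[n] = true
	}
	return s
}

// freezeSyncClean freezes blocks below `to`, treats them as synced, then
// deletes the key-value copies of all frozen blocks except the last `margin`.
// margin == 0 models deleting immediately after freezing.
func (s *toyStores) freezeSyncClean(to, margin uint64) {
	s.frozen = to
	for n := range s.kv { // deleting during range is allowed for Go maps
		if n < to && to-n > margin {
			delete(s.kv, n)
		}
	}
}

// truncate models repair() cutting the freezer head back to newFrozen.
func (s *toyStores) truncate(newFrozen uint64) {
	if newFrozen < s.frozen {
		s.frozen = newFrozen
	}
}

// missing lists blocks below head that are in neither store.
func (s *toyStores) missing(head uint64) []uint64 {
	var out []uint64
	for n := s.frozen; n < head; n++ {
		if !s.kv[n] {
			out = append(out, n)
		}
	}
	return out
}

// run freezes 80 of 100 blocks, truncates the freezer, and counts losses.
func run(margin, truncateTo uint64) int {
	const head = 100
	s := newToyStores(head)
	s.freezeSyncClean(80, margin)
	s.truncate(truncateTo)
	return len(s.missing(head))
}

func main() {
	fmt.Println(run(0, 70))  // no margin: blocks 70..79 are lost -> 10
	fmt.Println(run(20, 70)) // margin covers the truncated range -> 0
	fmt.Println(run(20, 50)) // truncation deeper than the margin -> 10
}
```

The last case shows the margin is a bound, not a guarantee: it only protects truncations that stay within the retained window.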
pulled in by OffchainLabs/nitro#4506
related to NIT-4663
Disk overhead: ~30K blocks duplicated temporarily (~30-600 MB).
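The quoted range is consistent with roughly 1 KB to 20 KB of stored data per retained block. Those per-block sizes are back-computed from the PR's numbers, not stated in it; the arithmetic is:

```go
package main

import "fmt"

// retainedBlocks assumes freezerCleanupMargin = freezerBatchLimit = 30000,
// matching the ~30K figure above.
const retainedBlocks = 30_000

// overheadMB returns the extra key-value storage, in decimal megabytes, for
// retaining retainedBlocks blocks of bytesPerBlock bytes each.
func overheadMB(bytesPerBlock uint64) uint64 {
	return retainedBlocks * bytesPerBlock / 1_000_000
}

func main() {
	fmt.Println(overheadMB(1_000))  // ~1 KB per block  -> 30
	fmt.Println(overheadMB(20_000)) // ~20 KB per block -> 600
}
```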