fix(consensus): create CUP shares earlier when halting #10347
Draft
pierugo-dfinity wants to merge 2 commits into
Imagine a subnet is supposed to upgrade at summary height X and checkpointing is particularly slow. The latest certified height would be stuck at X - 1 for a long time, potentially until the subnet reaches the hard bound between notarization and certification heights. In that case, even when the certified height increases to X after checkpointing is done, the latest block proposal still has a certified height of X - 1 (because the blockmaker is always 1 height ahead of the notary) and the subnet reaches the bound again. Because the subnet is upgrading, the certified height will not increase further, and because we start creating CUP shares only when the finalized tip's certified height reaches X, no CUP is created and the subnet stalls.

This PR fixes the issue by waiting for the finalized tip's certified height to reach X before creating a CUP only when the subnet is not halting. The reason for waiting on that condition in the first place is to make sure that a CUP is truly the only resource a node needs to catch up. Indeed, if the condition did not hold, a node could miss some states below X that are needed to validate blocks above X which point to a certified state below X. In case of upgrades though (or more generally, when halting at a summary height), we create only empty blocks past that height, so no past states are needed to validate them. Moreover, since we do not deliver blocks to message routing in these cases, we can afford to "restart" the blockchain at X after restarting (in case of upgrades) or recovering the subnet (in case of halt_at_cup_height). In those cases, the blocks following X therefore do not matter.

With this change, it is now also sufficient to wait for the summary block at height X to be finalized, instead of waiting for the finalized tip's certified height to reach X, before we stop creating empty blocks. This is the second functional change of this PR.

The PR also includes a regression test that simulates exactly the scenario described above: we artificially "freeze" the latest certified height at X - 1, the subnet reaches the hard bound at X + 69, and at that point we "unfreeze" the certified height. The test ensures that we are still able to create a CUP. It fails before the fix and passes after it.
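
The change in the CUP share condition can be illustrated with a minimal sketch. This is not the code from the PR: the `Status` enum, both function names, and all parameters are hypothetical and chosen only to mirror the description above, and the assumption that the summary block at X must be finalized in both rules is a simplification.

```rust
/// Hypothetical, simplified view of whether the subnet is halting at the
/// summary height (e.g. because of an upgrade or `halt_at_cup_height`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Running,
    Halting,
}

/// Rule as described before the fix (illustrative only): CUP shares for
/// summary height `x` are created only once the finalized tip's certified
/// height has reached `x`, regardless of the subnet status.
pub fn can_create_cup_share_before_fix(
    summary_finalized: bool,
    finalized_tip_certified_height: u64,
    x: u64,
) -> bool {
    summary_finalized && finalized_tip_certified_height >= x
}

/// Rule as described after the fix (illustrative only): the certified-height
/// condition is only enforced when the subnet is not halting.
pub fn can_create_cup_share_after_fix(
    status: Status,
    summary_finalized: bool,
    finalized_tip_certified_height: u64,
    x: u64,
) -> bool {
    if !summary_finalized {
        return false;
    }
    match status {
        // Blocks past `x` are empty and are not delivered to message routing,
        // so states below `x` are not needed to validate them.
        Status::Halting => true,
        // A running subnet still needs the CUP to be self-sufficient.
        Status::Running => finalized_tip_certified_height >= x,
    }
}
```

In the stalled scenario, the summary block at X is finalized but the finalized tip still references certified height X - 1. Under these assumptions the first predicate returns `false` indefinitely, while the second returns `true` for a halting subnet and keeps the original behavior for a running one.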