fix(consensus): create CUP shares earlier when halting #10347

Draft
pierugo-dfinity wants to merge 2 commits into master from pierugo/consensus/skip-if-check-on-upgrades

pierugo-dfinity commented May 29, 2026

Imagine a subnet is scheduled to upgrade at summary height X and checkpointing is particularly slow. The latest certified height stays stuck at X - 1 for a long time, potentially until the subnet hits the hard bound between the notarization and certification heights. In that case, even once the certified height increases to X after checkpointing completes, the latest block proposal still carries a certified height of X - 1 (because the block maker is always one height ahead of the notary), and the subnet hits the bound again. Because the subnet is upgrading, the certified height will not increase any further, and because we only start creating CUP shares once the finalized tip's certified height reaches X, no CUP is created and the subnet stalls.

This PR fixes the issue by requiring the finalized tip's certified height to reach X before creating a CUP only when the subnet is not halting; when the subnet is halting, CUP shares are created without waiting for that condition. The condition exists in the first place to ensure that a CUP is truly the only resource a node needs to catch up. If it did not hold, a node could be missing some states below X that are needed to validate blocks above X that point to a certified state below X. For upgrades, though (or more generally, when halting at a summary height), the blocks we create past that height are empty, so no past states are needed to validate them. Moreover, since we do not deliver blocks to message routing in these cases, we can afford to "restart" the blockchain at X after restarting (for upgrades) or after recovering the subnet (for halt_at_cup_height). In those cases, the blocks following X do not matter.
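The change to the gating condition can be sketched as follows. This is a minimal model, not the code in this PR: the `CupInputs` struct, its field names, and both function names are hypothetical, and every other precondition for creating a CUP share is left out.

```rust
/// Hypothetical, simplified inputs. These names are illustrative and are
/// not taken from the consensus crate.
#[derive(Clone, Copy, Debug)]
pub struct CupInputs {
    /// Summary height X at which a CUP is expected.
    pub summary_height: u64,
    /// Certified height referenced by the finalized tip.
    pub finalized_tip_certified_height: u64,
    /// True when the subnet halts at this summary height
    /// (upgrade or halt_at_cup_height).
    pub halting: bool,
}

/// Gate before the fix: always wait for the finalized tip's certified
/// height to reach X, whether or not the subnet is halting.
pub fn may_create_cup_share_before_fix(i: CupInputs) -> bool {
    i.finalized_tip_certified_height >= i.summary_height
}

/// Gate after the fix: the wait applies only when the subnet is not halting.
pub fn may_create_cup_share_after_fix(i: CupInputs) -> bool {
    i.halting || i.finalized_tip_certified_height >= i.summary_height
}
```

In the stalled scenario (halting, finalized tip's certified height stuck at X - 1), the first function returns `false` indefinitely while the second returns `true`.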

With this change, it is also sufficient to wait for the summary block at height X to be finalized, rather than for the finalized tip's certified height to reach X, before we stop creating empty blocks. This is the second functional change in this PR.
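A rough sketch of this second change, for the halting case only. The function names are hypothetical, and "the summary block at height X is finalized" is modeled here as the finalized height being at least X, which is a simplifying assumption.

```rust
/// Condition before this PR (hypothetical name): stop creating empty blocks
/// once the finalized tip's certified height has reached X.
pub fn stop_empty_blocks_before_fix(
    finalized_tip_certified_height: u64,
    summary_height: u64,
) -> bool {
    finalized_tip_certified_height >= summary_height
}

/// Condition after this PR (hypothetical name): stop as soon as the summary
/// block at X is finalized, modeled as `finalized_height >= X`.
pub fn stop_empty_blocks_after_fix(finalized_height: u64, summary_height: u64) -> bool {
    finalized_height >= summary_height
}
```

With X finalized but the certified height still at X - 1, the old condition keeps producing empty blocks while the new one stops.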

The PR also includes a regression test that simulates exactly the scenario described above: we artificially "freeze" the latest certified height at X - 1, let the subnet reach the hard bound at X + 69, and then "unfreeze" the certified height. The test checks that we are still able to create a CUP. It fails before the fix and passes after it.
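The heights in the test can be checked with a small model of the hard bound. The gap of 70 below is inferred from the numbers above (certified height X - 1, bound reached at X + 69) and is an assumption, as are the constant and function names; none of them are taken from the codebase.

```rust
/// Assumed maximum distance between the notarized and certified heights,
/// inferred as (X + 69) - (X - 1) = 70 from the test description.
pub const ASSUMED_MAX_GAP: u64 = 70;

/// Highest height the chain can reach while the certified height is frozen.
pub fn highest_reachable_height(certified_height: u64, max_gap: u64) -> u64 {
    certified_height + max_gap
}

/// True once the notarized height has reached the hard bound.
pub fn at_hard_bound(notarized_height: u64, certified_height: u64, max_gap: u64) -> bool {
    notarized_height >= highest_reachable_height(certified_height, max_gap)
}
```

For any X, freezing the certified height at X - 1 gives `highest_reachable_height(X - 1, ASSUMED_MAX_GAP) == X + 69`, which matches the height at which the test unfreezes it.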

github-actions bot added the fix label May 29, 2026
pierugo-dfinity added the CI_ALL_BAZEL_TARGETS (Runs all bazel targets) label May 29, 2026
pierugo-dfinity force-pushed the pierugo/consensus/skip-if-check-on-upgrades branch 6 times, most recently from 71bef2d to 7fd2c80, June 1, 2026 14:22
pierugo-dfinity force-pushed the pierugo/consensus/skip-if-check-on-upgrades branch from 7fd2c80 to 9e5197e, June 1, 2026 15:14