feat(node): add halt_height config field for coordinated chain upgrades#5334
Add a -halt-height flag that stops the node after committing a specific block height. This enables coordinated chain upgrades where all validators stop at the same height before upgrading the binary.

Usage: gnoland start -halt-height 50000

The node will process blocks normally until it commits block 50000, then shut down gracefully. Set to 0 (default) to run indefinitely.

Implementation:
- Add haltHeight field to ConsensusState with SetHaltHeight setter
- Add WithHaltHeight node Option (same pattern as WithEarlyStart)
- Check height in finalizeCommit after block is committed
- Register -halt-height flag in gnoland start command
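The option/setter wiring and the height check described above can be sketched as follows. This is a minimal illustration under stated assumptions: `ConsensusState`, `Node`, `Option` and the `shouldHalt` helper are hypothetical reductions that model only the halt-related field, not the real tm2 types. The sketch shows the functional-option pattern (as with `WithEarlyStart`) and the rule that 0 disables halting.

```go
package main

import "fmt"

// ConsensusState is a hypothetical, stripped-down stand-in for the real
// consensus state; only the halt-related field is modeled.
type ConsensusState struct {
	haltHeight int64 // 0 means "never halt"
}

// SetHaltHeight records the height after which the node should stop.
func (cs *ConsensusState) SetHaltHeight(h int64) { cs.haltHeight = h }

// shouldHalt is a hypothetical helper: it reports whether the node must stop
// after committing the given height (checked after finalizeCommit).
func (cs *ConsensusState) shouldHalt(height int64) bool {
	return cs.haltHeight > 0 && height >= cs.haltHeight
}

// Node and Option mirror the functional-option pattern; field names are
// illustrative assumptions, not the real node struct.
type Node struct {
	consensus  *ConsensusState
	haltHeight int64
}

type Option func(*Node)

// WithHaltHeight configures the height at which the node stops.
func WithHaltHeight(h int64) Option {
	return func(n *Node) { n.haltHeight = h }
}

// NewNode applies the options, then forwards the halt height to consensus.
func NewNode(opts ...Option) *Node {
	n := &Node{consensus: &ConsensusState{}}
	for _, opt := range opts {
		opt(n)
	}
	n.consensus.SetHaltHeight(n.haltHeight)
	return n
}

func main() {
	n := NewNode(WithHaltHeight(50000))
	fmt.Println(n.consensus.shouldHalt(49999), n.consensus.shouldHalt(50000)) // false true
}
```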
Note: This PR adds the halt mechanism at the consensus level ( If preferred, I can alternatively just add a setter on

Historical context:

Tested locally and it works fine 👍
Suggestion: move this setting into a config file (or allow setting it in a config file as well). Rationale: I believe that in a validator infrastructure setup, a config file change is more persistent than adding a flag to a command. A command can be duplicated across the infra setup, and a validator might forget to add the flag everywhere. If that happens for more than one third of the valset, the upgrade will take longer, because a valid snapshot will have to be created and shared with those who didn't stop their node at the expected height.
thehowl left a comment:
Note: there is an existing, half-implemented mechanism for chain halts:
Lines 68 to 72 in c59fc2e
Lines 906 to 917 in c59fc2e
Lines 952 to 966 in c59fc2e
The thing is that there is no setter or mechanism to actually modify these variables.
Overall, this system looks slightly better (it triggers after a commit is finalized, not during the commit). But then we should remove the code related to the existing system, since it would be a duplicate.
This implementation, which comes from the SDK, has a non-deterministic bug (cosmos/cosmos-sdk#16638). Essentially, the current implementation uses a Unix signal to kill the process, but signals are asynchronous, and apparently some additional blocks can be processed after the halt height. The fix is in cosmos/cosmos-sdk#16639 and should be backported IMO. I've also noticed an important change related to

Conclusion: I think it's better to use the existing code, as @thehowl suggested, but it needs to be fixed.
Move halt logic from consensus state.go to the existing baseapp.go mechanism. Fix the original bug where state was not committed before halt. Replace signal-based halt with osm.Kill() to avoid async non-determinism (cosmos/cosmos-sdk#16638). halt_height is set via config.toml (gnoland config set halt_height N).
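Both fixes in this commit message are about ordering: persist the block's state first, then decide to stop, in the same code path rather than from an asynchronous signal handler. The sketch below models only that ordering with a hypothetical in-memory `chain` type. It does not reproduce BaseApp or `osm.Kill()`; returning from the loop stands in for the actual shutdown, and `maxHeight` is an illustrative stand-in for "blocks available".

```go
package main

import "fmt"

// chain is a hypothetical in-memory stand-in for the application state.
type chain struct {
	committed []int64 // heights whose state has been persisted
}

// commit persists the state of block h (here it only records the height).
func (c *chain) commit(h int64) { c.committed = append(c.committed, h) }

// run processes blocks from height 1. The halt check runs synchronously,
// right after commit returns, in the goroutine that drives block processing,
// so no extra block can slip in between "halt requested" and "loop stopped".
func run(c *chain, haltHeight, maxHeight int64) int64 {
	for h := int64(1); h <= maxHeight; h++ {
		c.commit(h) // 1. persist state first
		if haltHeight > 0 && h >= haltHeight {
			return h // 2. then stop: block h is fully committed
		}
	}
	return maxHeight
}

func main() {
	c := &chain{}
	last := run(c, 5, 100)
	fmt.Println(last, len(c.committed)) // 5 5
}
```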
Refactored based on @thehowl and @tbruyelle's feedback:

- Moved halt logic to baseapp, removed the duplicate consensus-level implementation (
- Fixed commit-before-halt bug, the original baseapp code called
- Fixed signal non-determinism (cosmos/cosmos-sdk#16638), replaced
- Added
Aligns with #5334's approach: GovDAO EndBlocker now sets the halt height on BaseApp, which panics in BeginBlock of the next block. This is deterministic (no async signals) and ensures the halted block is fully committed.
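The deterministic halt described here can be sketched as follows, assuming a hypothetical `BaseApp` reduced to a halt field and a plain map standing in for the params store. The essential detail is the comparison: `BeginBlock` panics only when `height > haltHeight`, so block `haltHeight` itself runs and commits normally and the panic fires at the start of the next block. The `begins` helper is illustrative only; it recovers the panic so the outcome can be inspected.

```go
package main

import "fmt"

// BaseApp is a hypothetical, minimal stand-in; only halt handling is modeled.
type BaseApp struct {
	haltHeight int64
	params     map[string]int64 // stand-in for the on-chain params store
}

// SetHaltHeight records the height after which no new block may begin.
func (app *BaseApp) SetHaltHeight(h int64) { app.haltHeight = h }

// EndBlock copies the governance-set param into the BaseApp, mirroring the
// "EndBlocker sets the halt height on BaseApp" step.
func (app *BaseApp) EndBlock() {
	if h, ok := app.params["node:p:halt_height"]; ok {
		app.SetHaltHeight(h)
	}
}

// BeginBlock refuses to start any block above the halt height.
func (app *BaseApp) BeginBlock(height int64) {
	if app.haltHeight > 0 && height > app.haltHeight {
		panic(fmt.Sprintf("halt height %d reached", app.haltHeight))
	}
}

// begins reports whether BeginBlock(height) completed without panicking.
func begins(app *BaseApp, height int64) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	app.BeginBlock(height)
	return true
}

func main() {
	app := &BaseApp{params: map[string]int64{"node:p:halt_height": 100}}
	app.EndBlock()
	fmt.Println(begins(app, 100), begins(app, 101)) // true false
}
```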
…es (#5334)

## Summary

Adds a halt height mechanism for coordinated chain upgrades. The node stops after committing the specified block height.

### How to set it

```bash
gnoland config set halt_height 352922
```

Or edit `config.toml` directly:

```toml
halt_height = 352922
```

### How it works

1. After `finalizeCommit`, consensus checks if `height >= halt_height`
2. If so, calls `osm.Kill()` for a graceful shutdown
3. The check is at the consensus level (not ABCI), following the same pattern as `WithEarlyStart`

### Scope and future direction

This is a **temporary coordination tool** for the current chain upgrade. For the gnoland1 → gnoland-1 hard fork, validators set `halt_height` in their config, all nodes stop at the same block, then validators swap binary + config and restart.

After the upgrade, the proper mechanism will be **GovDAO-based halting** (#5368), which adds:

- On-chain `halt_height` param set via governance proposal (no manual config needed)
- `halt_min_version`: prevents old binaries from restarting after halt
- Version guard at startup so validators can't accidentally run the wrong binary

Once #5368 is merged and active, `halt_height` in config becomes a **node operator tool** (e.g., "stop my node at height X for maintenance") rather than a coordination mechanism. Coordination should happen through governance.

### No CLI flag: config only

Per @tbruyelle's suggestion, there's no `--halt-height` CLI flag. The config file is the single source of truth. This avoids the risk of validators missing the flag in duplicated command setups across their infrastructure.

### Related

- #5368: GovDAO-based halt height + version guard (Phase 2, replaces this for coordination)
- #5376: gnoland-1 chain config
- #5411: chain upgrade genesis replay

<details>
<summary>Contributors' checklist</summary>

- [x] Added new tests, or not needed, or not feasible
- [x] Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
- [x] Updated the official documentation or not needed
- [x] No breaking changes were made, or a `BREAKING CHANGE: xxx` message was included in the description
- [x] Added `benchmarks` label to the PR or not needed

</details>
## Summary

Adds a governance-driven mechanism for coordinated chain halts, complementing the `halt_height` config field (PR #5334, merged).

### New GovDAO proposal: `r/sys/params.NewSetHaltRequest`

```go
// Submit a GovDAO proposal to halt the chain at block 100000,
// requiring binary version >= "chain/gnoland1.1" to resume.
pr := params.NewSetHaltRequest(100_000, "chain/gnoland1.1")
id := dao.MustCreateProposal(cross, pr)
```

This atomically sets two params in one proposal:

- **`node:p:halt_height`** (int64): Block to halt at. EndBlocker propagates this to `BaseApp.SetHaltHeight()`, which panics in `BeginBlock` of the next block, ensuring the halted block is fully committed.
- **`node:p:halt_min_version`** (string): Minimum binary version required to restart. If set, nodes refuse to start unless `binary_version >= min_version`, preventing old binaries from accidentally resuming a chain halted for an upgrade.

### Upgrade workflow

**Phase 1** (PR #5334, merged: `halt_height` config field): Operators set `halt_height` in `config.toml` via `gnoland config set halt_height N` to coordinate a clean stop at a specific block.

**Phase 2** (this PR: GovDAO halt): GovDAO votes on `NewSetHaltRequest(height, minVersion)`. Once approved:

- All nodes halt at `halt_height` automatically (no config change required)
- After restart, only nodes with binary version ≥ `halt_min_version` can resume

### Implementation details

**Gno side** (`examples/gno.land/r/sys/params/halt.gno`):

- `NewSetHaltRequest(height int64, minVersion string)`: creates the GovDAO proposal
- Uses a single custom executor that sets both params atomically
- Use `height=0, minVersion=""` to cancel a scheduled halt

**Go side**:

- `nodeParamsKeeper` (new): validates `node:p:halt_height` and `node:p:halt_min_version` params
- `checkNodeStartupParams` (new): called at startup after state load; compares `version.Version` against `halt_min_version`
- `meetsMinVersion` / `parseGnolandVersion`: handle the `chain/gnolandX.Y` version format; fall back to exact match for other formats
- `EndBlocker` extended: reads `node:p:halt_height` and calls `BaseApp.SetHaltHeight()` (deterministic panic in the next BeginBlock, no async signals)

## Tests

- `TestNodeParamsKeeperWillSetParam`: validates all param types and error cases
- `TestMeetsMinVersion`: 12 test cases covering version comparison logic
- `TestParseGnolandVersion`: 7 test cases for version string parsing

## Dependencies

Built on top of PR #5334 (merged). Uses the same `BaseApp.SetHaltHeight()` + `BeginBlock` panic mechanism for deterministic halts.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Happy <yesreply@happy.engineering>
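The version guard can be sketched as follows. This is a hypothetical reimplementation of the behavior described for `meetsMinVersion` and `parseGnolandVersion`, not the PR's code. It assumes versions look exactly like `chain/gnolandX.Y` with integer X and Y and compares them numerically, so `1.10` ranks above `1.9`. An empty minimum means "no restriction" (the param only applies "if set"), and any other format falls back to an exact string match.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const gnolandPrefix = "chain/gnoland"

// parseGnolandVersion splits "chain/gnolandX.Y" into (X, Y).
// ok is false for any string not in that exact form.
func parseGnolandVersion(v string) (int, int, bool) {
	rest, found := strings.CutPrefix(v, gnolandPrefix)
	if !found {
		return 0, 0, false
	}
	parts := strings.Split(rest, ".")
	if len(parts) != 2 {
		return 0, 0, false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// meetsMinVersion reports whether the running binary may start, given the
// on-chain minimum version.
func meetsMinVersion(binary, minVersion string) bool {
	if minVersion == "" {
		return true // no minimum set
	}
	bMaj, bMin, ok1 := parseGnolandVersion(binary)
	mMaj, mMin, ok2 := parseGnolandVersion(minVersion)
	if !ok1 || !ok2 {
		return binary == minVersion // unknown format: exact match only
	}
	if bMaj != mMaj {
		return bMaj > mMaj
	}
	return bMin >= mMin
}

func main() {
	fmt.Println(meetsMinVersion("chain/gnoland1.10", "chain/gnoland1.9")) // true
	fmt.Println(meetsMinVersion("develop", "chain/gnoland1.1"))          // false
}
```

A startup check along the lines of `checkNodeStartupParams` would call such a function with the binary's version and refuse to start when it returns false.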