Skip to content

feat(node): add halt_height config field for coordinated chain upgrades#5334

Merged
moul merged 11 commits into
masterfrom
feat/halt-height
Apr 7, 2026
Merged

feat(node): add halt_height config field for coordinated chain upgrades#5334
moul merged 11 commits into
masterfrom
feat/halt-height

Conversation

@moul
Copy link
Copy Markdown
Member

@moul moul commented Mar 20, 2026

Summary

Adds a halt height mechanism for coordinated chain upgrades. The node stops after committing the specified block height.

How to set it

gnoland config set halt_height 352922

Or edit config.toml directly:

halt_height = 352922

How it works

  1. After finalizeCommit, consensus checks if height >= halt_height
  2. If so, calls osm.Kill() for a graceful shutdown
  3. The check is at the consensus level (not ABCI), following the same pattern as WithEarlyStart

Scope and future direction

This is a temporary coordination tool for the current chain upgrade. For the gnoland1 → gnoland-1 hard fork, validators set halt_height in their config, all nodes stop at the same block, then validators swap binary + config and restart.

After the upgrade, the proper mechanism will be GovDAO-based halting (#5368), which adds:

  • On-chain halt_height param set via governance proposal (no manual config needed)
  • halt_min_version — prevents old binaries from restarting after halt
  • Version guard at startup so validators can't accidentally run the wrong binary

Once #5368 is merged and active, halt_height in config becomes a node operator tool (e.g., "stop my node at height X for maintenance") rather than a coordination mechanism. Coordination should happen through governance.

No CLI flag — config only

Per @tbruyelle's suggestion, there's no --halt-height CLI flag. Config file is the single source of truth. This avoids the risk of validators missing the flag in duplicated command setups across their infrastructure.

Related

Contributors' checklist
  • Added new tests, or not needed, or not feasible
  • Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
  • Updated the official documentation or not needed
  • No breaking changes were made, or a BREAKING CHANGE: xxx message was included in the description
  • Added benchmarks label to the PR or not needed

Add a -halt-height flag that stops the node after committing a specific
block height. This enables coordinated chain upgrades where all
validators stop at the same height before upgrading the binary.

Usage:
  gnoland start -halt-height 50000

The node will process blocks normally until it commits block 50000,
then gracefully shut down. Set to 0 (default) to run indefinitely.

Implementation:
- Add haltHeight field to ConsensusState with SetHaltHeight setter
- Add WithHaltHeight node Option (same pattern as WithEarlyStart)
- Check height in finalizeCommit after block is committed
- Register -halt-height flag in gnoland start command
@Gno2D2
Copy link
Copy Markdown
Collaborator

Gno2D2 commented Mar 20, 2026

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)
Read More

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

No automated checks match this pull request.

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission

@moul moul marked this pull request as draft March 20, 2026 10:14
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
gno.land/cmd/gnoland/start.go 0.00% 3 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@moul
Copy link
Copy Markdown
Member Author

moul commented Mar 20, 2026

Note: tm2/pkg/sdk/baseapp.go already has haltHeight and haltTime fields with halt logic in Commit(), but they're never set from outside — there's no flag or setter to configure them.

This PR adds the halt mechanism at the consensus level (finalizeCommit in state.go) instead, which is arguably more appropriate since it's a node operation concern, not an app concern. The approach follows the same Option pattern as WithEarlyStart.

If preferred, I can alternatively just add a setter on BaseApp and wire the flag through the existing ABCI-level halt. Let me know which approach you prefer.

@moul
Copy link
Copy Markdown
Member Author

moul commented Mar 20, 2026

Historical context: baseapp.go had setHaltHeight() and setHaltTime() methods that were removed by @zivkovicmilos in PR #2229 (feat: add valset injection through r/sys/validators, Jul 2024). The fields remain but are unreachable. This PR re-adds the capability at the consensus level.

Comment thread tm2/pkg/bft/consensus/state_test.go Outdated
@aeddi
Copy link
Copy Markdown
Contributor

aeddi commented Mar 27, 2026

Tested locally and it works fine 👍

@tbruyelle
Copy link
Copy Markdown
Contributor

tbruyelle commented Mar 27, 2026

Suggestion: moves this setting into a config file (or allows to set it in a config file as well).

Rational: I believe that in a validator infrastructure setup, a config file change is more persistent that adding flag in a command. A command can be duplicated accross the infra setup, and a validator might miss to add the flag everywhere. If that happens for more than one third of the valset, the upgrade will take more time because a valid snapshot will have to be created and shared across those who haven't stopped their node at the expected height.

@moul moul marked this pull request as ready for review April 1, 2026 20:49
Copy link
Copy Markdown
Member

@thehowl thehowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, there is an existing, half implemented mechanism for chain halts:

// block height at which to halt the chain and gracefully shutdown
haltHeight uint64
// minimum block time (in Unix seconds) at which to halt the chain and gracefully shutdown
haltTime uint64

gno/tm2/pkg/sdk/baseapp.go

Lines 906 to 917 in c59fc2e

switch {
case app.haltHeight > 0 && uint64(header.GetHeight()) >= app.haltHeight:
halt = true
case app.haltTime > 0 && header.GetTime().Unix() >= int64(app.haltTime):
halt = true
}
if halt {
app.halt()
// Note: State is not actually committed when halted. Logs from Tendermint

gno/tm2/pkg/sdk/baseapp.go

Lines 952 to 966 in c59fc2e

// halt attempts to gracefully shutdown the node via SIGINT and SIGTERM falling
// back on os.Exit if both fail.
func (app *BaseApp) halt() {
app.logger.Info("halting node per configuration", "height", app.haltHeight, "time", app.haltTime)
p, err := os.FindProcess(os.Getpid())
if err == nil {
// attempt cascading signals in case SIGINT fails (os dependent)
sigIntErr := p.Signal(syscall.SIGINT)
sigTermErr := p.Signal(syscall.SIGTERM)
if sigIntErr == nil || sigTermErr == nil {
return
}
}

The thing is that there is no setter or mechanism to actually modify these variables.

Overall this system looks to be slightly better? (Happens after finalization of a commit, not during a commit). But then we should remove the related code to the existing system, as duplicate

@thehowl thehowl changed the title feat(node): add --halt-height flag for coordinated chain upgrades feat(node): add halt_height config field for coordinated chain upgrades Apr 2, 2026
@tbruyelle
Copy link
Copy Markdown
Contributor

tbruyelle commented Apr 2, 2026

Overall this system looks to be slightly better? (Happens after finalization of a commit, not during a commit). But then we should remove the related code to the existing system, as duplicate

This implementation which comes from the SDK has a non-deterministic bug cosmos/cosmos-sdk#16638. Essentially the current impl uses unix signal to kill the process, but signals are asynchronous and some additional blocks can be processed after the halt height apparently.

The fix is here and should be backported IMO. cosmos/cosmos-sdk#16639
The halt check is moved to FinalizeBlock (not sure yet what is the corresponding function in gno), and just returns an error instead of using unix signals.

I've also noticed an important change related to haltTime, even though this PR is not about using this param, it is no longer compared to header time but to node time. Which is strange btw... Not sure if we should follow this as well.

Conclusion, I think it's better to use the existing code like @thehowl suggested, but it needs to be fixed.

Move halt logic from consensus state.go to the existing baseapp.go
mechanism. Fix the original bug where state was not committed before
halt. Replace signal-based halt with osm.Kill() to avoid async
non-determinism (cosmos/cosmos-sdk#16638).

halt_height is set via config.toml (gnoland config set halt_height N).
@moul moul force-pushed the feat/halt-height branch from 1ef7e61 to f77bcd9 Compare April 2, 2026 21:16
@moul
Copy link
Copy Markdown
Member Author

moul commented Apr 2, 2026

Refactored based on @thehowl and @tbruyelle's feedback:

Moved halt logic to baseapp, removed the duplicate consensus-level implementation (state.go), now using the existing baseapp.go mechanism.

Fixed commit-before-halt bug, the original baseapp code called halt() and returned before committing state (return abci.ResponseCommit{}). State was lost. Now the halt check runs after the full commit cycle (MultiWrite → cms.Commit → header save → check state reset).

Fixed signal non-determinism (cosmos/cosmos-sdk#16638), replaced syscall.SIGINT/SIGTERM signals (which are async and can let extra blocks through) with osm.Kill().

Added SetHaltHeight setter on BaseApp, wired from config.toml via gnoland config set halt_height N.

Comment thread tm2/pkg/sdk/baseapp.go Outdated
Comment thread tm2/pkg/sdk/baseapp.go Outdated
@moul moul merged commit b9471b7 into master Apr 7, 2026
116 checks passed
@moul moul deleted the feat/halt-height branch April 7, 2026 08:34
@github-project-automation github-project-automation Bot moved this from In Progress to Done in 💪 Bounties & Worx Apr 7, 2026
@github-project-automation github-project-automation Bot moved this from 📥 Inbox to ✅ Done in 😎 Manfred's Board Apr 7, 2026
moul added a commit that referenced this pull request Apr 7, 2026
Aligns with #5334's approach: GovDAO EndBlocker now sets the halt height
on BaseApp, which panics in BeginBlock of the next block. This is
deterministic (no async signals) and ensures the halted block is fully
committed.
moul added a commit that referenced this pull request Apr 7, 2026
…es (#5334)

## Summary

Adds a halt height mechanism for coordinated chain upgrades. The node
stops after committing the specified block height.

### How to set it

```bash
gnoland config set halt_height 352922
```

Or edit `config.toml` directly:
```toml
halt_height = 352922
```

### How it works

1. After `finalizeCommit`, consensus checks if `height >= halt_height`
2. If so, calls `osm.Kill()` for a graceful shutdown
3. The check is at the consensus level (not ABCI), following the same
pattern as `WithEarlyStart`

### Scope and future direction

This is a **temporary coordination tool** for the current chain upgrade.
For the gnoland1 → gnoland-1 hard fork, validators set `halt_height` in
their config, all nodes stop at the same block, then validators swap
binary + config and restart.

After the upgrade, the proper mechanism will be **GovDAO-based halting**
(#5368), which adds:
- On-chain `halt_height` param set via governance proposal (no manual
config needed)
- `halt_min_version` — prevents old binaries from restarting after halt
- Version guard at startup so validators can't accidentally run the
wrong binary

Once #5368 is merged and active, `halt_height` in config becomes a
**node operator tool** (e.g., "stop my node at height X for
maintenance") rather than a coordination mechanism. Coordination should
happen through governance.

### No CLI flag — config only

Per @tbruyelle's suggestion, there's no `--halt-height` CLI flag. Config
file is the single source of truth. This avoids the risk of validators
missing the flag in duplicated command setups across their
infrastructure.

### Related

- #5368 — GovDAO-based halt height + version guard (Phase 2, replaces
this for coordination)
- #5376 — gnoland-1 chain config
- #5411 — chain upgrade genesis replay

<details>
<summary>Contributors' checklist</summary>

- [x] Added new tests, or not needed, or not feasible
- [x] Provided an example (e.g. screenshot) to aid review or the PR is
self-explanatory
- [x] Updated the official documentation or not needed
- [x] No breaking changes were made, or a `BREAKING CHANGE: xxx` message
was included in the description
- [x] Added `benchmarks` label to the PR or not needed
</details>
aeddi pushed a commit that referenced this pull request Apr 9, 2026
…es (#5334)

## Summary

Adds a halt height mechanism for coordinated chain upgrades. The node
stops after committing the specified block height.

### How to set it

```bash
gnoland config set halt_height 352922
```

Or edit `config.toml` directly:
```toml
halt_height = 352922
```

### How it works

1. After `finalizeCommit`, consensus checks if `height >= halt_height`
2. If so, calls `osm.Kill()` for a graceful shutdown
3. The check is at the consensus level (not ABCI), following the same
pattern as `WithEarlyStart`

### Scope and future direction

This is a **temporary coordination tool** for the current chain upgrade.
For the gnoland1 → gnoland-1 hard fork, validators set `halt_height` in
their config, all nodes stop at the same block, then validators swap
binary + config and restart.

After the upgrade, the proper mechanism will be **GovDAO-based halting**
(#5368), which adds:
- On-chain `halt_height` param set via governance proposal (no manual
config needed)
- `halt_min_version` — prevents old binaries from restarting after halt
- Version guard at startup so validators can't accidentally run the
wrong binary

Once #5368 is merged and active, `halt_height` in config becomes a
**node operator tool** (e.g., "stop my node at height X for
maintenance") rather than a coordination mechanism. Coordination should
happen through governance.

### No CLI flag — config only

Per @tbruyelle's suggestion, there's no `--halt-height` CLI flag. Config
file is the single source of truth. This avoids the risk of validators
missing the flag in duplicated command setups across their
infrastructure.

### Related

- #5368 — GovDAO-based halt height + version guard (Phase 2, replaces
this for coordination)
- #5376 — gnoland-1 chain config
- #5411 — chain upgrade genesis replay

<details>
<summary>Contributors' checklist</summary>

- [x] Added new tests, or not needed, or not feasible
- [x] Provided an example (e.g. screenshot) to aid review or the PR is
self-explanatory
- [x] Updated the official documentation or not needed
- [x] No breaking changes were made, or a `BREAKING CHANGE: xxx` message
was included in the description
- [x] Added `benchmarks` label to the PR or not needed
</details>
moul added a commit that referenced this pull request Apr 27, 2026
## Summary

Adds a governance-driven mechanism for coordinated chain halts,
complementing the `halt_height` config field (PR #5334, merged).

### New GovDAO proposal: `r/sys/params.NewSetHaltRequest`

```go
// Submit a GovDAO proposal to halt the chain at block 100000,
// requiring binary version >= "chain/gnoland1.1" to resume.
pr := params.NewSetHaltRequest(100_000, "chain/gnoland1.1")
id := dao.MustCreateProposal(cross, pr)
```

This atomically sets two params in one proposal:
- **`node:p:halt_height`** (int64): Block to halt at. EndBlocker
propagates this to `BaseApp.SetHaltHeight()`, which panics in
`BeginBlock` of the next block — ensuring the halted block is fully
committed.
- **`node:p:halt_min_version`** (string): Minimum binary version
required to restart. If set, nodes refuse to start unless
`binary_version >= min_version`, preventing old binaries from
accidentally resuming a chain halted for an upgrade.

### Upgrade workflow

**Phase 1** (PR #5334, merged — `halt_height` config field): Operators
set `halt_height` in `config.toml` via `gnoland config set halt_height
N` to coordinate a clean stop at a specific block.

**Phase 2** (this PR — GovDAO halt): GovDAO votes on
`NewSetHaltRequest(height, minVersion)`. Once approved:
- All nodes halt at `halt_height` automatically (no config change
required)
- After restart, only nodes with binary version ≥ `halt_min_version` can
resume

### Implementation details

**Gno side** (`examples/gno.land/r/sys/params/halt.gno`):
- `NewSetHaltRequest(height int64, minVersion string)` — creates GovDAO
proposal
- Uses a single custom executor that sets both params atomically
- Use `height=0, minVersion=""` to cancel a scheduled halt

**Go side**:
- `nodeParamsKeeper` (new) — validates `node:p:halt_height` and
`node:p:halt_min_version` params
- `checkNodeStartupParams` (new) — called at startup after state load;
compares `version.Version` against `halt_min_version`
- `meetsMinVersion` / `parseGnolandVersion` — handles `chain/gnolandX.Y`
version format; falls back to exact match for other formats
- `EndBlocker` extended — reads `node:p:halt_height` and calls
`BaseApp.SetHaltHeight()` (deterministic panic in next BeginBlock, no
async signals)

## Tests

- `TestNodeParamsKeeperWillSetParam`: validates all param types and
error cases
- `TestMeetsMinVersion`: 12 test cases covering version comparison logic
- `TestParseGnolandVersion`: 7 test cases for version string parsing

## Dependencies

Built on top of PR #5334 (merged). Uses the same
`BaseApp.SetHaltHeight()` + `BeginBlock` panic mechanism for
deterministic halts.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Happy <yesreply@happy.engineering>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

📦 🌐 tendermint v2 Issues or PRs tm2 related 📦 ⛰️ gno.land Issues or PRs gno.land package related

Projects

Status: ✅ Done
Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants