feat: hardfork-replay improvements (BaseApp InitialHeight + PastChainIDs genesis-mode + --patch-realm) #5540

Merged
moul merged 3 commits into gnolang:feat/genesis-replay-upgrade3 from moul:feat/hardfork-replay-improvements
Apr 17, 2026

Member

@moul moul commented Apr 17, 2026

Three tightly related improvements that fell out of end-to-end testing of the hardfork-replay mechanism on the full gnoland1 chain (see #5486). Each is a standalone commit; happy to split into separate PRs if reviewers prefer.

1. `fix(tm2/sdk): BaseApp validateHeight + Info handle InitialHeight > 1`

Two separate issues hit chains whose genesis sets `InitialHeight > 1`:

  • `validateHeight` compared `req.Header.Height` against the multistore version counter (which auto-increments from 0). With `InitialHeight > 1` the store version lags block height: the first block arrives at e.g. 101 while the store is at version 0, the second at 102 while the store is at version 1, so the check panicked with `expected 2, got 102`. Fix: when the store version lags, accept the jump as long as height is monotonic.
  • `Info()` returned the multistore version as `LastBlockHeight`. On restart the handshaker saw `appHeight=1` but `storeHeight=102` and tried to replay from height 2. Fix: when the persisted header records a higher block height, return that instead.
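
The two fixes can be sketched roughly as follows. This is a minimal illustration, not the actual `BaseApp` code: the function names, parameters, and error text are assumptions chosen for the example, and the merge-commit message later in this thread describes a stricter replacement for the monotonic-jump branch.

```go
package main

import "fmt"

// validateHeight is a hypothetical stand-in for the BaseApp check.
// storeVersion is the multistore version counter, lastHeight is the height
// of the previously accepted block (0 if none), and height is the incoming
// block height.
func validateHeight(storeVersion, lastHeight, height int64) error {
	if height < 1 {
		return fmt.Errorf("invalid height: %d", height)
	}
	expected := storeVersion + 1
	if height == expected {
		return nil // contiguous with the store version
	}
	// The store version lags the block height (InitialHeight > 1):
	// accept the jump as long as the height keeps increasing.
	if height > expected && height > lastHeight {
		return nil
	}
	return fmt.Errorf("invalid height: %d; expected: %d", height, expected)
}

// reportedHeight mirrors the Info() fix: prefer the height recorded in the
// persisted header when it is higher than the store version.
func reportedHeight(storeVersion, headerHeight int64) int64 {
	if headerHeight > storeVersion {
		return headerHeight
	}
	return storeVersion
}

func main() {
	fmt.Println(validateHeight(0, 0, 101))   // first block of a chain starting at 101
	fmt.Println(validateHeight(1, 101, 102)) // second block
	fmt.Println(validateHeight(1, 101, 1))   // height went backwards
	fmt.Println(reportedHeight(1, 102))
}
```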

2. `feat(gnoland): genesis-mode txs use PastChainIDs[0] for sig verify`

During a hardfork replay, genesis-mode txs (metadata == nil or BlockHeight == 0) were originally signed with the source chain's chain-id. Historical txs (BlockHeight > 0) already get `PastChainIDs`-based chain-id override in `loadAppState`; this extends the same treatment to genesis-mode txs by using the first `PastChainIDs` entry when a hardfork is in progress.

In practice this still needs `--skip-genesis-sig-verification` for gnogenesis-produced addpkg txs (where `msg.Creator ≠ signing key` — the pubkey-address check rejects those regardless of chain-id). But for genesis-mode txs where the signer IS the creator, this makes the signature verify against the correct chain-id without any skip flag.
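
A minimal sketch of the chain-id selection described above, under stated assumptions: `txMetadata` is a trimmed-down hypothetical stand-in for the real metadata type, and "a hardfork is in progress" is modelled here as a non-empty `PastChainIDs` list.

```go
package main

import "fmt"

// txMetadata is a hypothetical, trimmed-down stand-in for the tx metadata.
type txMetadata struct {
	BlockHeight int64
}

// genesisModeChainID returns the chain ID to verify a genesis-mode tx
// against, and whether an override applies at all. Non-genesis-mode txs are
// out of scope here (they are handled by the existing historical-tx path).
func genesisModeChainID(meta *txMetadata, pastChainIDs []string) (string, bool) {
	genesisMode := meta == nil || meta.BlockHeight == 0
	if !genesisMode || len(pastChainIDs) == 0 {
		return "", false
	}
	return pastChainIDs[0], true
}

func main() {
	past := []string{"gnoland1"}
	fmt.Println(genesisModeChainID(nil, past))                            // genesis-mode, no metadata
	fmt.Println(genesisModeChainID(&txMetadata{BlockHeight: 0}, past))    // genesis-mode, height 0
	fmt.Println(genesisModeChainID(&txMetadata{BlockHeight: 1234}, past)) // historical tx
	fmt.Println(genesisModeChainID(nil, nil))                             // no hardfork in progress
}
```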

3. `feat(hardfork): --patch-realm flag`

Repeatable `--patch-realm PKGPATH=SRCDIR` flag on `hardfork genesis`. Rewrites the genesis-mode addpkg tx for `PKGPATH` in-place, replacing `Package.Files` with the `*.gno` + `gnomod.toml` files from `SRCDIR`. Source genesis on disk stays untouched — patch lives only in the in-memory `GnoGenesisState` used for the output.

Motivation: you cannot re-addpkg to the same path post-deploy (unauthorized), and you cannot add a new `.gno` file to an existing realm via a call, so the only way to land a code change on an existing realm during a hardfork is to rewrite the addpkg tx that originally deployed it.
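
To make the flag semantics concrete, here is a rough sketch of parsing a `PKGPATH=SRCDIR` value and picking which file names would replace `Package.Files`. The helper names are hypothetical and the real tool may filter differently; the sketch only encodes the "`*.gno` + `gnomod.toml`" rule stated above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// parsePatchRealm splits a "PKGPATH=SRCDIR" flag value (hypothetical helper).
func parsePatchRealm(spec string) (string, string, error) {
	pkgPath, srcDir, ok := strings.Cut(spec, "=")
	if !ok || pkgPath == "" || srcDir == "" {
		return "", "", fmt.Errorf("invalid --patch-realm value %q, want PKGPATH=SRCDIR", spec)
	}
	return pkgPath, srcDir, nil
}

// selectRealmFiles keeps only *.gno files and gnomod.toml, sorted by name.
func selectRealmFiles(names []string) []string {
	var out []string
	for _, n := range names {
		base := filepath.Base(n)
		if strings.HasSuffix(base, ".gno") || base == "gnomod.toml" {
			out = append(out, base)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pkg, dir, err := parsePatchRealm("gno.land/r/sys/params=/src/examples/gno.land/r/sys/params")
	fmt.Println(pkg, dir, err)
	fmt.Println(selectRealmFiles([]string{"params.gno", "README.md", "gnomod.toml", "halt.gno"}))
}
```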

Combined with #5368 (which adds `halt.gno` to `r/sys/params`), the hf-glue testbed boots a fork of gnoland1 where `r/sys/params` ships the new `NewSetHaltRequest` code out of the box:

```
$ curl ... vm/qfile gno.land/r/sys/params
→ fee_collector.gno, gnomod.toml, halt.gno, params.gno, unlock.gno
```

End-to-end validation

All three land together in #5486 (hf-glue testbed). Running `make fetch && make init && make up` against `rpc.gno.land` / halt @ 704052 produces a 192 MB hardfork genesis that replays with 0 / 2715 tx failures and boots a live `gnoland-1` node with `r/sys/params` carrying the patched source.

Dependencies:

AI disclosure

Developed with assistance from Claude Code.

moul added 3 commits April 17, 2026 18:21
Two separate issues hit by chains whose genesis sets InitialHeight > 1
(the hardfork-replay use case from gnolang#5511):

1. validateHeight compared req.Header.Height against the multistore
   version counter (which auto-increments from 0). With InitialHeight
   > 1 the counter lags the block height — the first block arrives at
   (e.g.) 101 while the store is at version 0, then the second block
   arrives at 102 while the store is at version 1, so the check
   "expected 2, got 102" panicked. Now: when the store version lags
   the block height, accept the jump as long as height is monotonic.

2. Info() returned the multistore version as LastBlockHeight. On
   restart the handshaker saw appHeight=1 (store version) but
   storeHeight=102 (real blocks) and tried to replay missing blocks.
   Now: when the persisted header records a higher block height,
   return that instead.

These fixes are exercised by the hardfork-replay flow but help any
chain that sets InitialHeight > 1.

During a hardfork replay, genesis-mode txs (metadata == nil or
BlockHeight == 0) were originally signed with the source chain's
chain-id — not the new one. Historical txs (BlockHeight > 0) already
get PastChainIDs-based chain-id override in loadAppState; this
extends the same treatment to genesis-mode txs by using the first
PastChainIDs entry when a hardfork is in progress.

In practice this still needs --skip-genesis-sig-verification for
gnogenesis-produced addpkg txs (where msg.Creator ≠ the signing key —
the pubkey-address check rejects them regardless of chain-id). But
for genesis-mode txs where the signer IS the creator, this makes the
signature verify against the correct chain-id without any skip flag.

Tested end-to-end on the gnoland1 hardfork testbed in gnolang#5486.

…ork time

Adds a repeatable --patch-realm PKGPATH=SRCDIR flag to
`hardfork genesis` that rewrites the genesis-mode addpkg tx for
PKGPATH in-place, replacing its Package.Files with the *.gno +
gnomod.toml files from SRCDIR. The source genesis on disk stays
untouched — the patch lives only in the in-memory GnoGenesisState
used to assemble the output.

Motivation: you cannot re-addpkg to the same path post-deploy
(unauthorized), and you cannot add a new .gno file to an existing
realm via a call, so the only way to land a code change on an
existing realm is to rewrite the original addpkg tx that deployed it.

Example (tested end-to-end in the hf-glue testbed gnolang#5486):

  hardfork genesis --source /path/to/source \
    --patch-realm gno.land/r/sys/params=/src/examples/gno.land/r/sys/params \
    --chain-id gnoland-1 --output genesis.json

Combined with gnolang#5368 (which adds halt.gno to r/sys/params), the forked
chain boots with the new GovDAO halt mechanism available:

  $ curl ... vm/qfile gno.land/r/sys/params
  → fee_collector.gno, gnomod.toml, halt.gno, params.gno, unlock.gno

Multiple --patch-realm flags can be combined to land several realm
upgrades in one fork.
Collaborator

Gno2D2 commented Apr 17, 2026

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

No automated checks match this pull request.

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission


codecov Bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@moul moul merged commit 7e99bea into gnolang:feat/genesis-replay-upgrade3 Apr 17, 2026
126 of 127 checks passed
@moul moul deleted the feat/hardfork-replay-improvements branch April 17, 2026 16:26
@github-project-automation github-project-automation Bot moved this from 📥 Inbox to ✅ Done in 😎 Manfred's Board Apr 17, 2026
Member Author

moul commented Apr 17, 2026

Sorry, I should have committed directly in the PR.

moul added a commit that referenced this pull request Apr 19, 2026
- Document GasReplayMode field and "source" mode
- Document GasUsed/GasWanted metadata fields
- Document auth.SkipGasMeteringKey context flag
- Document replay report with categorization
- Document RequestInitChain.InitialHeight cross-check (GnoGenesisState.InitialHeight is no longer "informational only")
- Document hardfork tooling: --patch-realm, hardfork test
- Add BaseApp.validateHeight / Info InitialHeight>1 fixes (PR #5540)
- Add genesis-mode sig verify against PastChainIDs[0] (PR #5540)
- Mark gas-tolerance and replay-report open items as resolved
- Add docs-linter stability fix note
jaekwon pushed a commit that referenced this pull request May 7, 2026
## Overview

Chain hardfork mechanism for gno.land: export all state and historical
transactions from the source chain, replay them during `InitChain` on
the new chain, and start producing blocks at the halted height. Replaces
the original single-`OriginalChainID` design from
[#5411](#5411) with a more flexible
multi-chain model (`PastChainIDs` allowlist + per-tx `ChainID`).

**History:**
- Original work: [#5411](#5411)
- Jae's refinements:
[feat/genesis-replay-upgrade2](https://github.com/gnolang/gno/tree/feat/genesis-replay-upgrade2)
- This PR: builds on top of Jae's work, adds fixes from extensive review
+ end-to-end validation on the full gnoland1 chain via the [hf-glue
testbed](#5486)

## What's in

### tm2 (consensus + SDK)
- **`GenesisDoc.InitialHeight`**: consensus starts block production at
this height after `InitChain`; `Handshaker` sets `state.LastBlockHeight
= InitialHeight - 1`.
- **`BlockchainReactor`, `state`, `store`, validation**: all updated to
handle chains where `InitialHeight > 1` (empty block store,
non-contiguous block save, validator set / consensus params persisted at
InitialHeight, etc.)
- **`BaseApp.lastBlockHeight` tracker** (this iteration): real chain
height = `multistoreVersion + initialHeightOffset`, with the offset
persisted under `mainInitialHeightKey` and restored on every restart.
`validateHeight` now enforces strict contiguity against real chain
height; the previous "allow monotonic jump" branch (which permanently
bypassed contiguity for `InitialHeight > 1` chains) is gone.
- **`BaseApp.Info` guard**: handles calls before the multistore is loaded.
- **`auth.SkipGasMeteringKey`**: context flag that lets `SetGasMeter`
bypass the new VM's gas meter (used for `GasReplayMode="source"`).
- **`RequestInitChain.InitialHeight`**: new ABCI field so the app can
cross-check against `GnoGenesisState.InitialHeight`. Amino round-trip
test added.
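
As a rough illustration of the `lastBlockHeight` tracker idea (real height = store version + offset, with strict contiguity), consider the sketch below. The type and method names are hypothetical, persistence is omitted, and the offset is assumed to be `InitialHeight - 1`, which is consistent with the `Handshaker` behaviour listed above but is not confirmed by the source.

```go
package main

import "fmt"

// heightTracker is a hypothetical model of the BaseApp height bookkeeping.
type heightTracker struct {
	initialHeightOffset int64 // assumed to be InitialHeight - 1
	storeVersion        int64 // multistore version, incremented on each commit
}

func newHeightTracker(initialHeight int64) *heightTracker {
	if initialHeight < 1 {
		initialHeight = 1
	}
	return &heightTracker{initialHeightOffset: initialHeight - 1}
}

// lastBlockHeight returns the real chain height of the last committed block.
func (t *heightTracker) lastBlockHeight() int64 {
	return t.storeVersion + t.initialHeightOffset
}

// validateHeight enforces strict contiguity against the real chain height.
func (t *heightTracker) validateHeight(height int64) error {
	if want := t.lastBlockHeight() + 1; height != want {
		return fmt.Errorf("invalid height: %d; expected: %d", height, want)
	}
	return nil
}

// commit models a successful block commit.
func (t *heightTracker) commit() { t.storeVersion++ }

func main() {
	t := newHeightTracker(101)
	fmt.Println(t.validateHeight(101)) // first block is accepted
	t.commit()
	fmt.Println(t.validateHeight(102)) // contiguous block is accepted
	fmt.Println(t.validateHeight(104)) // skipped heights are rejected
}
```

Unlike a monotonic-jump rule, this check never relaxes after the first commit, so a gap in heights is always reported.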

### gno.land
- **`GnoGenesisState`** extensions:
  - `PastChainIDs []string`: allowlist of past chain IDs valid for
    signature verification
  - `InitialHeight int64`: cross-checked against `GenesisDoc.InitialHeight`
  - `GasReplayMode string`: `""`/`"strict"` (default, new VM's gas meter)
    or `"source"` (bypass gas meter, preserve source-chain outcomes)
- **`GnoTxMetadata`** extensions:
  - `BlockHeight int64`: original block height
  - `ChainID string`: originating chain ID
  - `Failed bool`: tx had non-zero return code on source chain (skipped
    during replay)
  - `SignerInfo []SignerAccountInfo`: per-signer account metadata (address,
    account number, pre-tx sequence) so signatures verify correctly even if
    earlier txs diverged
  - `GasUsed`, `GasWanted int64`: source-chain gas (populated by
    tx-archive, used by replay report)
- **`auth.NewAccountWithUncheckedNumber`** (this iteration, renamed from
`NewAccountWithNumber`): create accounts with a specific number,
bypassing the auto-increment counter. Doc comment now spells out the
precondition that the caller must enforce uniqueness; the rename forces
every call site to acknowledge it.
- **`validateSignerInfo` preflight** (this iteration): scans every
`SignerInfo` entry across all txs at the start of `loadAppState`.
Rejects the genesis if two different addresses claim the same account
number, or if a `SignerInfo` claims a number reserved by a balance-init
account at a different address. Defense-in-depth against a malformed
genesis silently corrupting state.
- **`InitChainerConfig.StrictReplay`** (this iteration): opt-in
fail-closed boot. Defaults to `false` for backwards compat. Hardfork
operators set it to `true` so any non-skipped tx replay failure aborts
`InitChain` instead of letting the chain boot in a corrupted state.
Skipped txs (`metadata.Failed = true`) do not count.
- **Genesis-mode tx sig verify with PastChainIDs[0]**: genesis-mode txs
(no metadata or `BlockHeight == 0`) use the first `PastChainIDs` entry
for sig verify when a hardfork is in progress (PR #5540). The
genesis-mode chain-ID branch is now gated on `metadata == nil` (this
iteration) so migration txs (`metadata != nil`, `BlockHeight == 0`,
`Timestamp != 0`) keep their metadata-driven `ctxFn` instead of being
silently overwritten.
- **`BaseApp.InitChain` error surfacing** (this iteration): when
`InitChainer` returns `ResponseInitChain.Error`, return cleanly instead
of falling through to the validators-count sanity check, which would
otherwise panic with a misleading `"validators count mismatch"` and mask
the real cause.
- **Replay report**: per-tx categorization emitted via logger after
`InitChain`: `ok` / `ok_gas_differs` / `failed` / `skipped_failed`.
Exposes `Outcomes()` and `FailedCount()` for external tooling.
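
The replay-report categories and the `StrictReplay` rule can be sketched together. This is an illustrative model only: the `replayResult` fields, the gas-comparison rule, and the helper names are assumptions, and only the four category strings and the "skipped txs do not count" rule come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

type outcome string

const (
	outcomeOK            outcome = "ok"
	outcomeOKGasDiffers  outcome = "ok_gas_differs"
	outcomeFailed        outcome = "failed"
	outcomeSkippedFailed outcome = "skipped_failed"
)

// replayResult is a hypothetical per-tx record.
type replayResult struct {
	SourceFailed  bool  // metadata.Failed: tx failed on the source chain
	Err           error // replay error on the new chain, if any
	SourceGasUsed int64 // 0 when the source gas is unknown (assumption)
	ReplayGasUsed int64
}

var errReplay = errors.New("replay failed")

// classify maps one result to a report category.
func classify(r replayResult) outcome {
	switch {
	case r.SourceFailed:
		return outcomeSkippedFailed
	case r.Err != nil:
		return outcomeFailed
	case r.SourceGasUsed != 0 && r.SourceGasUsed != r.ReplayGasUsed:
		return outcomeOKGasDiffers
	default:
		return outcomeOK
	}
}

// failedCount counts replay failures; skipped txs do not count.
func failedCount(results []replayResult) int {
	n := 0
	for _, r := range results {
		if classify(r) == outcomeFailed {
			n++
		}
	}
	return n
}

// checkStrict models the fail-closed boot: abort when strict and any tx failed.
func checkStrict(strict bool, results []replayResult) error {
	if n := failedCount(results); strict && n > 0 {
		return fmt.Errorf("strict replay: %d tx(s) failed", n)
	}
	return nil
}

func main() {
	results := []replayResult{
		{SourceGasUsed: 100, ReplayGasUsed: 100},
		{SourceGasUsed: 100, ReplayGasUsed: 120},
		{SourceFailed: true},
		{Err: errReplay},
	}
	for _, r := range results {
		fmt.Println(classify(r))
	}
	fmt.Println(checkStrict(false, results), checkStrict(true, results))
}
```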

### Hardfork tooling (`contribs/gnogenesis/internal/fork/`)
- **`gnogenesis fork generate`**: generate a hardfork genesis from a
source chain (RPC URL, local data dir, or exported tarball).
- **`gnogenesis fork test`**: local genesis replay smoke-test.
- **`--patch-realm PKGPATH=SRCDIR`** (repeatable): rewrite a genesis-mode
`addpkg` tx in-place with files from `SRCDIR`. Lets you deliver realm
upgrades as part of the fork (e.g. adding a new `.gno` file to an
existing realm) since you cannot re-addpkg post-deploy (PR #5540).
- **`--migration-tx`**: inject a single migration tx at the end of the
historical replay.
- **`bruteForceSignerSequence`**: resolve signer sequences during export
by trying candidate values against the signature.

## Bugs found and fixed during review

### tm2 consensus (all fixed)
1. **Fast-sync broken with InitialHeight > 1**: `BlockPool` started at
`store.Height()+1 = 1` instead of `state.LastBlockHeight+1 =
InitialHeight`. Nodes trying to fast-sync would request non-existent
blocks.
2. **Validator set / consensus params not saved at InitialHeight**:
`saveState` only saved validators when `nextHeight == 1`. With
InitialHeight > 1, `LoadValidators` failed and `LoadConsensusParams`
panicked at block InitialHeight+1.
3. **`ValidateBasic` bypass via zeroed `LastBlockID`**: any block with
`LastBlockID.IsZero()` could skip commit validation. Fixed: only allow
skip when commit is also nil/empty.
4. **`BaseApp.validateHeight` permanent contiguity bypass**: the previous
"allow monotonic jump" branch compared real block height against the
multistore version. After the first commit, `actual > prevHeight` is
trivially true on every subsequent block, so the contiguity check was
bypassed forever (an attacker or buggy consensus engine that skipped N
blocks would be silently accepted). Fixed by tracking real chain height
in `lastBlockHeight` (this iteration).
5. **`BaseApp.InitChain` masking real error**: when `loadAppState`
returned an error response, the validators-count sanity check fired with
`"validators count mismatch"`, masking the actual cause. Fixed: return
cleanly on error response (this iteration).
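
The `ValidateBasic` fix in item 3 comes down to a tighter predicate. The sketch below uses simplified, hypothetical types (not the tm2 ones) to show the rule as described: commit validation may be skipped only when `LastBlockID` is zero and the commit is also nil or empty.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the block types.
type blockID struct{ Hash []byte }

func (id blockID) IsZero() bool { return len(id.Hash) == 0 }

type commit struct{ Precommits []string }

type block struct {
	LastBlockID blockID
	LastCommit  *commit
}

// mustValidateLastCommit reports whether commit validation is required.
// Skipping is allowed only when both the LastBlockID is zero and the
// commit is nil or empty.
func mustValidateLastCommit(b block) bool {
	commitEmpty := b.LastCommit == nil || len(b.LastCommit.Precommits) == 0
	return !(b.LastBlockID.IsZero() && commitEmpty)
}

func main() {
	first := block{} // zero LastBlockID, no commit
	suspicious := block{LastCommit: &commit{Precommits: []string{"vote"}}}
	normal := block{
		LastBlockID: blockID{Hash: []byte{1}},
		LastCommit:  &commit{Precommits: []string{"vote"}},
	}
	fmt.Println(mustValidateLastCommit(first))
	fmt.Println(mustValidateLastCommit(suspicious))
	fmt.Println(mustValidateLastCommit(normal))
}
```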

### gno.land (all fixed)
6. **`loadAppState` returns nil even on N tx failures**: the chain booted in
a corrupted state when historical-tx replay had failures. Fixed via
opt-in `StrictReplay` in `InitChainerConfig` (this iteration).
7. **Migration-tx `ctxFn` overwrite**: the genesis-mode chain-ID branch
fired on any `metadata.BlockHeight == 0`, stomping the metadata-driven
`Timestamp` override on migration txs. Fixed: tighten predicate to
`metadata == nil` and compose with any prior `ctxFn` (this iteration).
8. **`NewAccountWithNumber` had no SignerInfo collision check**: two
`SignerInfo` entries with the same `AccountNum` but different addresses,
or a `SignerInfo` colliding with a balance-init account, would silently
zero the original account's balance. Fixed: rename to
`NewAccountWithUncheckedNumber` (forcing every call site to acknowledge
the precondition) plus `validateSignerInfo` preflight in `loadAppState`
(this iteration).
9. **Failed-tx `ResponseDeliverTx` was empty (looked like success)**:
Fixed: added an explicit error marker so indexers can distinguish failures.
10. **`GnoGenesisState.InitialHeight` wasn't cross-checked against
`GenesisDoc.InitialHeight`**: Fixed: added `InitialHeight` to
`RequestInitChain` and validate it in `loadAppState`.
11. **`RequestInitChain.InitialHeight` had no amino round-trip test**: a
silent registration regression would only surface during a real hardfork.
Fixed: round-trip test added (this iteration).
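
The `validateSignerInfo` preflight from item 8 can be pictured as a single pass over all signer entries. The sketch below is an assumption-laden model: the `signerInfo` type, the map of balance-init accounts, and the error text are hypothetical, and only the two rejection rules come from the description.

```go
package main

import "fmt"

// signerInfo is a hypothetical, trimmed-down per-signer record.
type signerInfo struct {
	Address    string
	AccountNum uint64
}

// validateSignerInfo rejects a genesis where two different addresses claim
// the same account number, or where a signer claims a number already
// reserved by a balance-init account at a different address.
func validateSignerInfo(balanceAccounts map[uint64]string, txSigners [][]signerInfo) error {
	seen := make(map[uint64]string, len(balanceAccounts))
	for num, addr := range balanceAccounts {
		seen[num] = addr
	}
	for i, signers := range txSigners {
		for _, s := range signers {
			if prev, ok := seen[s.AccountNum]; ok && prev != s.Address {
				return fmt.Errorf("tx %d: account number %d claimed by both %s and %s",
					i, s.AccountNum, prev, s.Address)
			}
			seen[s.AccountNum] = s.Address
		}
	}
	return nil
}

func main() {
	balances := map[uint64]string{0: "g1alice"}
	ok := [][]signerInfo{{{Address: "g1alice", AccountNum: 0}}, {{Address: "g1bob", AccountNum: 1}}}
	bad := [][]signerInfo{{{Address: "g1bob", AccountNum: 1}}, {{Address: "g1carol", AccountNum: 1}}}
	fmt.Println(validateSignerInfo(balances, ok))
	fmt.Println(validateSignerInfo(balances, bad))
}
```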

### Hardfork tooling (fixed)
12. **`applyOverlay` silent no-op**: listed scripts but didn't execute
them, returned success. Fixed: returns error when scripts found but
execution not implemented.
13. **JSONL serialization used `encoding/json` instead of amino**:
interface types (`std.Msg`) lost on round-trip. Fixed: both writer and
reader now use amino.
14. **`verifyGenesisFile` failure returned success**: tool could produce
invalid genesis and exit 0. Fixed: failure aborts (opt out with
`--no-verify`).
15. **Zero unit tests for `bruteForceSignerSequence`**: Fixed: added 10
table-driven tests.
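
For readers unfamiliar with the idea behind `bruteForceSignerSequence`, the sketch below shows the general technique of recovering a signer sequence by trying candidates against a signature. It is not the tool's implementation: it swaps in Go's standard `crypto/ed25519` for the chain's real key types, and the sign-bytes layout, function names, and search bound are all hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// signBytes is a hypothetical sign-bytes layout that includes the sequence.
func signBytes(chainID string, accountNum, sequence uint64, payload string) []byte {
	return []byte(fmt.Sprintf("%s|%d|%d|%s", chainID, accountNum, sequence, payload))
}

// bruteForceSequence tries sequences 0..maxSeq and returns the first one for
// which the signature verifies.
func bruteForceSequence(pub ed25519.PublicKey, sig []byte, chainID string,
	accountNum, maxSeq uint64, payload string) (uint64, bool) {
	for seq := uint64(0); seq <= maxSeq; seq++ {
		if ed25519.Verify(pub, signBytes(chainID, accountNum, seq, payload), sig) {
			return seq, true
		}
	}
	return 0, false
}

// demo signs with a known sequence, then recovers it by brute force.
func demo(seq, maxSeq uint64) (uint64, bool) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	sig := ed25519.Sign(priv, signBytes("gnoland1", 42, seq, "msg"))
	return bruteForceSequence(pub, sig, "gnoland1", 42, maxSeq, "msg")
}

func main() {
	fmt.Println(demo(7, 20))  // sequence within the search bound
	fmt.Println(demo(30, 20)) // sequence outside the search bound
}
```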

### Docs linter (side fix for green CI)
- Skip `staging.gno.land`, `archive.org`, and add retry/timeout logic so
transient remote-link failures don't block unrelated PRs.

## Still open (design / follow-up)

- **RPC retry/resume**
(`contribs/gnogenesis/internal/fork/source_rpc.go`): a single transient
error during tx fetch aborts everything; needs exponential backoff +
checkpointing. Architectural, follow-up PR.
- **Streaming tx export**: full tx history is held in memory; will OOM on
large chains. Needs streaming writer, follow-up PR.
- **`queryAccountAtHeight` silent nil**: all error paths return nil with
no indication, so a flaky RPC yields wrong sequence metadata.


## Cherry-picked from [#5597](#5597) (this iteration)

Three follow-ups originally staged in the master-based hardfork series,
brought back to where they belong since they modify or extend code
introduced here:

- [`1babfe42a`](1babfe42a)
`fix(consensus): skip phantom heights during replay when InitialHeight >
1` — ABCI handshake replay path used to assume heights `[1,
appBlockHeight+1]` always have a stored block; for chains starting at
`InitialHeight > 1`, heights below `InitialHeight` never had blocks and
replay errored with "block not found for height 1".
- [`5bf2fa53e`](5bf2fa53e)
`fix(gnogenesis): default gas-storage params and gas_replay_mode in
hardfork genesis` — `buildHardforkGenesis` now defaults the post-#5415
`vm.params` gas-storage fields from `vm.DefaultParams()` when the source
has them all at zero, and sets `gas_replay_mode = "source"` when unset.
Operator overrides preserved. 4 unit tests.
- [`e31268467`](e31268467)
`feat(gnogenesis): add --skip-failing-genesis-txs and
--skip-genesis-sig-verification flags to fork test` — `make smoketest`
now matches what production validators actually run.

## End-to-end validation

The hf-glue testbed ([#5486](#5486))
runs `make fetch && make init && make up` against `rpc.gno.land`
halt@704052 and produces a 192 MB hardfork genesis that replays with **0
/ 2715 tx failures** and boots a live `gnoland-1` node.

## Dependencies / related PRs

- **Depends on / pairs with:**
[#5533](#5533) (`contribs/tx-archive`
metadata + `SignerInfo` populator) for replay-ready backups
- **Used in:** [#5486](#5486)
(hf-glue testbed)
- **Also fixed here:** [#5539](#5539)
(docs-linter skip staging preemptive fix, committed here too to keep CI
green)

## AI disclosure

Developed with significant assistance from Claude Code for testing,
review, and iterative fixes.

---------

Co-authored-by: moul <noreply@moul.io>
Co-authored-by: jaekwon <jae@tendermint.com>
assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: aeddi <antoine.e.b@gmail.com>

merging for moul
Labels

📦 🌐 tendermint v2 (Issues or PRs tm2 related)
📦 ⛰️ gno.land (Issues or PRs gno.land package related)

Projects

Status: ✅ Done

2 participants