test(hf-glue): hardfork end-to-end testbed (DO NOT MERGE) — integrates #5511 + #5376#5486
Draft
moul wants to merge 128 commits into
Signed-off-by: moul <94029+moul@users.noreply.github.com> Co-authored-by: moul <94029+moul@users.noreply.github.com> Co-authored-by: aeddi <antoine.e.b@gmail.com> Co-authored-by: Antoine Eddi <5222525+aeddi@users.noreply.github.com> Co-authored-by: Morgan <git@howl.moe> Co-authored-by: Morgan Bazalgette <morgan@morganbaz.com>
Enables GovDAO to propose a coordinated chain halt at a specific block height
without requiring every operator to pass a CLI flag. This is the
governance-driven counterpart to the --halt-height CLI flag.
Changes:
- Add `NewSetHaltHeightRequest(height int64)` to `r/sys/params` realm,
  allowing GovDAO to vote on halting the chain at a target block.
- Add `nodeParamsKeeper` to validate `node:p:halt_height` params.
- Register the "node" module in the params keeper so halt_height can be
  set via governance proposals.
- Extend `EndBlocker` to read `node:p:halt_height` from the params store
  and call `osm.Kill()` when the halt height is reached.
Usage:
    // Create and submit a GovDAO proposal to halt at block 100000
    pr := params.NewSetHaltHeightRequest(100_000)
    id := dao.MustCreateProposal(cross, pr)
    // After approval and execution, all nodes will halt at block 100000
Generated with [Claude Code](https://claude.com/claude-code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Extends the GovDAO halt proposal with a mandatory minimum binary version
field. When set, nodes refuse to restart unless their version satisfies
the requirement, preventing an old binary from accidentally resuming a
chain that was halted for an upgrade.
- `NewSetHaltRequest(height, minVersion)` sets both `node:p:halt_height`
and `node:p:halt_min_version` atomically in one GovDAO proposal.
- `checkNodeStartupParams` runs at node startup (after state is loaded)
and compares `version.Version` against the stored `halt_min_version`.
- `meetsMinVersion` / `parseGnolandVersion` handle the "chain/gnolandX.Y"
version format used for gno.land chain releases, with a string-equality
fallback for other formats.
Example: setting minVersion="chain/gnoland1.1" will allow 1.1 and newer
to start, but reject 1.0 ("develop" also rejected unless it matches).
Generated with [Claude Code](https://claude.com/claude-code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
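For readers who want to see the version guard in isolation, here is a minimal Go sketch of the comparison described in the commit message above. It is not the code from the PR: the function bodies are written from the prose only, and reading a missing minor part as 0 is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionPrefix is the release-tag prefix named in the commit message.
const versionPrefix = "chain/gnoland"

// parseGnolandVersion splits "chain/gnolandX.Y" into (X, Y).
// Assumption: a missing ".Y" part is read as minor version 0.
func parseGnolandVersion(v string) (major, minor int, ok bool) {
	if !strings.HasPrefix(v, versionPrefix) {
		return 0, 0, false
	}
	rest := strings.TrimPrefix(v, versionPrefix)
	majStr, minStr, hasMinor := strings.Cut(rest, ".")
	major, err := strconv.Atoi(majStr)
	if err != nil {
		return 0, 0, false
	}
	if hasMinor {
		minor, err = strconv.Atoi(minStr)
		if err != nil {
			return 0, 0, false
		}
	}
	return major, minor, true
}

// meetsMinVersion compares parsed versions when both sides parse, and
// falls back to plain string equality otherwise (e.g. "develop").
func meetsMinVersion(current, minVersion string) bool {
	cMaj, cMin, cOK := parseGnolandVersion(current)
	mMaj, mMin, mOK := parseGnolandVersion(minVersion)
	if !cOK || !mOK {
		return current == minVersion
	}
	if cMaj != mMaj {
		return cMaj > mMaj
	}
	return cMin >= mMin
}

func main() {
	const minVersion = "chain/gnoland1.1"
	for _, v := range []string{"chain/gnoland1.0", "chain/gnoland1.1", "chain/gnoland1.2", "develop"} {
		fmt.Printf("%-18s meets %s: %v\n", v, minVersion, meetsMinVersion(v, minVersion))
	}
}
```

The string-equality fallback is what makes "develop" fail against a `chain/gnolandX.Y` requirement while still letting a custom tag pass when the proposal names that exact tag.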
…g#5400)

TypeCheckMemPackage only writes a package to permCache when it is reached as a dependency via ImportFrom (canPerm=true). The root package of each call is never self-stored. This left 22 "leaf" stdlibs (packages not imported by any other stdlib, e.g. time, regexp, math/rand) absent from vm.typeCheckCache on every node startup.

On a production cold-start node (LoadStdlib, CacheStdlibLoad=false), the cache was entirely empty: every stdlib import in a user tx required a GetMemPackage store read (8 gas/byte). On a restarted node (Initialize), the 22 leaf stdlibs were still missing. This caused non-deterministic gas consumption: nodes that had restarted disagreed with genesis-fresh nodes on tx gas, triggering a consensus halt on gnoland1 at block 352922.

Fix: capture the *types.Package return value from each TypeCheckMemPackage call in the init loop and store it directly into opts.Cache. Applied to all three initialization paths: Initialize, LoadStdlibCached, and LoadStdlib. The LoadStdlib change additionally routes the cache to vm.typeCheckCache directly (instead of the per-tx context clone) so the results survive beyond the initialization transaction.

Verified by:
- TestTypeCheckCacheContainsAllStdlibs: asserts all InitOrder() stdlibs are present in vm.typeCheckCache after both cold and warm initialization.
- TestAddPkgGasWithTypeCheckCache: asserts identical gas for a strconv-importing addpkg regardless of typeCheckCache state (was 7M cold vs 2.1M warm before).
- addpkg_stdlib_typecheckcache.txtar: deploys a time-importing package with gas_wanted=2700000; succeeds at ~2.3M with fix, OOGs at ~3.2M without.

This is a hotfix, hence it is on chain/gnoland1 as the base. I fear this may cause different gas results in the chain, so we still need to figure out:
1. A migration strategy for the existing nodes (to re-run block 352922).
2. The impact this has on validators joining the network afterwards.

I feel like this PR changes the gas values of all of the transactions, including genesis transactions, so we need to understand whether nodes would still validate transactions correctly with lower gas values or whether those transactions are no longer valid, which would require a chain restart that re-runs the transactions.
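The caching gap described above can be reproduced with a toy model. The sketch below is an assumption-heavy illustration, not the VM code: `typeCheck`, `initCache`, and the three-package import graph are hypothetical stand-ins, chosen only to show why a root package that nothing imports never lands in the cache unless the init loop stores the return value itself.

```go
package main

import "fmt"

// typedPkg stands in for *types.Package.
type typedPkg struct{ path string }

// imports is a made-up import graph; "time" is the leaf nobody imports.
var imports = map[string][]string{
	"math":    nil,
	"strconv": {"math"},
	"time":    {"strconv"},
}

// typeCheck mimics the described behaviour: packages reached as
// dependencies are cached, the root of the call is only returned.
func typeCheck(path string, cache map[string]*typedPkg) *typedPkg {
	for _, dep := range imports[path] {
		if _, ok := cache[dep]; !ok {
			cache[dep] = typeCheck(dep, cache)
		}
	}
	return &typedPkg{path: path}
}

// initCache runs the init loop; storeRoot=true models the fix, which
// captures the return value and stores it directly.
func initCache(order []string, storeRoot bool) map[string]*typedPkg {
	cache := make(map[string]*typedPkg)
	for _, path := range order {
		pkg := typeCheck(path, cache)
		if storeRoot {
			cache[path] = pkg
		}
	}
	return cache
}

// missing lists the packages from order that are absent from cache.
func missing(order []string, cache map[string]*typedPkg) []string {
	var out []string
	for _, path := range order {
		if _, ok := cache[path]; !ok {
			out = append(out, path)
		}
	}
	return out
}

func main() {
	order := []string{"math", "strconv", "time"}
	fmt.Println("without fix, missing:", missing(order, initCache(order, false)))
	fmt.Println("with fix, missing:   ", missing(order, initCache(order, true)))
}
```

In this model only the leaf is missing without the fix, which mirrors the 22 leaf stdlibs reported for the restarted-node path.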
…nolang#5409)

- Add `gnoland version` subcommand mirroring `gno version` and `gnokey version`
- Add `BuildVersion`/`build_version` field to `ResultStatus` (RPC /status endpoint), populated from `tm2/pkg/version.Version`
- Inject version via ldflags in Dockerfile, computed from git at build time; all build stages now read from a shared build_version file written in setup-gnocore

goreleaser already has injection of the version, so no changes needed there.

---------

Co-authored-by: moul <94029+moul@users.noreply.github.com>
…ng#5410)

## Summary

Adds `contribs/gnobr`, a block rollback tool for gnoland validators. It trims the blockstore to a target height, patches the app hash in state.db, and wipes app state so gnoland replays all blocks locally on restart. No network access or special binary patches needed.

### Usage

```bash
# Build from the gno repo
cd contribs/gnobr && go build -o gnobr .

# Stop your node, then run:
gnobr --data-dir gnoland-data --drop-after 352921 \
  --app-hash 14BD8BB9FAD9869B86F1BFFD1A16DD3A02C3534323F6E15121025BE5DFDC9C51

# Restart your node; it replays blocks 1..352921 locally from its own blockstore.
```

### What it does

1. **Trims blockstore.db**: removes all blocks after the target height
2. **Patches state.db**: updates the AppHash to the correct value (via `--app-hash`) so the Handshaker doesn't panic on mismatch
3. **Wipes gnolang.db**: forces the app to replay from genesis
4. **Wipes WAL**: removes stale write-ahead log
5. **Resets priv_validator_state.json**: prevents double-signing

On restart, gnoland's Handshaker sees `appHeight=0, storeHeight=N, stateHeight=N`, runs InitChain, then replays all N blocks from the local blockstore. Zero network access needed.

### Flags

| Flag | Description |
|---|---|
| `--data-dir` | Path to gnoland data directory (default: `gnoland-data`) |
| `--drop-after` | Keep blocks up to this height, drop everything after |
| `--app-hash` | Hex-encoded app hash to write into state.db |
| `--dry-run` | Show what would be done without modifying anything |

### Why

During the gnoland1 chain halt at height 352922, validators committed a block with a divergent app hash. The `chain/gnoland1.1` tag fixes the root cause, but validators who committed the bad block can't just update the binary: state.db contains the wrong app hash, causing a panic on replay. This tool patches it cleanly.

### Tested

Successfully tested on gnoland1 (val1.moul.p2p.team):
- Restored from backup, ran gnobr, restarted with clean `chain/gnoland1.1` binary (no patches)
- Node replayed all 352921 blocks locally, reached correct app hash `14BD8BB9...`

<details>
<summary>Contributors' checklist</summary>

- [x] Added new tests, or not needed, or not feasible
- [x] Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
- [x] Updated the official documentation or not needed
- [x] No breaking changes were made, or a `BREAKING CHANGE: xxx` message was included in the description
- [x] Added `benchmarks` label to the PR or not needed

</details>
Aligns with gnolang#5334's approach: GovDAO EndBlocker now sets the halt height on BaseApp, which panics in BeginBlock of the next block. This is deterministic (no async signals) and ensures the halted block is fully committed.
…es (gnolang#5334)

## Summary

Adds a halt height mechanism for coordinated chain upgrades. The node stops after committing the specified block height.

### How to set it

```bash
gnoland config set halt_height 352922
```

Or edit `config.toml` directly:

```toml
halt_height = 352922
```

### How it works

1. After `finalizeCommit`, consensus checks if `height >= halt_height`
2. If so, calls `osm.Kill()` for a graceful shutdown
3. The check is at the consensus level (not ABCI), following the same pattern as `WithEarlyStart`

### Scope and future direction

This is a **temporary coordination tool** for the current chain upgrade. For the gnoland1 → gnoland-1 hard fork, validators set `halt_height` in their config, all nodes stop at the same block, then validators swap binary + config and restart.

After the upgrade, the proper mechanism will be **GovDAO-based halting** (gnolang#5368), which adds:
- On-chain `halt_height` param set via governance proposal (no manual config needed)
- `halt_min_version`: prevents old binaries from restarting after halt
- Version guard at startup so validators can't accidentally run the wrong binary

Once gnolang#5368 is merged and active, `halt_height` in config becomes a **node operator tool** (e.g., "stop my node at height X for maintenance") rather than a coordination mechanism. Coordination should happen through governance.

### No CLI flag: config only

Per @tbruyelle's suggestion, there's no `--halt-height` CLI flag. Config file is the single source of truth. This avoids the risk of validators missing the flag in duplicated command setups across their infrastructure.

### Related

- gnolang#5368: GovDAO-based halt height + version guard (Phase 2, replaces this for coordination)
- gnolang#5376: gnoland-1 chain config
- gnolang#5411: chain upgrade genesis replay

<details>
<summary>Contributors' checklist</summary>

- [x] Added new tests, or not needed, or not feasible
- [x] Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
- [x] Updated the official documentation or not needed
- [x] No breaking changes were made, or a `BREAKING CHANGE: xxx` message was included in the description
- [x] Added `benchmarks` label to the PR or not needed

</details>
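The post-commit check from "How it works" reduces to a single comparison. The Go sketch below is illustrative only: `shouldHalt` is a hypothetical helper name, and treating a zero `halt_height` as "disabled" is an assumption rather than something the description states.

```go
package main

import "fmt"

// shouldHalt reports whether a node should stop after committing height.
// Assumption: haltHeight == 0 means no halt is configured.
func shouldHalt(height, haltHeight int64) bool {
	return haltHeight > 0 && height >= haltHeight
}

func main() {
	const haltHeight = 352922
	for _, h := range []int64{352920, 352921, 352922} {
		fmt.Printf("committed %d -> halt=%v\n", h, shouldHalt(h, haltHeight))
	}
}
```

Using `>=` rather than `==` means a node that starts above the configured height still stops instead of running on past it.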
…ght config

Addresses tbruyelle's review feedback:
1. Panic if new binary runs before the chain has halted at halt_height
2. Add skip_upgrade_height config field to bypass the check when the validator has already migrated state
Prepares the repository for the gnoland1 → gnoland-1 hard fork:
- Add misc/deployments/gnoland-1/ with:
- migrate-from-gnoland1.sh: placeholder with a detailed TODO covering
halt verification, state export, migration transforms (r/sys/params,
r/gnops/valopers, namereg, gas params), genesis assembly, verification,
and restart coordination. Exits with an error until implemented.
- config.toml: copy of gnoland1 config with meter_name=gnoland-1 and
peer/seed addresses reset (to be filled post-fork).
- govdao-scripts/: copies of gnoland1 scripts with CHAIN_ID=gnoland-1.
- README.md: upgrade workflow, what changed, and ⚠️ migration TODO warning.
- Update docs:
- docs/resources/gnoland-networks.md: Betanet chain ID gnoland1 → gnoland-1
- docs/resources/gas-fees.md: update --chainid example
- docs/users/explore-with-gnoweb.md: update Betanet chain ID reference
The migration script is the critical missing piece — the hard fork cannot
happen until it is written and dry-run on test12.
Generated with [Claude Code](https://claude.com/claude-code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
- Revert premature doc references to gnoland-1 chain ID in gas-fees.md and
  explore-with-gnoweb.md (hardfork hasn't happened yet)
- Remove premature "Note" callout from gnoland-networks.md
- Update migrate-from-gnoland1.sh: reflect Scenario A decision (genesis
  tx-replay with InitialHeight), document blockers (gnolang#5411,
  gnolang#5390, Jae's InitialHeight tm2 work), reference issue gnolang#5374
  for tracking
- Update gnoland-1/README.md: reflect correct PR merge status, document
  Scenario A approach, list migration blockers explicitly
- Add ChainID field to GnoTxMetadata for tx provenance recording
- Add InitialHeight validation (non-negative) to GenesisDoc.Validate and
  ValidateAndComplete
- Add test cases: no chain ID override when BlockHeight=0, no override when
  OriginalChainID unset
- Update ADR: document per-tx vs state-level design choice, mark
  InitialHeight as implemented end-to-end
…d-1 README PR gnolang#5373 (valoper fee script) was closed without merging. The valoper registration fee was already set to 0 via a GovDAO transaction on gnoland1, so no code change is needed — the state is preserved in genesis replay.
- Fix comment headers: 'gnoland1' → 'gnoland-1' in add-validator.sh and
  rm-validator.sh
- Fix stale REMOTE default comment: 127.0.0.1:26657 → betanet endpoint
The upstream MANIFESTO.md references a PDF in a now-404'd GitHub repo (github.com/jaekwon/ephesus). Replace with a Wayback Machine wildcard redirect so the docs linter (which treats remote 4xx as hard-fail) stops blocking CI on every PR.
web.archive.org redirects + ia*.us.archive.org file hosts both throttle and 5xx intermittently. Several PRs hit flaky docs/MANIFESTO.md checks on these exact URLs. Since we use archive.org precisely to point at links that are already dead upstream, the liveness check adds no value — skip them.
- Document GasReplayMode field and "source" mode
- Document GasUsed/GasWanted metadata fields
- Document auth.SkipGasMeteringKey context flag
- Document replay report with categorization
- Document RequestInitChain.InitialHeight cross-check
  (GnoGenesisState.InitialHeight is no longer "informational only")
- Document hardfork tooling: --patch-realm, hardfork test
- Add BaseApp.validateHeight / Info InitialHeight>1 fixes (PR gnolang#5540)
- Add genesis-mode sig verify against PastChainIDs[0] (PR gnolang#5540)
- Mark gas-tolerance and replay-report open items as resolved
- Add docs-linter stability fix note
The hardfork tooling lives alongside other genesis-manipulation tools under
contribs/gnogenesis instead of a standalone misc/hardfork module.
Command mapping:
  misc/hardfork genesis [flags] → gnogenesis fork generate [flags]
  misc/hardfork test [flags]    → gnogenesis fork test [flags]
Changes:
- Move misc/hardfork/*.go → contribs/gnogenesis/internal/fork/
- Rename genesis.go / genesisCfg / execGenesis → generate.go / generateCfg /
  execGenerate (subcommand is now 'generate' under 'fork')
- Register fork.NewForkCmd in contribs/gnogenesis/genesis.go
- Delete misc/hardfork/ (including its separate go.mod, now absorbed into
  the contribs/gnogenesis module)
- Move PLAN_account_metadata.md → gno.land/adr/pr5511_...
- Update misc/deployments/gnoland-1/generate-genesis.sh to use
  'gnogenesis fork generate' instead of the 'hardfork' binary
- Update ADR pr5489 tooling section
No behavioural changes: same flags, same logic, same tests.
…to moul/hf-glue-experimental # Conflicts: # tm2/pkg/sdk/baseapp.go
- docs/MANIFESTO.md: restore original jaekwon/ephesus URL (upstream is back
  online; wayback redirect no longer needed)
- misc/docs/tools/linter/urls.go: drop archive.org skip; it was a
  workaround for the MANIFESTO URL flake that is now resolved
Leaving gnoland-networks.md etc. untouched (those came in via the
gnolang#5376 / gnolang#5511 merges and are intentional).
This reverts commit 5c900b2.
This reverts commit 657f2dd.
- Remove `--overlay-dir` flag and applyOverlay from 'gnogenesis fork generate'
  (the feature was dead code: it errored on any non-empty overlay dir and
  no one used it)
- Remove misc/deployments/gnoland-1/overlay/ (empty feature)
- Simplify misc/deployments/gnoland-1/generate-genesis.sh to SOURCE and
  HALT_HEIGHT env vars only (drop --output/--skip-txs/--debug/positional
  args/EXTRA_ARGS; those are edge-case knobs, and users who need them call
  'gnogenesis fork generate' directly)
- Drop TestApplyOverlay_* tests along with the code
Three ADRs (pr5411 superseded, pr5489 mixed concerns, pr5511 PLAN) are
consolidated into two, split by layer:
- tm2/adr/pr5511_initial_height.md
  Focused on tm2 changes: GenesisDoc.InitialHeight,
  RequestInitChain.InitialHeight, BlockchainReactor/state/store/BaseApp
  fixes for InitialHeight > 1, auth.SkipGasMeteringKey.
- gno.land/adr/pr5511_chain_upgrade_genesis_replay.md
  Focused on gno.land app: GnoTxMetadata and GnoGenesisState fields
  (PastChainIDs, GasReplayMode, SignerInfo, gas fields), sequence recovery
  algorithm, genesis replay flow, replay report, gnogenesis fork tooling,
  bugs found, validation.
Cross-linked between the two. Content reflects the final PR state (overlay
mechanism removed, tool absorbed into gnogenesis fork, etc.).
…to moul/hf-glue-experimental
Appends one or more genesis-mode txs at the END of appState.Txs — i.e. AFTER the replayed historical-tx stream. Repeatable. FILE is a .jsonl of gnoland.TxWithMetadata (amino JSON per line). BlockHeight is forced to 0 so each tx is treated as genesis-mode: chain-id via PastChainIDs[0], sig verify skipped by --skip-genesis-sig-verification. Blank and # lines are ignored. Use case: chain-specific post-replay migrations that need replayed state to exist first — e.g. a govDAO proposal that updates r/sys/validators/v2 so the hardforked chain's in-gno valset matches the new GenesisDoc.Validators (not just the tm2 config side). The plumbing lives here; chain configs in misc/deployments/*/ wire it up with their own migration files.
…to moul/hf-glue-experimental
Adds a post-history migration tx that reconciles r/sys/validators/v2
with the new GenesisDoc.Validators after the gnoland1 → gnoland-1
hardfork. Without this, tm2 consensus would use the new single-
validator set while every in-gno query (valopers, govDAO proposals
touching the valset) would still see the pre-fork gnoland1 set
written by govdao_prop1.
Layout:
misc/deployments/gnoland-1/migrations/
01_reset_valset.gno.tmpl Gno MsgRun body with placeholders for
OLD_VALIDATORS_GO + NEW_VALIDATORS_GO.
Submits + votes + executes a govDAO
proposal via r/sys/validators/v2.NewPropRequest.
build.sh Renders the template with the local
priv_validator_key.json (new validator)
and the hard-coded initial gnoland1
valset (to be removed). Wraps into a
signed MsgRun tx and emits migrations.jsonl.
generate-genesis.sh gains a PV_KEY env var; when set it runs build.sh
and passes --migration-tx to `gnogenesis fork generate`.
Sig check passes because migration txs are genesis-mode (BlockHeight=0)
and --skip-genesis-sig-verification is on at replay; the MsgRun executes
as $CALLER (default: manfred, a govDAO T1 member) regardless of which
key signed.
- hf.sh: point at 'gnogenesis fork generate' (was the removed misc/hardfork)
and plumb hf_migration_tx through new --migration-tx flag
- migrate.sh: call misc/deployments/gnoland-1/migrations/build.sh when
out/gnoland-home/secrets/priv_validator_key.json exists ("make init"
output), emit migrations.jsonl, register via hf_migration_tx
- build.sh: fix BSD sed newline bug (use awk), fix gnokey recover
stdin handshake (\n + mnemonic), derive bech32 gpub1 pubkey via
'gnoland secrets get' (r/sys/validators/v2 wants bech32 not base64)
- 01_reset_valset.gno.tmpl: rename placeholder tokens in doc comment
so they don't collide with the substitution
aeddi added a commit to aeddi/gno that referenced this pull request on Apr 21, 2026
PR gnolang#5486 commit bd3580d ("refactor: absorb misc/hardfork into
'gnogenesis fork' subcommand") moved misc/hardfork/ into
contribs/gnogenesis/internal/fork/ and rewired the CLI:
  misc/hardfork test → gnogenesis fork test
That refactor updated misc/deployments/gnoland-1/generate-genesis.sh but
missed the references in misc/hf-glue/. As a result, scripts/replay-log.sh
(invoked by `make replay-log` and `make reports`) still does:
  cd "$REPO/misc/hardfork"
  go run . test ...
which fails with "No such file or directory" because misc/hardfork no
longer exists on the PR head.
Update the cd target to contribs/gnogenesis and the subcommand to
`fork test` to match the new layout.
Other stale refs in misc/hf-glue/ (Makefile smoketest target,
fetch-from-dir.sh, README, lib/hf.sh comment) are also broken but addressed
in separate commits / left as docs.
Adds a 2-node docker cluster variant of the hf-glue testbed to verify that
a hardfork genesis actually drives consensus across connected validators
(not just replays in a single node).
- fixvalidator: --priv-key is now repeatable (names auto-suffixed -N)
- scripts/init-cluster.sh: generate N=$NODES homes under out/cluster/,
  rewrite genesis with all validators, wire per-node config.toml with
  persistent_peers pointing at other nodes over the compose network
- docker-compose.cluster.yml: 2 gnoland services (node0/node1) + gnoweb
  pointing at node0; ports 26656/7 + 36656/7 on host
- migrate.sh + build.sh: PV_KEYS (colon-separated) for cluster-mode
  valset-swap migration (all cluster validators land in r/sys/validators/v2)
- Makefile: cluster-init / cluster-up / cluster-down / cluster-logs /
  cluster-status / cluster-reset / cluster-reset-db
Smoke-tested: init-cluster generates correct peer entries; build.sh emits
migration with both validators; docker compose up starts both containers;
replay verification pending.
README gets a cluster-mode section + new Make targets in the table. Smoke-test finding: cluster wiring is correct (docker DNS, persistent_peers, validator-set rewrite, per-node secrets), nodes peer up and cast consensus votes at the initial height. But ABCI handshake fails on restart with 'block not found for height 1' when genesis has initial_height > 1, because the consensus WAL replay path expects block 1 to exist. Documented as a separate upstream issue from the cluster harness itself.
… > 1
The ABCI handshake replay path assumed heights in [1, appBlockHeight+1]
always have a block in the store. For chains that start at InitialHeight > 1
(e.g. a hardfork upgrade that replays historical txs at genesis), heights
below InitialHeight never had a block — LoadBlock returns nil and the
replay errors with "block not found for height 1".
Repro (before this fix):
- Fresh chain with genesis.initial_height = 2.
- Node runs InitChainer, saves state.LastBlockHeight = 1.
- Consensus produces block 2 and stores it; node crashes before app Commit.
- On restart: appBlockHeight=0, storeBlockHeight=2, stateBlockHeight=1.
- ReplayBlocks routes to replayBlocks(mutateState=true), which loops
from appBlockHeight+1 = 1 up to storeBlockHeight-1 = 1. LoadBlock(1)
returns nil → handshake error, node crash-loops.
Fix: clamp the loop start to max(appBlockHeight+1, InitialHeight). Heights
below InitialHeight are phantom — skip them, let the mutateState
replayBlock apply the real first block (at InitialHeight or above) to the
post-InitChain app.
Regression test TestReplayBlocks_SkipsPhantomHeightsAtInitialHeight: uses
a recording BlockStore that records LoadBlock requests and asserts none
are below InitialHeight. Verified: test fails without the replay.go fix
("LoadBlock(1) was called but 1 < InitialHeight 100") and passes with it.
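The clamp can be checked against the repro numbers from the commit message with a small model of the loop bounds. This Go sketch is an illustration only: `replayHeights` is a hypothetical helper that lists which heights the loop would ask the block store for, and it does not touch any real store.

```go
package main

import "fmt"

// replayHeights lists the heights the handshake loop would load: from the
// start height up to storeBlockHeight-1 inclusive. With clamp=true the
// start is max(appBlockHeight+1, initialHeight), as in the fix.
func replayHeights(appBlockHeight, storeBlockHeight, initialHeight int64, clamp bool) []int64 {
	start := appBlockHeight + 1
	if clamp && initialHeight > start {
		start = initialHeight
	}
	var heights []int64
	for h := start; h < storeBlockHeight; h++ {
		heights = append(heights, h)
	}
	return heights
}

func main() {
	// Repro from the commit message: appBlockHeight=0, storeBlockHeight=2,
	// genesis.initial_height=2.
	fmt.Println("before fix:", replayHeights(0, 2, 2, false)) // asks for phantom height 1
	fmt.Println("after fix: ", replayHeights(0, 2, 2, true))  // nothing below InitialHeight
}
```

With the clamp, the loop in the repro case is empty, and the real first block at height 2 is left to the final mutateState replay step, as the commit message describes.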
…nconsistency When baseStore's package-index says 'index N -> path P' but iavlStore has no MemPackage under path-key(P), GetMemPackage returned nil, IterMemPackage yielded that nil, and downstream callers (ParseMemPackage et al) SIGSEGV'd on 'mpkg.Type.(MemPackageType)' with no clue where the nil came from. Replace the silent nil with a descriptive panic that names the inconsistent index entry and path. The underlying atomicity issue (how baseStore and iavlStore can get out of sync across a crash) is a separate investigation; this patch just surfaces the symptom clearly so the next person reading the stack trace has a starting point.
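A minimal Go sketch of the "fail loudly at the source" idea follows. The `store` type and its two fields are hypothetical stand-ins for the package index and the package data, and the panic text is invented for illustration; only the shape of the check reflects the commit message.

```go
package main

import "fmt"

type memPackage struct{ Path string }

// store is a hypothetical stand-in: index plays the role of baseStore's
// package index, pkgs plays the role of the iavlStore contents.
type store struct {
	index []string
	pkgs  map[string]*memPackage
}

// getMemPackage panics with the offending index entry and path instead
// of returning nil and letting a later type assertion crash.
func (s *store) getMemPackage(i int) *memPackage {
	path := s.index[i]
	mpkg, ok := s.pkgs[path]
	if !ok {
		panic(fmt.Sprintf(
			"package index entry %d -> %q has no stored package (index and package store are inconsistent)",
			i, path))
	}
	return mpkg
}

func main() {
	s := &store{
		index: []string{"gno.land/r/demo/a", "gno.land/r/demo/b"},
		pkgs:  map[string]*memPackage{"gno.land/r/demo/a": {Path: "gno.land/r/demo/a"}},
	}
	defer func() { fmt.Println("recovered:", recover()) }()
	for i := range s.index {
		fmt.Println("loaded", s.getMemPackage(i).Path)
	}
}
```

The second index entry has no backing package, so the loop stops there with a message that names both the index position and the path.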
…ication Single-node restart with fresh state + initial_height=2 genesis now works end-to-end (verified: height=20, restarts=0, no panic) thanks to ca97894. The earlier 'hardfork handshake replay' description of the cluster issue was wrong: that bug IS fixed. The remaining cluster crash-loop is a separate cross-store atomicity bug that only triggers on the 2-validator consensus path after block 2 (7a88a8a turns the resulting SIGSEGV into a descriptive panic).
…re atomicity bug

Investigated the cluster crash-loop. Actual root cause is NOT the
InitialHeight>1 replay (that's fixed) and NOT a cluster-specific state
machine bug. It's:
1. Docker Desktop default VM = 7.65 GiB. Two gnoland nodes during hardfork
   replay = ~3.5 GiB each. They fit, barely, until consensus and p2p push
   memory over the edge and the kernel OOM-kills one (exitcode=137, no
   panic, no logs).
2. The OOM SIGKILL lands between two writes in AddMemPackage:
   - baseStore (dbadapter): Set() hits PebbleDB immediately, no buffer
   - iavlStore (iavl): Set() buffers in IAVL tree until CommitMultiStore
   So on-disk has the package index entry but no package data. On restart,
   VMKeeper.Initialize's IterMemPackage yields a nil mpkg (indexed path but
   no iavl entry) and SIGSEGVs.
3. 7a88a8a already replaced the SIGSEGV with a descriptive panic
   identifying the inconsistent index/path; that's the diagnostic minimum.
   Fixing the atomicity properly would move the package index under
   iavlStore (or wrap baseStore in a committable buffer), which is a tm2
   store-layer change outside this PR's scope.
Single-node works because it rarely crashes mid-commit. Cluster surfaces
the bug because OOM is deterministic.
- docker-compose.cluster.yml: prominent memory-requirement banner
- README.md: corrected 'known issue' section with actual root cause,
  store-layer explanation, and the workaround (bump Docker VM to >=12G)
This was referenced Apr 24, 2026
moul added a commit that referenced this pull request on Apr 27, 2026
These were added with the original 5411 work but aren't needed in this PR. The hardfork genesis flow now lives in the hf-glue testbed (#5486).
jaekwon pushed a commit that referenced this pull request on May 7, 2026
## Overview

Chain hardfork mechanism for gno.land: export all state and historical transactions from the source chain, replay them during `InitChain` on the new chain, and start producing blocks at the halted height. Replaces the original single-`OriginalChainID` design from [#5411](#5411) with a more flexible multi-chain model (`PastChainIDs` allowlist + per-tx `ChainID`).

**History:**
- Original work: [#5411](#5411)
- Jae's refinements: [feat/genesis-replay-upgrade2](https://github.com/gnolang/gno/tree/feat/genesis-replay-upgrade2)
- This PR: builds on top of Jae's work, adds fixes from extensive review + end-to-end validation on the full gnoland1 chain via the [hf-glue testbed](#5486)

## What's in

### tm2 (consensus + SDK)

- **`GenesisDoc.InitialHeight`**: consensus starts block production at this height after `InitChain`; `Handshaker` sets `state.LastBlockHeight = InitialHeight - 1`.
- **`BlockchainReactor`, `state`, `store`, validation**: all updated to handle chains where `InitialHeight > 1` (empty block store, non-contiguous block save, validator set / consensus params persisted at InitialHeight, etc.)
- **`BaseApp.lastBlockHeight` tracker** (this iteration): real chain height = `multistoreVersion + initialHeightOffset`, with the offset persisted under `mainInitialHeightKey` and restored on every restart. `validateHeight` now enforces strict contiguity against real chain height; the previous "allow monotonic jump" branch (which permanently bypassed contiguity for `InitialHeight > 1` chains) is gone.
- **`BaseApp.Info` guard**: handle calls before the multistore is loaded.
- **`auth.SkipGasMeteringKey`**: context flag that lets `SetGasMeter` bypass the new VM's gas meter (used for `GasReplayMode="source"`).
- **`RequestInitChain.InitialHeight`**: new ABCI field so the app can cross-check against `GnoGenesisState.InitialHeight`. Amino round-trip test added.
### gno.land - **`GnoGenesisState`** extensions: - `PastChainIDs []string` allowlist of past chain IDs valid for signature verification - `InitialHeight int64` cross-checked against `GenesisDoc.InitialHeight` - `GasReplayMode string` `""`/`"strict"` (default, new VM's gas meter) or `"source"` (bypass gas meter, preserve source-chain outcomes) - **`GnoTxMetadata`** extensions: - `BlockHeight int64` original block height - `ChainID string` originating chain ID - `Failed bool` tx had non-zero return code on source chain (skipped during replay) - `SignerInfo []SignerAccountInfo` per-signer account metadata (address, account number, pre-tx sequence) so signatures verify correctly even if earlier txs diverged - `GasUsed`, `GasWanted int64` source-chain gas (populated by tx-archive, used by replay report) - **`auth.NewAccountWithUncheckedNumber`** (this iteration, renamed from `NewAccountWithNumber`): create accounts with a specific number, bypassing the auto-increment counter. Doc comment now spells out the precondition that the caller must enforce uniqueness; the rename forces every call site to acknowledge it. - **`validateSignerInfo` preflight** (this iteration): scans every `SignerInfo` entry across all txs at the start of `loadAppState`. Rejects the genesis if two different addresses claim the same account number, or if a `SignerInfo` claims a number reserved by a balance-init account at a different address. Defense-in-depth against a malformed genesis silently corrupting state. - **`InitChainerConfig.StrictReplay`** (this iteration): opt-in fail-closed boot. Defaults to `false` for backwards compat. Hardfork operators set it to `true` so any non-skipped tx replay failure aborts `InitChain` instead of letting the chain boot in a corrupted state. Skipped txs (`metadata.Failed = true`) do not count. 
- **Genesis-mode tx sig verify with `PastChainIDs[0]`**: genesis-mode txs (no metadata or `BlockHeight == 0`) use the first `PastChainIDs` entry for sig verify when a hardfork is in progress (PR #5540). The genesis-mode chain-ID branch is now gated on `metadata == nil` (this iteration) so migration txs (`metadata != nil`, `BlockHeight == 0`, `Timestamp != 0`) keep their metadata-driven `ctxFn` instead of being silently overwritten.
- **`BaseApp.InitChain` error surfacing** (this iteration): when `InitChainer` returns `ResponseInitChain.Error`, return cleanly instead of falling through to the validators-count sanity check, which would otherwise panic with a misleading `"validators count mismatch"` and mask the real cause.
- **Replay report**: per-tx categorization emitted via logger after `InitChain`: `ok` / `ok_gas_differs` / `failed` / `skipped_failed`. Exposes `Outcomes()` and `FailedCount()` for external tooling.

### Hardfork tooling (`contribs/gnogenesis/internal/fork/`)

- **`gnogenesis fork generate`**: generate a hardfork genesis from a source chain (RPC URL, local data dir, or exported tarball).
- **`gnogenesis fork test`**: local genesis replay smoke-test.
- **`--patch-realm PKGPATH=SRCDIR`** (repeatable): rewrite a genesis-mode `addpkg` tx in-place with files from `SRCDIR`. Lets you deliver realm upgrades as part of the fork (e.g. adding a new `.gno` file to an existing realm) since you cannot re-addpkg post-deploy (PR #5540).
- **`--migration-tx`**: inject a single migration tx at the end of the historical replay.
- **`bruteForceSignerSequence`**: resolve signer sequences during export by trying candidate values against the signature.

## Bugs found and fixed during review

### tm2 consensus (all fixed)

1. **Fast-sync broken with InitialHeight > 1**: `BlockPool` started at `store.Height()+1 = 1` instead of `state.LastBlockHeight+1 = InitialHeight`. Nodes trying to fast-sync would request non-existent blocks.
2. **Validator set / consensus params not saved at InitialHeight**: `saveState` only saved validators when `nextHeight == 1`. With InitialHeight > 1, `LoadValidators` failed and `LoadConsensusParams` panicked at block InitialHeight+1.
3. **`ValidateBasic` bypass via zeroed `LastBlockID`**: any block with `LastBlockID.IsZero()` could skip commit validation. Fixed: only allow skip when commit is also nil/empty.
4. **`BaseApp.validateHeight` permanent contiguity bypass**: the previous "allow monotonic jump" branch compared real block height against the multistore version. After the first commit, `actual > prevHeight` is trivially true on every subsequent block, so the contiguity check was bypassed forever (an attacker or buggy consensus engine that skipped N blocks would be silently accepted). Fixed by tracking real chain height in `lastBlockHeight` (this iteration).
5. **`BaseApp.InitChain` masking real error**: when `loadAppState` returned an error response, the validators-count sanity check fired with `"validators count mismatch"`, masking the actual cause. Fixed: return cleanly on error response (this iteration).

### gno.land (all fixed)

6. **`loadAppState` returns nil even on N tx failures**: chain booted in a corrupted state when historical-tx replay had failures. Fixed via opt-in `StrictReplay` in `InitChainerConfig` (this iteration).
7. **Migration-tx `ctxFn` overwrite**: the genesis-mode chain-ID branch fired on any `metadata.BlockHeight == 0`, stomping the metadata-driven `Timestamp` override on migration txs. Fixed: tighten predicate to `metadata == nil` and compose with any prior `ctxFn` (this iteration).
8. **`NewAccountWithNumber` had no SignerInfo collision check**: two `SignerInfo` entries with the same `AccountNum` but different addresses, or a `SignerInfo` colliding with a balance-init account, would silently zero the original account's balance. Fixed: rename to `NewAccountWithUncheckedNumber` (forcing every call site to acknowledge the precondition) plus `validateSignerInfo` preflight in `loadAppState` (this iteration).
9. **Failed-tx `ResponseDeliverTx` was empty (looked like success)**. Fixed: explicit error marker so indexers can distinguish.
10. **`GnoGenesisState.InitialHeight` wasn't cross-checked against `GenesisDoc.InitialHeight`**. Fixed: added `InitialHeight` to `RequestInitChain` and validate in `loadAppState`.
11. **`RequestInitChain.InitialHeight` had no amino round-trip test**: a silent registration regression would only surface during a real hardfork (this iteration).

### Hardfork tooling (fixed)

12. **`applyOverlay` silent no-op**: listed scripts but didn't execute them, returned success. Fixed: returns error when scripts found but execution not implemented.
13. **JSONL serialization used `encoding/json` instead of amino**: interface types (`std.Msg`) lost on round-trip. Fixed: both writer and reader now use amino.
14. **`verifyGenesisFile` failure returned success**: tool could produce invalid genesis and exit 0. Fixed: failure aborts (opt out with `--no-verify`).
15. **Zero unit tests for `bruteForceSignerSequence`**. Fixed: 10 table-driven tests.

### Docs linter (side fix for green CI)

- Skip `staging.gno.land`, `archive.org`, and add retry/timeout logic so transient remote-link failures don't block unrelated PRs.

## Still open (design / follow-up)

- **RPC retry/resume** (`contribs/gnogenesis/internal/fork/source_rpc.go`): a single transient error during tx fetch aborts everything; needs exponential backoff + checkpointing. Architectural, follow-up PR.
- **Streaming tx export**: full tx history is held in memory; will OOM on large chains. Needs streaming writer, follow-up PR.
- **`queryAccountAtHeight` silent nil**: all error paths return nil with no indication; flaky RPC → wrong sequence metadata.

## Cherry-picked from [#5597](#5597) (this iteration)

Three follow-ups originally staged in the master-based hardfork series, brought back to where they belong since they modify or extend code introduced here:

- [`1babfe42a`](1babfe42a) `fix(consensus): skip phantom heights during replay when InitialHeight > 1`: the ABCI handshake replay path used to assume heights `[1, appBlockHeight+1]` always have a stored block; for chains starting at `InitialHeight > 1`, heights below `InitialHeight` never had blocks and replay errored with "block not found for height 1".
- [`5bf2fa53e`](5bf2fa53e) `fix(gnogenesis): default gas-storage params and gas_replay_mode in hardfork genesis`: `buildHardforkGenesis` now defaults the post-#5415 `vm.params` gas-storage fields from `vm.DefaultParams()` when the source has them all at zero, and sets `gas_replay_mode = "source"` when unset. Operator overrides preserved. 4 unit tests.
- [`e31268467`](e31268467) `feat(gnogenesis): add --skip-failing-genesis-txs and --skip-genesis-sig-verification flags to fork test`: `make smoketest` now matches what production validators actually run.

## End-to-end validation

The hf-glue testbed ([#5486](#5486)) runs `make fetch && make init && make up` against `rpc.gno.land` halt@704052 and produces a 192 MB hardfork genesis that replays with **0 / 2715 tx failures** and boots a live `gnoland-1` node.

## Dependencies / related PRs

- **Depends on / pairs with:** [#5533](#5533) (`contribs/tx-archive` metadata + `SignerInfo` populator) for replay-ready backups
- **Used in:** [#5486](#5486) (hf-glue testbed)
- **Also fixed here:** [#5539](#5539) (docs-linter skip staging preemptive fix, committed here too to keep CI green)

## AI disclosure

Developed with significant assistance from Claude Code for testing, review, and iterative fixes.

---------

Co-authored-by: moul <noreply@moul.io>
Co-authored-by: jaekwon <jae@tendermint.com>
assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: aeddi <antoine.e.b@gmail.com>

merging for moul

This PR's purpose is integration + proof. The actual code changes are being extracted into smaller, single-concern PRs that land on `master` or on #5511 directly. As those land, this PR rebases onto them and shrinks automatically.

## How to review

Pick whichever level you prefer:

- […] `misc/hf-glue/`). Useful when you want to check that all the small PRs compose into something that actually works against real gnoland1 state.

When a small PR merges, we merge `master` (or its base) back into this branch; the merged content drops out and the meta PR gets smaller. Easier than one 3000-LOC review.

## Extracted PRs

- `fix(tm2/sdk): InitialHeight > 1 support` + `feat(gnoland): genesis-mode PastChainIDs[0]` + `feat(hardfork): --patch-realm`
- `contribs/tx-archive: hardfork-replay readiness (metadata, SignerInfo brute-force, progress log, gas-replay report)`
- `contribs/tx-archive`: register `chain` stdlib amino types

## Meta PR content (this branch)

Once #5540 + #5533 + #5535 land, this PR becomes purely:

- `misc/hf-glue/`: the test harness (Makefile, docker-compose with gnoland + gnoweb, `scripts/migrate.sh` declarative DSL, `scripts/lib/hf.sh` helpers, `scripts/{replay-log,report-replay,check-state}.sh` reports, `fixvalidator/`)
- `misc/hf-glue/README.md` with the DO-NOT-MERGE banner

~1 kLOC of test harness that stays out of the default compiled surface.

## What the testbed proves (against `rpc.gno.land`, halt @ 704052)

Running `make fetch && make init && make up`:

- […] `contribs/tx-archive`)
- […] `PastChainIDs`, per-tx `SignerInfo` (account_num + brute-forced sequence)
- […] `gnoland-1` node in Docker with 0 / 2715 tx failures on replay
- […] `r/sys/*`, `r/gov/dao`, `r/gnoland/blog`, `r/gnoland/coins`, `r/gnoland/wugnot` all ✅)
- […] `account_num=3096261`, `sequence=31` match production exactly, proof the SignerInfo brute-force landed correctly
- […] `r/sys/params` gains #5368's `halt.gno` via `--patch-realm`, without any post-deploy re-addpkg dance

## Open work (tracked in this PR, not in small PRs yet)

- […] `r/sys/validators/v2`: consensus works off `GenesisDoc.Validators` (which `fixvalidator` rewrites to our single local key), but the realm still lists the original 7 gnoland1 validators. A post-history "migration tx" is the fix; `scripts/lib/hf.sh` has the hook and doc-comment, `misc/hardfork` doesn't plumb it yet.
- […] `misc/hardfork` features into `contribs/gnogenesis`: `gnogenesis` grows a workdir representation (dir of decoded sub-files) so every patch mutates one sub-file rather than re-marshalling the whole 192 MB genesis. `contribs/tx-archive` becomes the only thing that talks to a chain (download / blockstore export); `gnogenesis` stays purely filesystem. Design notes are in a comment on #5486.

## AI disclosure

Built with Claude Code. Reproduce with `cd misc/hf-glue && make fetch && make init && make up`; reports in `out/*.md`.