security(consensus): atomic finalize_epoch settlement (anti double-reward) — REVIEW BEFORE MERGE#6748
Conversation
…le-reward) finalize_epoch had a non-atomic replay guard: the SELECT settled ran in autocommit BEFORE the (deferred) transaction, balance credits were applied unconditionally, and the settled-flag UPDATE ignored its rowcount — so two concurrent calls could both credit an epoch's reward pot, inflating supply past the 8,388,608 cap. - finalize_epoch now uses BEGIN IMMEDIATE and CLAIMS the epoch first: INSERT-ensure + UPDATE epoch_state SET settled=1 WHERE settled=0, with the rowcount enforced. Balance credits only run on a won claim; everything commits/rolls back together. - epoch_state schema gains settled/settled_ts + an idempotent migration (the missing column silently disabled the guard on fresh DBs). - Upgrade backfill marks epochs already rewarded via the epoch_rewards path as settled (INSERT-missing + UPDATE-existing) so they can't be re-credited. - Auto-settle caller now checks epoch_state.settled (it was checking epoch_rewards, which finalize_epoch never writes). - 7 isolated tests: claim won-once-then-lost, real 2-connection BEGIN IMMEDIATE contention, migration idempotency, both backfill cases. SCOPE / KNOWN-LIMITATION (tri-brain, 3 loops): this closes the single-path (finalize_epoch-vs-itself) race only. There remain SEPARATE settlement writers (settle_epoch_rip200, anti_double_mining) and a divergent epoch_state CREATE in sophia_elya_service.py with NO shared claim/lock — a cross-path settlement race + schema-unification effort tracked separately. GPT-OSS/:8082 was down so tri-brain ran degraded (Codex+Grok). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
✅ BCOS v2 Scan Results
What does this mean?The BCOS (Beacon Certified Open Source) engine scans for:
BCOS v2 Engine - Free & Open Source (MIT) - Elyan Labs |
FakerHideInBush
left a comment
There was a problem hiding this comment.
Reviewed current head 0fe4f4ce30e8513da2ef6fabbf48fffd588b3fb2 for the settlement replay fix.
Line-level observations:
node/rustchain_v2_integrated_v2.2.1_rip200.py:1425-1456: theepoch_statemigration/backfill handles both legacy missing-column tables and already-rewardedepoch_rewardsrows. I specifically like that theOperationalErrorcatch only tolerates the absent optionalepoch_rewardstable and re-raises other migration errors, which avoids silently skipping a monetary backfill.node/rustchain_v2_integrated_v2.2.1_rip200.py:3659-3677: moving the authoritative replay guard underBEGIN IMMEDIATEand enforcing theUPDATE ... WHERE settled = 0rowcount closes the race that the old autocommit pre-check could not close. Losing claim paths roll back before any balance/UTXO credit loop runs.node/rustchain_v2_integrated_v2.2.1_rip200.py:3726-3737: the UTXO reward batches still run inside the same transaction/connection after the claim, so an exception before commit should roll back the claim and the balance credits together.node/rustchain_v2_integrated_v2.2.1_rip200.py:4961-4967: the caller now consultsepoch_state.settled, matching whatfinalize_epochactually writes. That fixes the previous mismatch where the caller looked atepoch_rewardseven though this path never records there.node/tests/test_epoch_settlement_atomic.py:105-134covers both upgrade backfill cases that matter for inflation prevention: missingepoch_staterows and existing-but-unsettled rows for already rewarded epochs.node/tests/test_epoch_settlement_atomic.py:137-164uses two real SQLite connections andBEGIN IMMEDIATE, so it is not just a mock of the concurrency contract.
Validation I ran locally in a clean PR worktree:
python -m py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\tests\test_epoch_settlement_atomic.py-> passedgit diff --check origin/main...HEAD -- node/rustchain_v2_integrated_v2.2.1_rip200.py node/tests/test_epoch_settlement_atomic.py-> passedpython -m pytest node/tests/test_epoch_settlement_atomic.py -q-> 7 passed
Hosted CI note: the broad test job is red with the same unrelated baseline failures seen on neighboring PRs, but the new settlement test file passed in that run and the BCOS/security-adjacent checks are green.
Non-blocking follow-up: these tests intentionally duplicate the SQL contract instead of importing the full node module because of import side effects. That is reasonable for this PR, but a later harness around side-effect-free settlement helpers would reduce drift risk between the copied SQL and production code.
Verdict: approved. I do not see a blocking issue in this patch.
Disclosure: submitting this review for the RustChain code review bounty program (#73); no payment is asserted unless/until maintainers accept it.
MolhamHamwi
left a comment
There was a problem hiding this comment.
I reviewed the epoch settlement atomicity changes and the migration/backfill coverage.
Two technical observations:
-
finalize_epochnow wins the settlement claim withBEGIN IMMEDIATEplusUPDATE epoch_state ... WHERE settled = 0before any balance credits are applied, and it checksrowcountbefore proceeding. That closes the previous replay window where two callers could both pass a pre-transaction guard and credit the same epoch. -
The upgrade backfill is important and correctly covers both legacy shapes: rewarded epochs with no
epoch_staterow are inserted as settled, and existing unsettled rows for epochs already present inepoch_rewardsare updated. The regression tests exercise both cases and then assert a later claim loses, which directly protects against post-upgrade double settlement.
I received RTC compensation for this review.
The bug (High → reward inflation)
finalize_epochhad a non-atomic replay guard: theSELECT settledran in autocommit before the (deferred) transaction, balance credits were applied unconditionally, and the settled-flagUPDATEignored itsrowcount. Two concurrent calls could both credit an epoch's reward pot → supply inflated past the 8,388,608 cap. (Red-team confirmed.)The fix
BEGIN IMMEDIATE, thenINSERT-ensure +UPDATE epoch_state SET settled=1 WHERE settled=0withrowcountenforced. Credits run only on a won claim; everything commits/rolls back together.epoch_stategainssettled/settled_ts+ idempotent migration (the missing column silently disabled the guard on fresh DBs).epoch_rewardspath are marked settled (insert-missing + update-existing) so they can't be re-credited post-upgrade.epoch_state.settled(wasepoch_rewards, whichfinalize_epochnever writes → re-invoked every block).Tests (7, all green)
Claim won-once-then-lost; real two-connection
BEGIN IMMEDIATEcontention; migration idempotency; both backfill cases (missing row + existing unsettled row).🔬 Scope / known limitation (tri-brain, 3 review loops)
This closes the single-path (
finalize_epoch-vs-itself) race only. There remain separate settlement writers (settle_epoch_rip200,anti_double_mining) and a divergentepoch_stateCREATE insophia_elya_service.py, with no shared claim/lock — a genuine cross-path settlement race + schema-unification effort that is bigger than afinalize_epochpatch and is tracked separately (see linked issue). Per the recurrent-depth dev loop, I halted here rather than scope-creep a consensus PR. (GPT-OSS/:8082was down, so tri-brain ran degraded Codex+Grok throughout.)🤖 Generated with Claude Code