security(consensus): atomic finalize_epoch settlement (anti double-reward) — REVIEW BEFORE MERGE#6748

Merged: Scottcjn merged 1 commit into main from security/epoch-settlement-atomic on Jun 1, 2026

@Scottcjn
Owner

@Scottcjn Scottcjn commented Jun 1, 2026

⚠️ Consensus change — please review before merge (do not auto-merge)

The bug (High → reward inflation)

finalize_epoch had a non-atomic replay guard: the SELECT settled ran in autocommit before the (deferred) transaction, balance credits were applied unconditionally, and the settled-flag UPDATE ignored its rowcount. Two concurrent calls could both credit an epoch's reward pot → supply inflated past the 8,388,608 cap. (Red-team confirmed.)
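To make the failure mode concrete, here is a minimal sketch of the racy pattern described above, not the node's actual code. The `balances` table, the `epoch` key column, the `miner_a` account, and the `REWARD` value are assumptions made for illustration. Two connections are interleaved by hand so that both run the pre-check before either one credits.

```python
import os
import sqlite3
import tempfile

REWARD = 100  # hypothetical per-epoch reward, for illustration only


def is_settled(conn, epoch):
    # Pre-check runs in autocommit, outside any transaction (the racy part).
    rows = conn.execute(
        "SELECT settled FROM epoch_state WHERE epoch = ?", (epoch,)
    ).fetchall()
    return bool(rows) and rows[0][0] == 1


def credit_unconditionally(conn, epoch):
    conn.execute("BEGIN")  # deferred transaction, started after the check
    conn.execute(
        "UPDATE balances SET amount = amount + ? WHERE miner = 'miner_a'",
        (REWARD,),
    )
    # The flag update succeeds even if the epoch was already settled,
    # and its rowcount is never inspected.
    conn.execute("UPDATE epoch_state SET settled = 1 WHERE epoch = ?", (epoch,))
    conn.execute("COMMIT")


def demo_double_credit():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "racy.db")
        a = sqlite3.connect(path, isolation_level=None)
        b = sqlite3.connect(path, isolation_level=None)
        try:
            a.execute(
                "CREATE TABLE epoch_state ("
                "epoch INTEGER PRIMARY KEY, settled INTEGER NOT NULL DEFAULT 0)"
            )
            a.execute(
                "CREATE TABLE balances (miner TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
            )
            a.execute("INSERT INTO epoch_state (epoch, settled) VALUES (1, 0)")
            a.execute("INSERT INTO balances (miner, amount) VALUES ('miner_a', 0)")

            # Both callers pass the guard before either one credits.
            a_seen = is_settled(a, 1)
            b_seen = is_settled(b, 1)
            if not a_seen:
                credit_unconditionally(a, 1)
            if not b_seen:
                credit_unconditionally(b, 1)

            return a.execute(
                "SELECT amount FROM balances WHERE miner = 'miner_a'"
            ).fetchall()[0][0]
        finally:
            a.close()
            b.close()
```

Under this interleaving `demo_double_credit()` returns twice the reward, which is the inflation the description refers to.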

The fix

  • Atomic claim-then-credit: BEGIN IMMEDIATE, then INSERT-ensure + UPDATE epoch_state SET settled=1 WHERE settled=0 with rowcount enforced. Credits run only on a won claim; everything commits/rolls back together.
  • Schema: epoch_state gains settled/settled_ts + idempotent migration (the missing column silently disabled the guard on fresh DBs).
  • Upgrade backfill: epochs already rewarded via the epoch_rewards path are marked settled (insert-missing + update-existing) so they can't be re-credited post-upgrade.
  • Caller guard: now checks epoch_state.settled (was epoch_rewards, which finalize_epoch never writes → re-invoked every block).
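The claim-then-credit flow from the first bullet can be sketched roughly as follows. This is an illustrative reduction under stated assumptions, not the merged implementation: the function name `finalize_epoch_sketch`, the `balances` table, the `epoch` key column, and the `open_demo_db` helper are hypothetical, and the connection is assumed to be opened with `isolation_level=None` so that `BEGIN IMMEDIATE`, `COMMIT`, and `ROLLBACK` are issued explicitly.

```python
import sqlite3
import time


def open_demo_db():
    # Hypothetical in-memory schema; only settled/settled_ts come from the PR text.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE epoch_state ("
        "epoch INTEGER PRIMARY KEY, "
        "settled INTEGER NOT NULL DEFAULT 0, "
        "settled_ts INTEGER)"
    )
    conn.execute(
        "CREATE TABLE balances (miner TEXT PRIMARY KEY, amount INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def finalize_epoch_sketch(conn, epoch, rewards):
    """Return True if this call won the claim and credited, False if it lost."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the flag
    try:
        # INSERT-ensure: make sure a row exists so the UPDATE below can match it.
        conn.execute(
            "INSERT OR IGNORE INTO epoch_state (epoch, settled) VALUES (?, 0)",
            (epoch,),
        )
        cur = conn.execute(
            "UPDATE epoch_state SET settled = 1, settled_ts = ? "
            "WHERE epoch = ? AND settled = 0",
            (int(time.time()), epoch),
        )
        if cur.rowcount != 1:
            conn.execute("ROLLBACK")  # claim lost: no credits are applied
            return False
        for miner, amount in rewards.items():
            conn.execute(
                "INSERT OR IGNORE INTO balances (miner, amount) VALUES (?, 0)",
                (miner,),
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE miner = ?",
                (amount, miner),
            )
        conn.execute("COMMIT")  # claim and credits become visible together
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Calling `finalize_epoch_sketch` twice for the same epoch should credit once: the second call finds `settled = 1`, sees a rowcount of 0, and rolls back without touching balances.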

Tests (7, all green)

Claim won-once-then-lost; real two-connection BEGIN IMMEDIATE contention; migration idempotency; both backfill cases (missing row + existing unsettled row).
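For readers who want to see what a two-connection contention check might look like, here is a small sketch. It is not the test file from this PR; the `claim` helper, the `epoch` key column, and the choice of `timeout=0` (so that a blocked `BEGIN IMMEDIATE` fails at once instead of waiting) are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def claim(conn, epoch):
    # Caller must already hold the write lock via BEGIN IMMEDIATE.
    cur = conn.execute(
        "UPDATE epoch_state SET settled = 1 WHERE epoch = ? AND settled = 0",
        (epoch,),
    )
    return cur.rowcount == 1


def contention_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "epochs.db")
        a = sqlite3.connect(path, isolation_level=None, timeout=0)
        b = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            a.execute(
                "CREATE TABLE epoch_state ("
                "epoch INTEGER PRIMARY KEY, settled INTEGER NOT NULL DEFAULT 0)"
            )
            a.execute("INSERT INTO epoch_state (epoch, settled) VALUES (7, 0)")

            a.execute("BEGIN IMMEDIATE")  # connection A takes the write lock
            try:
                b.execute("BEGIN IMMEDIATE")  # B cannot start a write transaction
                b_blocked = False
            except sqlite3.OperationalError:
                b_blocked = True

            a_won = claim(a, 7)
            a.execute("COMMIT")

            b.execute("BEGIN IMMEDIATE")  # lock is free now, but the epoch is claimed
            b_won = claim(b, 7)
            b.execute("ROLLBACK")
            return b_blocked, a_won, b_won
        finally:
            a.close()
            b.close()
```

The returned tuple reports whether B was locked out while A held the transaction, whether A won the claim, and whether B won it afterwards.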

🔬 Scope / known limitation (tri-brain, 3 review loops)

This closes the single-path (finalize_epoch-vs-itself) race only. There remain separate settlement writers (settle_epoch_rip200, anti_double_mining) and a divergent epoch_state CREATE in sophia_elya_service.py, with no shared claim/lock — a genuine cross-path settlement race + schema-unification effort that is bigger than a finalize_epoch patch and is tracked separately (see linked issue). Per the recurrent-depth dev loop, I halted here rather than scope-creep a consensus PR. (GPT-OSS/:8082 was down, so tri-brain ran degraded Codex+Grok throughout.)

🤖 Generated with Claude Code

…le-reward)

finalize_epoch had a non-atomic replay guard: the SELECT settled ran in
autocommit BEFORE the (deferred) transaction, balance credits were applied
unconditionally, and the settled-flag UPDATE ignored its rowcount — so two
concurrent calls could both credit an epoch's reward pot, inflating supply
past the 8,388,608 cap.

- finalize_epoch now uses BEGIN IMMEDIATE and CLAIMS the epoch first:
  INSERT-ensure + UPDATE epoch_state SET settled=1 WHERE settled=0, with the
  rowcount enforced. Balance credits only run on a won claim; everything
  commits/rolls back together.
- epoch_state schema gains settled/settled_ts + an idempotent migration
  (the missing column silently disabled the guard on fresh DBs).
- Upgrade backfill marks epochs already rewarded via the epoch_rewards path
  as settled (INSERT-missing + UPDATE-existing) so they can't be re-credited.
- Auto-settle caller now checks epoch_state.settled (it was checking
  epoch_rewards, which finalize_epoch never writes).
- 7 isolated tests: claim won-once-then-lost, real 2-connection BEGIN
  IMMEDIATE contention, migration idempotency, both backfill cases.

SCOPE / KNOWN-LIMITATION (tri-brain, 3 loops): this closes the
single-path (finalize_epoch-vs-itself) race only. There remain SEPARATE
settlement writers (settle_epoch_rip200, anti_double_mining) and a divergent
epoch_state CREATE in sophia_elya_service.py with NO shared claim/lock — a
cross-path settlement race + schema-unification effort tracked separately.
GPT-OSS/:8082 was down so tri-brain ran degraded (Codex+Grok).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added labels on Jun 1, 2026: BCOS-L1 (Beacon Certified Open Source tier BCOS-L1, required for non-doc PRs); node (Node server related); tests (Test suite changes); size/L (PR: 201-500 lines)
@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

✅ BCOS v2 Scan Results

| Metric | Value |
| --- | --- |
| Trust Score | 60/100 |
| Certificate ID | BCOS-1f4d2688 |
| Tier | L1 (met) |

What does this mean?

The BCOS (Beacon Certified Open Source) engine scans for:

  • SPDX license header compliance
  • Known CVE vulnerabilities (OSV database)
  • Static analysis findings (Semgrep)
  • SBOM completeness
  • Dependency freshness
  • Test infrastructure evidence
  • Review attestation tier

Full report | What is BCOS?


BCOS v2 Engine - Free & Open Source (MIT) - Elyan Labs

Contributor

@FakerHideInBush FakerHideInBush left a comment

Reviewed current head 0fe4f4ce30e8513da2ef6fabbf48fffd588b3fb2 for the settlement replay fix.

Line-level observations:

  • node/rustchain_v2_integrated_v2.2.1_rip200.py:1425-1456: the epoch_state migration/backfill handles both legacy missing-column tables and already-rewarded epoch_rewards rows. I specifically like that the OperationalError catch only tolerates the absent optional epoch_rewards table and re-raises other migration errors, which avoids silently skipping a monetary backfill.
  • node/rustchain_v2_integrated_v2.2.1_rip200.py:3659-3677: moving the authoritative replay guard under BEGIN IMMEDIATE and enforcing the UPDATE ... WHERE settled = 0 rowcount closes the race that the old autocommit pre-check could not close. Losing claim paths roll back before any balance/UTXO credit loop runs.
  • node/rustchain_v2_integrated_v2.2.1_rip200.py:3726-3737: the UTXO reward batches still run inside the same transaction/connection after the claim, so an exception before commit should roll back the claim and the balance credits together.
  • node/rustchain_v2_integrated_v2.2.1_rip200.py:4961-4967: the caller now consults epoch_state.settled, matching what finalize_epoch actually writes. That fixes the previous mismatch where the caller looked at epoch_rewards even though this path never records there.
  • node/tests/test_epoch_settlement_atomic.py:105-134 covers both upgrade backfill cases that matter for inflation prevention: missing epoch_state rows and existing-but-unsettled rows for already rewarded epochs.
  • node/tests/test_epoch_settlement_atomic.py:137-164 uses two real SQLite connections and BEGIN IMMEDIATE, so it is not just a mock of the concurrency contract.
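The rollback behaviour noted in the observations above (a failure after the claim undoes both the claim and any credits already applied) can be illustrated with a short sketch. The `settle_with_failure` function, the `balances` table, and the injected `fail_after` index are hypothetical and exist only to simulate an exception mid-loop; this is not the production code path.

```python
import sqlite3


def settle_with_failure(conn, epoch, rewards, fail_after):
    """Claim the epoch, then credit; raise on purpose at index fail_after."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "UPDATE epoch_state SET settled = 1 WHERE epoch = ? AND settled = 0",
            (epoch,),
        )
        if cur.rowcount != 1:
            raise RuntimeError("claim lost")
        for i, (miner, amount) in enumerate(rewards):
            if i == fail_after:
                raise RuntimeError("simulated failure mid-credit")
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE miner = ?",
                (amount, miner),
            )
        conn.execute("COMMIT")
    except Exception:
        # Same connection, same transaction: claim and credits are undone together.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def rollback_demo():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    try:
        conn.execute(
            "CREATE TABLE epoch_state ("
            "epoch INTEGER PRIMARY KEY, settled INTEGER NOT NULL DEFAULT 0)"
        )
        conn.execute(
            "CREATE TABLE balances (miner TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
        )
        conn.execute("INSERT INTO epoch_state (epoch, settled) VALUES (9, 0)")
        conn.execute("INSERT INTO balances VALUES ('miner_a', 0), ('miner_b', 0)")
        try:
            settle_with_failure(conn, 9, [("miner_a", 10), ("miner_b", 10)], fail_after=1)
        except RuntimeError:
            pass
        settled = conn.execute(
            "SELECT settled FROM epoch_state WHERE epoch = 9"
        ).fetchall()[0][0]
        total = conn.execute("SELECT SUM(amount) FROM balances").fetchall()[0][0]
        return settled, total
    finally:
        conn.close()
```

After the simulated failure the epoch is still unsettled and no balance has moved, so a later retry can claim and credit it cleanly.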

Validation I ran locally in a clean PR worktree:

  • python -m py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\tests\test_epoch_settlement_atomic.py -> passed
  • git diff --check origin/main...HEAD -- node/rustchain_v2_integrated_v2.2.1_rip200.py node/tests/test_epoch_settlement_atomic.py -> passed
  • python -m pytest node/tests/test_epoch_settlement_atomic.py -q -> 7 passed

Hosted CI note: the broad test job is red with the same unrelated baseline failures seen on neighboring PRs, but the new settlement test file passed in that run and the BCOS/security-adjacent checks are green.

Non-blocking follow-up: these tests intentionally duplicate the SQL contract instead of importing the full node module because of import side effects. That is reasonable for this PR, but a later harness around side-effect-free settlement helpers would reduce drift risk between the copied SQL and production code.

Verdict: approved. I do not see a blocking issue in this patch.

Disclosure: submitting this review for the RustChain code review bounty program (#73); no payment is asserted unless/until maintainers accept it.

Contributor

@MolhamHamwi MolhamHamwi left a comment

I reviewed the epoch settlement atomicity changes and the migration/backfill coverage.

Two technical observations:

  1. finalize_epoch now wins the settlement claim with BEGIN IMMEDIATE plus UPDATE epoch_state ... WHERE settled = 0 before any balance credits are applied, and it checks rowcount before proceeding. That closes the previous replay window where two callers could both pass a pre-transaction guard and credit the same epoch.

  2. The upgrade backfill is important and correctly covers both legacy shapes: rewarded epochs with no epoch_state row are inserted as settled, and existing unsettled rows for epochs already present in epoch_rewards are updated. The regression tests exercise both cases and then assert a later claim loses, which directly protects against post-upgrade double settlement.
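As a rough illustration of the two backfill cases, the sketch below runs an idempotent migration against a legacy-shaped database. Table layouts beyond the `settled`/`settled_ts` columns, the `epoch` key column, and the function names are assumptions; the error handling mirrors the reviewed behaviour of tolerating only a missing `epoch_rewards` table, but the string match on the error message is this sketch's own choice.

```python
import sqlite3


def migrate_epoch_state(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS epoch_state (epoch INTEGER PRIMARY KEY)")
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(epoch_state)")}
    if "settled" not in cols:
        conn.execute(
            "ALTER TABLE epoch_state ADD COLUMN settled INTEGER NOT NULL DEFAULT 0"
        )
    if "settled_ts" not in cols:
        conn.execute("ALTER TABLE epoch_state ADD COLUMN settled_ts INTEGER")
    try:
        # Case 1: rewarded epoch with no epoch_state row -> insert as settled.
        conn.execute(
            "INSERT OR IGNORE INTO epoch_state (epoch, settled) "
            "SELECT DISTINCT epoch, 1 FROM epoch_rewards"
        )
        # Case 2: rewarded epoch with an existing unsettled row -> mark settled.
        conn.execute(
            "UPDATE epoch_state SET settled = 1 "
            "WHERE settled = 0 AND epoch IN (SELECT epoch FROM epoch_rewards)"
        )
    except sqlite3.OperationalError as exc:
        # Only an absent optional epoch_rewards table is tolerated.
        if "no such table" not in str(exc):
            raise
    conn.commit()


def demo_backfill():
    conn = sqlite3.connect(":memory:")
    try:
        # Legacy shape: epoch_state without the settled columns.
        conn.execute("CREATE TABLE epoch_state (epoch INTEGER PRIMARY KEY)")
        conn.execute("INSERT INTO epoch_state (epoch) VALUES (1), (3)")
        conn.execute(
            "CREATE TABLE epoch_rewards (epoch INTEGER, miner TEXT, amount INTEGER)"
        )
        conn.execute(
            "INSERT INTO epoch_rewards VALUES (1, 'miner_a', 5), (2, 'miner_a', 5), "
            "(2, 'miner_b', 5)"
        )
        conn.commit()
        migrate_epoch_state(conn)
        migrate_epoch_state(conn)  # second run should be a no-op
        return dict(conn.execute("SELECT epoch, settled FROM epoch_state ORDER BY epoch"))
    finally:
        conn.close()
```

In this setup epoch 1 exercises the existing-but-unsettled case, epoch 2 the missing-row case, and epoch 3 (never rewarded) stays unsettled so it can still be claimed later.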

I received RTC compensation for this review.

@Scottcjn Scottcjn merged commit b150ae8 into main Jun 1, 2026
11 of 12 checks passed
@Scottcjn Scottcjn deleted the security/epoch-settlement-atomic branch June 1, 2026 19:32
4 participants