fix: cap interested_peers per contract (#3798 Gap 2) by iduartgomez · Pull Request #4596 · freenet/freenet-core

iduartgomez · 2026-06-27T10:17:25Z

Problem

InterestManager::register_peer_interest (crates/core/src/ring/interest.rs)
inserts into the per-contract interested_peers: DashMap<ContractKey, HashMap<PeerKey, PeerInterest>> with no length check. Because this map drives
UPDATE broadcast targeting, an attacker who registers interest under many
distinct peer identities for one contract can grow the map unboundedly and turn
every state update into an N-way fan-out — an amplification vector that
multiplies memory, CPU (WASM execution), and executor cost by N.

This is Gap 2 of tracking issue #3798.

Solution

Mirror the proven cap already used by the sibling broadcast-target source
HostingManager::add_downstream_subscriber (MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT = 512 in hosting.rs):

New constant MAX_INTERESTED_PEERS_PER_CONTRACT = 512.
In register_peer_interest, after taking the interested_peers shard guard
and computing is_new, but before the peer_contracts reverse-index and
index_contract_hash writes: if is_new && entry.len() >= MAX → drop the
guard, warn!, and return false.
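
The reject-before-write ordering above can be sketched in a simplified, single-threaded model. This is an illustration only, not the code in interest.rs: `InterestTable`, the integer stand-ins for `ContractKey` and `PeerKey`, the `String` summary, and the plain `HashMap`s (in place of `DashMap` shard guards) are all assumptions, and the hash index and `warn!` call are omitted.

```rust
use std::collections::{HashMap, HashSet};

/// Name and value (512) taken from the PR description.
const MAX_INTERESTED_PEERS_PER_CONTRACT: usize = 512;

// Hypothetical stand-ins for the real ContractKey / PeerKey types.
type ContractKey = u64;
type PeerKey = u32;

#[derive(Default)]
struct InterestTable {
    /// contract -> (peer -> latest summary)
    interested_peers: HashMap<ContractKey, HashMap<PeerKey, String>>,
    /// peer -> contracts (reverse index)
    peer_contracts: HashMap<PeerKey, HashSet<ContractKey>>,
}

impl InterestTable {
    /// Returns true only when `peer` was newly added for `contract`.
    fn register_peer_interest(
        &mut self,
        contract: ContractKey,
        peer: PeerKey,
        summary: &str,
    ) -> bool {
        let entry = self.interested_peers.entry(contract).or_default();
        let is_new = !entry.contains_key(&peer);

        // Reject a NEW peer at capacity before touching the reverse index.
        if is_new && entry.len() >= MAX_INTERESTED_PEERS_PER_CONTRACT {
            return false;
        }

        // Renewals fall through and refresh the stored summary.
        entry.insert(peer, summary.to_string());
        self.peer_contracts.entry(peer).or_default().insert(contract);
        is_new
    }
}
```

Because the early return sits above both writes, a rejected peer appears in neither map, while an already-tracked peer skips the check and still has its summary refreshed.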

Key invariants preserved:

No zombie reverse-index entry. The reject happens before the
reverse-index/hash writes, so a rejected peer never lands in peer_contracts
or contract_hash_index. (The three cross-shard writes stay atomic vs
remove_peer_interest; see PRs #4174, the remove_all_peer_interests consistency race against register_peer_interest, and #4129.)
is_new = false on reject. A rejected peer is not a new viable target,
so it cannot trigger the #4359 first-viable-target pending-broadcast flush (#4359: a PUT of a fresh contract id gives up broadcast before subscriptions resolve).
Renewals are never rejected. The cap only gates a NEW peer; an
already-tracked peer always refreshes, so a legit at-capacity contract keeps
serving its existing peers.

No new GC exemption is introduced — entries still TTL-expire via the existing
INTEREST_TTL + sweep — satisfying the AGENTS.md "exemptions must be
time-bounded" rule. Reject-on-overflow is chosen over LRU eviction (simpler,
matches hosting.rs).
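
How TTL expiry keeps the cap time-bounded can be sketched as below. This is a minimal model under stated assumptions: the 20-minute value is the figure quoted later in the review, and `Peers` and `sweep_expired` are hypothetical names, not the sweep implemented in interest.rs.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// 20 minutes is the TTL quoted in the review; treated here as an assumption.
const INTEREST_TTL: Duration = Duration::from_secs(20 * 60);

/// Hypothetical stand-in: peer id -> time of last refresh.
type Peers = HashMap<u32, Instant>;

/// Drops every peer whose last refresh is at least INTEREST_TTL old at `now`.
/// Returns the number of slots freed.
fn sweep_expired(peers: &mut Peers, now: Instant) -> usize {
    let before = peers.len();
    peers.retain(|_, last_refresh| now.duration_since(*last_refresh) < INTEREST_TTL);
    before - peers.len()
}
```

In this model a peer that stops refreshing loses its slot after one TTL, so holding all slots requires every identity to keep renewing.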

Testing

New regression test test_register_peer_interest_caps_at_max covers all three
invariants:

Registers MAX distinct peers (each returns true, map length == MAX).
One more distinct peer is rejected (returns false, length unchanged).
Rejected peer is absent from the peer_contracts reverse index (no zombie).
Re-registering an existing peer with an updated summary returns false (not
new) yet still updates the entry (renewal not locked out by capacity).

Verified the test fails without the cap (temporarily disabling the reject
block panics at the rejection assertion), then passes with it restored.

Commands run (all green):

cargo fmt
cargo clippy -p freenet --lib -- -D warnings — no issues
cargo test -p freenet --lib ring::interest — 36 passed
cargo test -p freenet --lib p2p_protoc — 49 passed (call-site-pairing guard intact)

Scope

This is slice 1 of N for #3798 and addresses Gap 2 only. #3798 is a
tracking issue and must stay open; the remaining gaps (3, 4, 5, 6) are deferred
to follow-up PRs.

Refs #3798

https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj

…on (#3798)

The per-contract `interested_peers` map in InterestManager was inserted into with no length check, so a single peer flooding distinct identities could grow it unboundedly. Since this map drives UPDATE broadcast targeting, an unbounded map is an amplification vector: every state update fans out to N targets, multiplying memory, CPU, and executor cost by N.

Cap the map at MAX_INTERESTED_PEERS_PER_CONTRACT (512), mirroring the existing MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT in hosting.rs so the two broadcast-target sources are symmetrically bounded. A NEW peer at capacity is rejected (return is_new = false) before the reverse-index / hash-index writes, so it leaves no zombie peer_contracts entry and does not trigger the #4359 first-viable-target broadcast flush. Renewals of an already-tracked peer are never rejected, so a legit at-capacity contract keeps serving its existing peers. Entries still TTL-expire via the existing sweep, so no new GC exemption is introduced.

This is Gap 2 of tracking issue #3798 only; the other gaps remain open.

Claude-Session: https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj

github-actions · 2026-06-27T10:20:57Z

I have all the information I need to complete the review.

Rule Review: Cap enforcement is clean; one minor test-coverage gap

Rules checked: git-workflow.md, code-style.md, testing.md, ring.md
Files reviewed: 1 (crates/core/src/ring/interest.rs)

Warnings

None.

Info

crates/core/src/ring/interest.rs:1272 — The test test_register_peer_interest_caps_at_max doesn't cover the freed-slot state transition: remove one peer from a full contract, then verify the next new registration is accepted. testing.md lists "State transitions (invalid states, repeated calls, out-of-order)" as a required edge case class. The primary cap invariant (rejection at capacity, no zombie entries, renewals passing) is well covered; this is a secondary scenario made straightforward by HashMap semantics, but worth a brief sub-case given the security sensitivity of this path. (rule: testing.md — edge cases)
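
The freed-slot sub-case could look roughly like the sketch below. It runs against a toy capped set rather than the real InterestManager: `try_register`, the `u32` peer ids, and the local `MAX` constant are hypothetical stand-ins chosen for illustration.

```rust
use std::collections::HashSet;

/// Same value as the cap in the PR; the short name is local to this sketch.
const MAX: usize = 512;

/// Toy model of the cap: returns true only when `peer` is newly tracked.
fn try_register(peers: &mut HashSet<u32>, peer: u32) -> bool {
    let is_new = !peers.contains(&peer);
    if is_new && peers.len() >= MAX {
        return false;
    }
    peers.insert(peer);
    is_new
}

/// Sub-case: fill to the cap, confirm rejection, free one slot, confirm acceptance.
fn freed_slot_subcase() {
    let mut peers: HashSet<u32> = (0..MAX as u32).collect();
    let newcomer = MAX as u32; // outside the fill range, so guaranteed untracked

    assert!(!try_register(&mut peers, newcomer), "full map must reject a new peer");
    assert!(peers.remove(&0), "peer 0 was tracked");
    assert!(try_register(&mut peers, newcomer), "freed slot must admit a new peer");
    assert_eq!(peers.len(), MAX);
}
```

The final length assertion shows that the cap is a live bound: the map returns to exactly MAX after the removal and the new registration.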

Detailed notes (non-finding observations):

Cross-DashMap lock discipline (ring.md critical invariant): The rejection branch drops entry before the early return false, which is correct here — no secondary writes (peer_contracts, index_contract_hash) occur in the rejection path. The successful path still holds the guard across all secondary writes. ✓
Numeric constant: MAX_INTERESTED_PEERS_PER_CONTRACT = 512 is a security cap unrelated to configurable values; it is defined as a named constant with a comment tying it to the symmetric MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT in hosting.rs. Satisfies the code-style.md rule ("NO: Define as a named constant with a comment explaining the choice"). ✓
make_unique_peer_key in tests: Uses deterministic TransportPublicKey::from_bytes([seed bytes…]) — no RNG, no GlobalRng bypassed. The comment accurately explains why this avoids the flake that the previous TransportKeypair::new() (OsRng) could cause in the 513-registration loop. ✓
fix: PR has regression test: test_register_peer_interest_caps_at_max is present. ✓

Rule review against .claude/rules/. WARNING findings block merge.

iduartgomez

Comprehensive PR Review: #4596

Summary

PR Title: fix: cap interested_peers per contract (#3798 Gap 2)
Type: fix (hardening)
CI Status: passing (all checks green)
Linked Issues: Refs #3798 (Gap 2 of a 6-gap tracking issue; correctly NOT closing it)
Review tier: Full (concurrency / DashMap atomicity + amplification-authorization surface)
Reviewers run: code-first, testing, skeptical, big-picture (4 Claude lenses). External pass (codex) unavailable — token expired; gemini not installed. Findings verified by hand against the worktree code, and the test was run independently by two reviewers.

Code-First Analysis

Independent Understanding: register_peer_interest adds a per-contract cap (MAX_INTERESTED_PEERS_PER_CONTRACT = 512). After taking the shard guard and computing is_new = !entry.contains_key(&peer), a reject branch fires when is_new && entry.len() >= 512: drop guard, warn!, return false — before the peer_contracts reverse-index and index_contract_hash writes.

Stated Intent: #3798 Gap 2 — bound an adversarial broadcast-amplification vector (an attacker registering interest under many distinct peer identities inflates UPDATE fan-out), with a cap "matching or lower than MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT (hosting.rs), reject or evict on overflow."

Alignment: Matches precisely. interested_peers is the sole interest-side source feeding get_broadcast_targets_update (Source 2, update.rs:324), so capping it directly bounds fan-out. The "symmetric with hosting.rs" claim is verified true: hosting.rs:85 is = 512 and hosting.rs:738 enforces the identical is_new && entry.len() >= MAX reject-on-overflow pattern.

Gaps: None material in the production code. The or_default() at the entry guard cannot leak an empty-contract zombie on the reject path, because the reject only fires when the entry is already at 512 (i.e. already exists) — load-bearing reasoning that the doc-comment leaves implicit.
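
The or_default() reasoning can be made concrete with a small model in which the cap is passed as a parameter. Everything here is a hypothetical sketch (`Interested`, `register`, `empty_entries`, integer keys); it only illustrates why a reject at a positive cap cannot leave behind a freshly created empty entry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: contract id -> (peer id -> unit payload).
type Interested = HashMap<u64, HashMap<u32, ()>>;

/// Cap check with the limit passed in so the edge case can be shown.
fn register(map: &mut Interested, cap: usize, contract: u64, peer: u32) -> bool {
    // or_default() may create an empty inner map for an unseen contract.
    let entry = map.entry(contract).or_default();
    let is_new = !entry.contains_key(&peer);
    // With cap > 0, len() >= cap means the inner map was already populated,
    // so the outer entry existed before this call.
    if is_new && entry.len() >= cap {
        return false;
    }
    entry.insert(peer, ());
    is_new
}

/// Number of contracts whose inner map is empty (candidate zombies).
fn empty_entries(map: &Interested) -> usize {
    map.values().filter(|peers| peers.is_empty()).count()
}
```

With a cap of 512, rejecting the 513th peer leaves `empty_entries` at 0. Only a hypothetical cap of 0 would reject a first registration and leave an empty inner map behind, which is why the argument depends on the cap being positive.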

Testing Assessment

Coverage Level: adequate for the cap; one latent non-determinism in the test (below).

| Test Type | Status | Notes |
|---|---|---|
| Unit | ⚠️ | `test_register_peer_interest_caps_at_max` covers the cap, all 3 invariants, and the at-capacity boundary — but rests on unguaranteed RNG distinctness (see Should-fix) |
| Integration | N/A | unit is the correct level for a single bounded function (slice 1) |
| Simulation | N/A | — |

Regression Test: present and independently verified to fail without the cap — two reviewers disabled the cap and the test panicked at the "new peer at capacity must be rejected" assertion, then passed when restored.

Invariants covered: (a) no zombie reverse-index on reject (get_contracts_for_peer(overflow).is_empty()), (b) is_new=false on reject, (c) renewals of existing peers never rejected and still refresh the summary. All three asserted.

Missing: downstream fan-out reduction (get_broadcast_targets_update) not asserted — acceptable for slice 1; the cap bounds the source map by construction, and the symmetric hosting.rs cap has only unit coverage too.

Skeptical Findings

Risk Level: low (production logic), with one test-robustness issue.

| Concern | Severity | Location | Details |
|---|---|---|---|
| Test rests on unguaranteed RNG distinctness — one real failure observed | Should | `interest.rs:1199` `make_peer_key` (verified) | `make_peer_key(_seed: u8)` ignores its seed and calls `TransportKeypair::new()` (random). The test fills the map with `make_peer_key(0)` ×512 but never asserts the 512 keys are pairwise distinct, nor that `overflow_peer` differs from all of them. One reviewer observed a cold-build run fail at the 513th call (it compared equal to a tracked peer → `is_new=false` → cap branch never reached). `GlobalRng` is thread-local; if the cargo test threadpool reuses a thread that left a deterministic seed, distinctness is no longer guaranteed (matches the known "GlobalRng seed state bleeds across concurrent tests in one binary" hazard). Production logic is deterministic and correct regardless — this is purely test robustness. Fix: derive the keypair bytes from the `_seed`/a counter so keys are deterministic-and-distinct, or assert `overflow_peer` is not already present before expecting rejection. |
| Atomicity / zombie entries | — verified safe | `interest.rs:411-419` | Reject does `drop(entry); return false` before all three writes; rejected peer leaves no entry in `interested_peers`, `peer_contracts`, or `contract_hash_index`. Confirmed at runtime by the test. |
| Off-by-one | — verified safe | reject condition | `>= 512` with insert-after caps at exactly 512 (511 passes → 512; 512 rejects). |
| No permanent lockout | — verified safe | sweep `interest.rs:814` | Slots free via `INTEREST_TTL` (20min) + 60s sweep. Time-bounded per AGENTS.md; an adversary must keep all 512 identities actively refreshing to hold the cap — exactly the cost the cap is meant to impose. |

No high/medium production bugs.

Big Picture Assessment

Goal Alignment: yes — closes the issue's highest-priority gap (High severity / Low effort) with a minimal, isolated change reusing a proven sibling pattern.
Anti-Patterns Detected: none — no #[ignore], no removed tests, no weakened tolerances, no silent fallback (explicit warn! + return false).
Removed Code Concerns: none — purely additive (+95 / −0, one file).
Scope Assessment: exemplary. Single file, single logical change, no Gap 3/4/5/6 work smuggled in. Notably does NOT touch the tempting-adjacent update.rs Gap 4. Uses Refs #3798 (not Fixes) so the T-tracking issue stays open. Deferred gaps named in the PR body.

Two benign, undocumented boundary side-effects worth a comment (Consider):

Two-cap desync at subscribe.rs:616-638: hosting's add_downstream_subscriber (512) and the new interest cap (512) are independently-swept maps; at the exact boundary they can transiently differ (hosting accepts, interest rejects → the #4359 flush, the local-interest counter bump, and interest propagation are skipped while the peer IS in hosting's map). Benign (peer just won't get broadcasts until its next heartbeat), self-corrects in steady state.
Rejected peer still gets a Summaries response (node.rs:2442-2443): arguably correct (don't signal an adversary they were capped) but non-obvious.

Documentation

Code docs: the reject doc-comment is accurate and thorough. Could note the or_default()-can't-leak reasoning and the two-cap boundary relationship.
Module docs: interest.rs header doesn't mention the new 512 bound; a one-line addition would make the symmetric caps discoverable. (Consider.)
A one-line ## Fixes section stating "None — partial work on #3798; tracking issue stays open" would make the deliberate absence of Fixes read as intentional. (Consider.)

Recommendations

Must Fix (Blocking)

None. The production change is correct, atomic, time-bounded, and faithful to intent.

Should Fix (Important)

Make the test's peer keys deterministic-and-distinct. make_peer_key ignores its seed and randomizes; the test never asserts pairwise distinctness and a real one-off failure was observed (RNG collision / leaked thread-local seed). Derive the 32 keypair bytes from the _seed/a counter (matching the sibling hosting.rs test's deterministic-index approach), OR assert overflow_peer is absent before expecting rejection. Prevents a latent flaky failure.
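
One way to get deterministic-and-distinct key material is sketched below. It only derives raw bytes; whether and how those bytes are accepted by the real `TransportPublicKey::from_bytes` is not shown, and `KeyBytes`, `unique_key_bytes`, and `all_distinct` are hypothetical names for this illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the 32 bytes handed to a public-key constructor.
type KeyBytes = [u8; 32];

/// Derives 32 bytes from a u32 seed. Distinct seeds always give distinct
/// bytes because the seed is embedded verbatim in the first four positions.
fn unique_key_bytes(seed: u32) -> KeyBytes {
    let mut bytes = [0u8; 32];
    bytes[..4].copy_from_slice(&seed.to_le_bytes());
    bytes
}

/// True when every seed in `0..count` maps to a different byte array.
fn all_distinct(count: u32) -> bool {
    let keys: HashSet<KeyBytes> = (0..count).map(unique_key_bytes).collect();
    keys.len() == count as usize
}
```

A fill loop over seeds `0..512` plus an overflow seed of 512 then yields 513 pairwise-distinct inputs regardless of RNG state, and `all_distinct(513)` can back that up as an explicit precondition in the test.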

Consider (Suggestions)

Rename/use the make_peer_key argument (it's ignored) — passing literal 0 in the fill loop reads like "same key 512×", the exact vacuous-test failure mode a reviewer fears.
One-line comment at subscribe.rs:616-638 acknowledging the two symmetric 512 caps are independently swept and can momentarily desync at the boundary.
Add a ## Fixes note + a module-doc line for the new cap.
(Optional next boundary case) a test that frees a slot below cap and confirms a new peer is then accepted — proves the cap is a live bound, not a one-way latch. Sibling hosting.rs omits this too, so consistent to defer.

Verdict

State: Needs Changes — Light Re-Check Sufficient
HEAD SHA reviewed: d750cba

The production change is correct, minimal, well-scoped, and a model example of slicing a tracking issue — no blocking issues. The one Should-fix is test robustness: the regression test leans on RNG distinctness it doesn't assert, and a real one-off failure was observed; making the keys deterministic-and-distinct is a small, contained change. Because all findings are test-only and the production diff is unchanged, a diff-of-the-diff re-check on the test fix is sufficient.

[AI-assisted - Claude]

…3798)

The cap regression test `test_register_peer_interest_caps_at_max` relied on RNG distinctness it never asserted. `make_peer_key` ignored its seed and returned a random `TransportKeypair::new()`, so the 512-fill loop depended on 512 random keypairs never colliding AND on no leaked thread-local GlobalRng seed making them deterministic-equal. A reviewer observed a real one-off cold-build failure: the 513th key collided with a tracked peer, so registration returned is_new = false and the cap branch was never exercised.

Make the test keys deterministic-AND-distinct. `make_peer_key` now derives its bytes from the seed via `TransportPublicKey::from_bytes` (matching the sibling hosting.rs helper), and a new `make_unique_peer_key` takes a u32 seed so the fill loop can produce 512+ pairwise-distinct peers regardless of RNG state. The overflow peer uses a seed past the fill range so it is guaranteed untracked.

This removes the flake entirely while keeping the test a real regression guard (verified it still fails when the cap is disabled).

Claude-Session: https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj

iduartgomez commented Jun 27, 2026


iduartgomez marked this pull request as draft June 27, 2026 14:27

sanity mentioned this pull request Jun 27, 2026

fix: bootstrap acceptance collapse — saturated gateways + terminus rejections + location backoff stall ring formation for ~5 minutes in 100-node sim #4362

Closed

iduartgomez mentioned this pull request Jun 30, 2026

perf(ring): cap interested-peers per contract to bound broadcast fan-out (#3798 Gap 2) #4520

Closed

iduartgomez marked this pull request as ready for review June 30, 2026 08:33

iduartgomez enabled auto-merge June 30, 2026 08:33

iduartgomez added this pull request to the merge queue Jun 30, 2026

Merged via the queue into main with commit 87472f2 Jun 30, 2026
17 checks passed

iduartgomez deleted the worktree-agent-a567f6891c2a52251 branch June 30, 2026 08:52

iduartgomez merged 2 commits into main from worktree-agent-a567f6891c2a52251
