fix: cap interested_peers per contract (#3798 Gap 2)#4596
Conversation
…on (#3798) The per-contract `interested_peers` map in InterestManager was inserted into with no length check, so a single peer flooding distinct identities could grow it unboundedly. Since this map drives UPDATE broadcast targeting, an unbounded map is an amplification vector: every state update fans out to N targets, multiplying memory, CPU, and executor cost by N. Cap the map at MAX_INTERESTED_PEERS_PER_CONTRACT (512), mirroring the existing MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT in hosting.rs so the two broadcast-target sources are symmetrically bounded. A NEW peer at capacity is rejected (return is_new = false) before the reverse-index / hash-index writes, so it leaves no zombie peer_contracts entry and does not trigger the #4359 first-viable-target broadcast flush. Renewals of an already-tracked peer are never rejected, so a legit at-capacity contract keeps serving its existing peers. Entries still TTL-expire via the existing sweep, so no new GC exemption is introduced. This is Gap 2 of tracking issue #3798 only; the other gaps remain open. Claude-Session: https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj
|
I have all the information I need to complete the review. Rule Review: Cap enforcement is clean; one minor test-coverage gapRules checked: git-workflow.md, code-style.md, testing.md, ring.md WarningsNone. Info
Detailed notes (non-finding observations):
Rule review against |
iduartgomez
left a comment
There was a problem hiding this comment.
Comprehensive PR Review: #4596
Summary
- PR Title: fix: cap interested_peers per contract (#3798 Gap 2)
- Type: fix (hardening)
- CI Status: passing (all checks green)
- Linked Issues: Refs #3798 (Gap 2 of a 6-gap tracking issue; correctly NOT closing it)
- Review tier: Full (concurrency / DashMap atomicity + amplification-authorization surface)
- Reviewers run: code-first, testing, skeptical, big-picture (4 Claude lenses). External pass (codex) unavailable — token expired; gemini not installed. Findings verified by hand against the worktree code, and the test was run independently by two reviewers.
Code-First Analysis
Independent Understanding: register_peer_interest adds a per-contract cap (MAX_INTERESTED_PEERS_PER_CONTRACT = 512). After taking the shard guard and computing is_new = !entry.contains_key(&peer), a reject branch fires when is_new && entry.len() >= 512: drop guard, warn!, return false — before the peer_contracts reverse-index and index_contract_hash writes.
Stated Intent: #3798 Gap 2 — bound an adversarial broadcast-amplification vector (an attacker registering interest under many distinct peer identities inflates UPDATE fan-out), with a cap "matching or lower than MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT (hosting.rs), reject or evict on overflow."
Alignment: Matches precisely. interested_peers is the sole interest-side source feeding get_broadcast_targets_update (Source 2, update.rs:324), so capping it directly bounds fan-out. The "symmetric with hosting.rs" claim is verified true: hosting.rs:85 is = 512 and hosting.rs:738 enforces the identical is_new && entry.len() >= MAX reject-on-overflow pattern.
Gaps: None material in the production code. The or_default() at the entry guard cannot leak an empty-contract zombie on the reject path, because the reject only fires when the entry is already at 512 (i.e. already exists) — load-bearing reasoning that the doc-comment leaves implicit.
Testing Assessment
Coverage Level: adequate for the cap; one latent non-determinism in the test (below).
| Test Type | Status | Notes |
|---|---|---|
| Unit | test_register_peer_interest_caps_at_max covers the cap, all 3 invariants, and the at-capacity boundary — but rests on unguaranteed RNG distinctness (see Should-fix) |
|
| Integration | N/A | unit is the correct level for a single bounded function (slice 1) |
| Simulation | N/A | — |
Regression Test: present and independently verified to fail without the cap — two reviewers disabled the cap and the test panicked at the "new peer at capacity must be rejected" assertion, then passed when restored.
Invariants covered: (a) no zombie reverse-index on reject (get_contracts_for_peer(overflow).is_empty()), (b) is_new=false on reject, (c) renewals of existing peers never rejected and still refresh the summary. All three asserted.
Missing: downstream fan-out reduction (get_broadcast_targets_update) not asserted — acceptable for slice 1; the cap bounds the source map by construction, and the symmetric hosting.rs cap has only unit coverage too.
Skeptical Findings
Risk Level: low (production logic), with one test-robustness issue.
| Concern | Severity | Location | Details |
|---|---|---|---|
| Test rests on unguaranteed RNG distinctness — one real failure observed | Should | interest.rs:1199 make_peer_key (verified) |
make_peer_key(_seed: u8) ignores its seed and calls TransportKeypair::new() (random). The test fills the map with make_peer_key(0) ×512 but never asserts the 512 keys are pairwise distinct, nor that overflow_peer differs from all of them. One reviewer observed a cold-build run fail at the 513th call (it compared equal to a tracked peer → is_new=false → cap branch never reached). GlobalRng is thread-local; if the cargo test threadpool reuses a thread that left a deterministic seed, distinctness is no longer guaranteed (matches the known "GlobalRng seed state bleeds across concurrent tests in one binary" hazard). Production logic is deterministic and correct regardless — this is purely test robustness. Fix: derive the keypair bytes from the _seed/a counter so keys are deterministic-and-distinct, or assert overflow_peer is not already present before expecting rejection. |
| Atomicity / zombie entries | — verified safe | interest.rs:411-419 |
Reject does drop(entry); return false before all three writes; rejected peer leaves no entry in interested_peers, peer_contracts, or contract_hash_index. Confirmed at runtime by the test. |
| Off-by-one | — verified safe | reject condition | >= 512 with insert-after caps at exactly 512 (511 passes → 512; 512 rejects). |
| No permanent lockout | — verified safe | sweep interest.rs:814 |
Slots free via INTEREST_TTL (20min) + 60s sweep. Time-bounded per AGENTS.md; an adversary must keep all 512 identities actively refreshing to hold the cap — exactly the cost the cap is meant to impose. |
No high/medium production bugs.
Big Picture Assessment
Goal Alignment: yes — closes the issue's highest-priority gap (High severity / Low effort) with a minimal, isolated change reusing a proven sibling pattern.
Anti-Patterns Detected: none — no #[ignore], no removed tests, no weakened tolerances, no silent fallback (explicit warn! + return false).
Removed Code Concerns: none — purely additive (+95 / −0, one file).
Scope Assessment: exemplary. Single file, single logical change, no Gap 3/4/5/6 work smuggled in. Notably does NOT touch the tempting-adjacent update.rs Gap 4. Uses Refs #3798 (not Fixes) so the T-tracking issue stays open. Deferred gaps named in the PR body.
Two benign, undocumented boundary side-effects worth a comment (Consider):
- Two-cap desync at
subscribe.rs:616-638: hosting'sadd_downstream_subscriber(512) and the new interest cap (512) are independently-swept maps; at the exact boundary they can transiently differ (hosting accepts, interest rejects → the #4359 flush, the local-interest counter bump, and interest propagation are skipped while the peer IS in hosting's map). Benign (peer just won't get broadcasts until its next heartbeat), self-corrects in steady state. - Rejected peer still gets a
Summariesresponse (node.rs:2442-2443): arguably correct (don't signal an adversary they were capped) but non-obvious.
Documentation
- Code docs: the reject doc-comment is accurate and thorough. Could note the
or_default()-can't-leak reasoning and the two-cap boundary relationship. - Module docs:
interest.rsheader doesn't mention the new 512 bound; a one-line addition would make the symmetric caps discoverable. (Consider.) - A one-line
## Fixessection stating "None — partial work on #3798; tracking issue stays open" would make the deliberate absence ofFixesread as intentional. (Consider.)
Recommendations
Must Fix (Blocking)
None. The production change is correct, atomic, time-bounded, and faithful to intent.
Should Fix (Important)
- Make the test's peer keys deterministic-and-distinct.
make_peer_keyignores its seed and randomizes; the test never asserts pairwise distinctness and a real one-off failure was observed (RNG collision / leaked thread-local seed). Derive the 32 keypair bytes from the_seed/a counter (matching the siblinghosting.rstest's deterministic-index approach), OR assertoverflow_peeris absent before expecting rejection. Prevents a latent flaky failure.
Consider (Suggestions)
- Rename/use the
make_peer_keyargument (it's ignored) — passing literal0in the fill loop reads like "same key 512×", the exact vacuous-test failure mode a reviewer fears. - One-line comment at
subscribe.rs:616-638acknowledging the two symmetric 512 caps are independently swept and can momentarily desync at the boundary. - Add a
## Fixesnote + a module-doc line for the new cap. - (Optional next boundary case) a test that frees a slot below cap and confirms a new peer is then accepted — proves the cap is a live bound, not a one-way latch. Sibling hosting.rs omits this too, so consistent to defer.
Verdict
State: Needs Changes — Light Re-Check Sufficient
HEAD SHA reviewed: d750cba
The production change is correct, minimal, well-scoped, and a model example of slicing a tracking issue — no blocking issues. The one Should-fix is test robustness: the regression test leans on RNG distinctness it doesn't assert, and a real one-off failure was observed; making the keys deterministic-and-distinct is a small, contained change. Because all findings are test-only and the production diff is unchanged, a diff-of-the-diff re-check on the test fix is sufficient.
[AI-assisted - Claude]
…3798) The cap regression test `test_register_peer_interest_caps_at_max` relied on RNG distinctness it never asserted. `make_peer_key` ignored its seed and returned a random `TransportKeypair::new()`, so the 512-fill loop depended on 512 random keypairs never colliding AND on no leaked thread-local GlobalRng seed making them deterministic-equal. A reviewer observed a real one-off cold-build failure: the 513th key collided with a tracked peer, so registration returned is_new = false and the cap branch was never exercised. Make the test keys deterministic-AND-distinct. `make_peer_key` now derives its bytes from the seed via `TransportPublicKey::from_bytes` (matching the sibling hosting.rs helper), and a new `make_unique_peer_key` takes a u32 seed so the fill loop can produce 512+ pairwise-distinct peers regardless of RNG state. The overflow peer uses a seed past the fill range so it is guaranteed untracked. This removes the flake entirely while keeping the test a real regression guard (verified it still fails when the cap is disabled). Claude-Session: https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj
Problem
InterestManager::register_peer_interest(crates/core/src/ring/interest.rs)inserts into the per-contract
interested_peers: DashMap<ContractKey, HashMap<PeerKey, PeerInterest>>with no length check. Because this map drivesUPDATE broadcast targeting, an attacker who registers interest under many
distinct peer identities for one contract can grow the map unboundedly and turn
every state update into an N-way fan-out — an amplification vector that
multiplies memory, CPU (WASM execution), and executor cost by N.
This is Gap 2 of tracking issue #3798.
Solution
Mirror the proven cap already used by the sibling broadcast-target source
HostingManager::add_downstream_subscriber(MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT = 512inhosting.rs):MAX_INTERESTED_PEERS_PER_CONTRACT = 512.register_peer_interest, after taking theinterested_peersshard guardand computing
is_new, but before thepeer_contractsreverse-index andindex_contract_hashwrites: ifis_new && entry.len() >= MAX→ drop theguard,
warn!, andreturn false.Key invariants preserved:
reverse-index/hash writes, so a rejected peer never lands in
peer_contractsor
contract_hash_index. (The three cross-shard writes stay atomic vsremove_peer_interest— PRs InterestManager::remove_all_peer_interests has a bidirectional consistency race against register_peer_interest #4174/feat: static-analysis aggregator + perf cleanups it surfaced #4129.)is_new = falseon reject. A rejected peer is not a new viable target,so it cannot trigger the PUT of a fresh contract id gives up broadcast before subscriptions resolve → state lands locally-only, remote nodes get NotFound (and fdev reports success) #4359 first-viable-target pending-broadcast flush.
already-tracked peer always refreshes, so a legit at-capacity contract keeps
serving its existing peers.
No new GC exemption is introduced — entries still TTL-expire via the existing
INTEREST_TTL+ sweep — satisfying the AGENTS.md "exemptions must betime-bounded" rule. Reject-on-overflow is chosen over LRU eviction (simpler,
matches
hosting.rs).Testing
New regression test
test_register_peer_interest_caps_at_maxcovers all threeinvariants:
true, map length == MAX).false, length unchanged).peer_contractsreverse index (no zombie).false(notnew) yet still updates the entry (renewal not locked out by capacity).
Verified the test fails without the cap (temporarily disabling the reject
block panics at the rejection assertion), then passes with it restored.
Commands run (all green):
cargo fmtcargo clippy -p freenet --lib -- -D warnings— no issuescargo test -p freenet --lib ring::interest— 36 passedcargo test -p freenet --lib p2p_protoc— 49 passed (call-site-pairing guard intact)Scope
This is slice 1 of N for #3798 and addresses Gap 2 only. #3798 is a
tracking issue and must stay open; the remaining gaps (3, 4, 5, 6) are deferred
to follow-up PRs.
Refs #3798
https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj