Skip to content

fix: cap interested_peers per contract (#3798 Gap 2)#4596

Merged
iduartgomez merged 2 commits into
mainfrom
worktree-agent-a567f6891c2a52251
Jun 30, 2026
Merged

fix: cap interested_peers per contract (#3798 Gap 2)#4596
iduartgomez merged 2 commits into
mainfrom
worktree-agent-a567f6891c2a52251

Conversation

@iduartgomez

Copy link
Copy Markdown
Collaborator

Problem

InterestManager::register_peer_interest (crates/core/src/ring/interest.rs)
inserts into the per-contract interested_peers: DashMap<ContractKey, HashMap<PeerKey, PeerInterest>> with no length check. Because this map drives
UPDATE broadcast targeting, an attacker who registers interest under many
distinct peer identities for one contract can grow the map unboundedly and turn
every state update into an N-way fan-out — an amplification vector that
multiplies memory, CPU (WASM execution), and executor cost by N.

This is Gap 2 of tracking issue #3798.

Solution

Mirror the proven cap already used by the sibling broadcast-target source
HostingManager::add_downstream_subscriber (MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT = 512 in hosting.rs):

  • New constant MAX_INTERESTED_PEERS_PER_CONTRACT = 512.
  • In register_peer_interest, after taking the interested_peers shard guard
    and computing is_new, but before the peer_contracts reverse-index and
    index_contract_hash writes: if is_new && entry.len() >= MAX → drop the
    guard, warn!, and return false.

Key invariants preserved:

  1. No zombie reverse-index entry. The reject happens before the
    reverse-index/hash writes, so a rejected peer never lands in peer_contracts
    or contract_hash_index. (The three cross-shard writes stay atomic vs
    remove_peer_interest — PRs InterestManager::remove_all_peer_interests has a bidirectional consistency race against register_peer_interest #4174/feat: static-analysis aggregator + perf cleanups it surfaced #4129.)
  2. is_new = false on reject. A rejected peer is not a new viable target,
    so it cannot trigger the PUT of a fresh contract id gives up broadcast before subscriptions resolve → state lands locally-only, remote nodes get NotFound (and fdev reports success) #4359 first-viable-target pending-broadcast flush.
  3. Renewals are never rejected. The cap only gates a NEW peer; an
    already-tracked peer always refreshes, so a legit at-capacity contract keeps
    serving its existing peers.

No new GC exemption is introduced — entries still TTL-expire via the existing
INTEREST_TTL + sweep — satisfying the AGENTS.md "exemptions must be
time-bounded" rule. Reject-on-overflow is chosen over LRU eviction (simpler,
matches hosting.rs).

Testing

New regression test test_register_peer_interest_caps_at_max covers all three
invariants:

  • Registers MAX distinct peers (each returns true, map length == MAX).
  • One more distinct peer is rejected (returns false, length unchanged).
  • Rejected peer is absent from the peer_contracts reverse index (no zombie).
  • Re-registering an existing peer with an updated summary returns false (not
    new) yet still updates the entry (renewal not locked out by capacity).

Verified the test fails without the cap (temporarily disabling the reject
block panics at the rejection assertion), then passes with it restored.

Commands run (all green):

  • cargo fmt
  • cargo clippy -p freenet --lib -- -D warnings — no issues
  • cargo test -p freenet --lib ring::interest — 36 passed
  • cargo test -p freenet --lib p2p_protoc — 49 passed (call-site-pairing guard intact)

Scope

This is slice 1 of N for #3798 and addresses Gap 2 only. #3798 is a
tracking issue and must stay open; the remaining gaps (3, 4, 5, 6) are deferred
to follow-up PRs.

Refs #3798

https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj

…on (#3798)

The per-contract `interested_peers` map in InterestManager was inserted
into with no length check, so a single peer flooding distinct identities
could grow it unboundedly. Since this map drives UPDATE broadcast
targeting, an unbounded map is an amplification vector: every state
update fans out to N targets, multiplying memory, CPU, and executor cost
by N.

Cap the map at MAX_INTERESTED_PEERS_PER_CONTRACT (512), mirroring the
existing MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT in hosting.rs so the two
broadcast-target sources are symmetrically bounded. A NEW peer at
capacity is rejected (return is_new = false) before the reverse-index /
hash-index writes, so it leaves no zombie peer_contracts entry and does
not trigger the #4359 first-viable-target broadcast flush. Renewals of an
already-tracked peer are never rejected, so a legit at-capacity contract
keeps serving its existing peers. Entries still TTL-expire via the
existing sweep, so no new GC exemption is introduced.

This is Gap 2 of tracking issue #3798 only; the other gaps remain open.

Claude-Session: https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj
@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

I have all the information I need to complete the review.

Rule Review: Cap enforcement is clean; one minor test-coverage gap

Rules checked: git-workflow.md, code-style.md, testing.md, ring.md
Files reviewed: 1 (crates/core/src/ring/interest.rs)

Warnings

None.

Info

  • crates/core/src/ring/interest.rs:1272 — The test test_register_peer_interest_caps_at_max doesn't cover the freed-slot state transition: remove one peer from a full contract, then verify the next new registration is accepted. testing.md lists "State transitions (invalid states, repeated calls, out-of-order)" as a required edge case class. The primary cap invariant (rejection at capacity, no zombie entries, renewals passing) is well covered; this is a secondary scenario made straightforward by HashMap semantics, but worth a brief sub-case given the security sensitivity of this path. (rule: testing.md — edge cases)

Detailed notes (non-finding observations):

  • Cross-DashMap lock discipline (ring.md critical invariant): The rejection branch drops entry before the early return false, which is correct here — no secondary writes (peer_contracts, index_contract_hash) occur in the rejection path. The successful path still holds the guard across all secondary writes. ✓
  • Numeric constant: MAX_INTERESTED_PEERS_PER_CONTRACT = 512 is a security cap unrelated to configurable values; it is defined as a named constant with a comment tying it to the symmetric MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT in hosting.rs. Satisfies the code-style.md rule ("NO: Define as a named constant with a comment explaining the choice"). ✓
  • make_unique_peer_key in tests: Uses deterministic TransportPublicKey::from_bytes([seed bytes…]) — no RNG, no GlobalRng bypassed. The comment accurately explains why this avoids the flake that the previous TransportKeypair::new() (OsRng) could cause in the 513-registration loop. ✓
  • fix: PR has regression test: test_register_peer_interest_caps_at_max is present. ✓

Rule review against .claude/rules/. WARNING findings block merge.

@iduartgomez iduartgomez left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comprehensive PR Review: #4596

Summary

  • PR Title: fix: cap interested_peers per contract (#3798 Gap 2)
  • Type: fix (hardening)
  • CI Status: passing (all checks green)
  • Linked Issues: Refs #3798 (Gap 2 of a 6-gap tracking issue; correctly NOT closing it)
  • Review tier: Full (concurrency / DashMap atomicity + amplification-authorization surface)
  • Reviewers run: code-first, testing, skeptical, big-picture (4 Claude lenses). External pass (codex) unavailable — token expired; gemini not installed. Findings verified by hand against the worktree code, and the test was run independently by two reviewers.

Code-First Analysis

Independent Understanding: register_peer_interest adds a per-contract cap (MAX_INTERESTED_PEERS_PER_CONTRACT = 512). After taking the shard guard and computing is_new = !entry.contains_key(&peer), a reject branch fires when is_new && entry.len() >= 512: drop guard, warn!, return falsebefore the peer_contracts reverse-index and index_contract_hash writes.

Stated Intent: #3798 Gap 2 — bound an adversarial broadcast-amplification vector (an attacker registering interest under many distinct peer identities inflates UPDATE fan-out), with a cap "matching or lower than MAX_DOWNSTREAM_SUBSCRIBERS_PER_CONTRACT (hosting.rs), reject or evict on overflow."

Alignment: Matches precisely. interested_peers is the sole interest-side source feeding get_broadcast_targets_update (Source 2, update.rs:324), so capping it directly bounds fan-out. The "symmetric with hosting.rs" claim is verified true: hosting.rs:85 is = 512 and hosting.rs:738 enforces the identical is_new && entry.len() >= MAX reject-on-overflow pattern.

Gaps: None material in the production code. The or_default() at the entry guard cannot leak an empty-contract zombie on the reject path, because the reject only fires when the entry is already at 512 (i.e. already exists) — load-bearing reasoning that the doc-comment leaves implicit.


Testing Assessment

Coverage Level: adequate for the cap; one latent non-determinism in the test (below).

Test Type Status Notes
Unit ⚠️ test_register_peer_interest_caps_at_max covers the cap, all 3 invariants, and the at-capacity boundary — but rests on unguaranteed RNG distinctness (see Should-fix)
Integration N/A unit is the correct level for a single bounded function (slice 1)
Simulation N/A

Regression Test: present and independently verified to fail without the cap — two reviewers disabled the cap and the test panicked at the "new peer at capacity must be rejected" assertion, then passed when restored.

Invariants covered: (a) no zombie reverse-index on reject (get_contracts_for_peer(overflow).is_empty()), (b) is_new=false on reject, (c) renewals of existing peers never rejected and still refresh the summary. All three asserted.

Missing: downstream fan-out reduction (get_broadcast_targets_update) not asserted — acceptable for slice 1; the cap bounds the source map by construction, and the symmetric hosting.rs cap has only unit coverage too.


Skeptical Findings

Risk Level: low (production logic), with one test-robustness issue.

Concern Severity Location Details
Test rests on unguaranteed RNG distinctness — one real failure observed Should interest.rs:1199 make_peer_key (verified) make_peer_key(_seed: u8) ignores its seed and calls TransportKeypair::new() (random). The test fills the map with make_peer_key(0) ×512 but never asserts the 512 keys are pairwise distinct, nor that overflow_peer differs from all of them. One reviewer observed a cold-build run fail at the 513th call (it compared equal to a tracked peer → is_new=false → cap branch never reached). GlobalRng is thread-local; if the cargo test threadpool reuses a thread that left a deterministic seed, distinctness is no longer guaranteed (matches the known "GlobalRng seed state bleeds across concurrent tests in one binary" hazard). Production logic is deterministic and correct regardless — this is purely test robustness. Fix: derive the keypair bytes from the _seed/a counter so keys are deterministic-and-distinct, or assert overflow_peer is not already present before expecting rejection.
Atomicity / zombie entries — verified safe interest.rs:411-419 Reject does drop(entry); return false before all three writes; rejected peer leaves no entry in interested_peers, peer_contracts, or contract_hash_index. Confirmed at runtime by the test.
Off-by-one — verified safe reject condition >= 512 with insert-after caps at exactly 512 (511 passes → 512; 512 rejects).
No permanent lockout — verified safe sweep interest.rs:814 Slots free via INTEREST_TTL (20min) + 60s sweep. Time-bounded per AGENTS.md; an adversary must keep all 512 identities actively refreshing to hold the cap — exactly the cost the cap is meant to impose.

No high/medium production bugs.


Big Picture Assessment

Goal Alignment: yes — closes the issue's highest-priority gap (High severity / Low effort) with a minimal, isolated change reusing a proven sibling pattern.
Anti-Patterns Detected: none — no #[ignore], no removed tests, no weakened tolerances, no silent fallback (explicit warn! + return false).
Removed Code Concerns: none — purely additive (+95 / −0, one file).
Scope Assessment: exemplary. Single file, single logical change, no Gap 3/4/5/6 work smuggled in. Notably does NOT touch the tempting-adjacent update.rs Gap 4. Uses Refs #3798 (not Fixes) so the T-tracking issue stays open. Deferred gaps named in the PR body.

Two benign, undocumented boundary side-effects worth a comment (Consider):

  • Two-cap desync at subscribe.rs:616-638: hosting's add_downstream_subscriber (512) and the new interest cap (512) are independently-swept maps; at the exact boundary they can transiently differ (hosting accepts, interest rejects → the #4359 flush, the local-interest counter bump, and interest propagation are skipped while the peer IS in hosting's map). Benign (peer just won't get broadcasts until its next heartbeat), self-corrects in steady state.
  • Rejected peer still gets a Summaries response (node.rs:2442-2443): arguably correct (don't signal an adversary they were capped) but non-obvious.

Documentation

  • Code docs: the reject doc-comment is accurate and thorough. Could note the or_default()-can't-leak reasoning and the two-cap boundary relationship.
  • Module docs: interest.rs header doesn't mention the new 512 bound; a one-line addition would make the symmetric caps discoverable. (Consider.)
  • A one-line ## Fixes section stating "None — partial work on #3798; tracking issue stays open" would make the deliberate absence of Fixes read as intentional. (Consider.)

Recommendations

Must Fix (Blocking)

None. The production change is correct, atomic, time-bounded, and faithful to intent.

Should Fix (Important)

  1. Make the test's peer keys deterministic-and-distinct. make_peer_key ignores its seed and randomizes; the test never asserts pairwise distinctness and a real one-off failure was observed (RNG collision / leaked thread-local seed). Derive the 32 keypair bytes from the _seed/a counter (matching the sibling hosting.rs test's deterministic-index approach), OR assert overflow_peer is absent before expecting rejection. Prevents a latent flaky failure.

Consider (Suggestions)

  1. Rename/use the make_peer_key argument (it's ignored) — passing literal 0 in the fill loop reads like "same key 512×", the exact vacuous-test failure mode a reviewer fears.
  2. One-line comment at subscribe.rs:616-638 acknowledging the two symmetric 512 caps are independently swept and can momentarily desync at the boundary.
  3. Add a ## Fixes note + a module-doc line for the new cap.
  4. (Optional next boundary case) a test that frees a slot below cap and confirms a new peer is then accepted — proves the cap is a live bound, not a one-way latch. Sibling hosting.rs omits this too, so consistent to defer.

Verdict

State: Needs Changes — Light Re-Check Sufficient
HEAD SHA reviewed: d750cba

The production change is correct, minimal, well-scoped, and a model example of slicing a tracking issue — no blocking issues. The one Should-fix is test robustness: the regression test leans on RNG distinctness it doesn't assert, and a real one-off failure was observed; making the keys deterministic-and-distinct is a small, contained change. Because all findings are test-only and the production diff is unchanged, a diff-of-the-diff re-check on the test fix is sufficient.

[AI-assisted - Claude]

…3798)

The cap regression test `test_register_peer_interest_caps_at_max` relied
on RNG distinctness it never asserted. `make_peer_key` ignored its seed
and returned a random `TransportKeypair::new()`, so the 512-fill loop
depended on 512 random keypairs never colliding AND on no leaked
thread-local GlobalRng seed making them deterministic-equal. A reviewer
observed a real one-off cold-build failure: the 513th key collided with
a tracked peer, so registration returned is_new = false and the cap
branch was never exercised.

Make the test keys deterministic-AND-distinct. `make_peer_key` now
derives its bytes from the seed via `TransportPublicKey::from_bytes`
(matching the sibling hosting.rs helper), and a new `make_unique_peer_key`
takes a u32 seed so the fill loop can produce 512+ pairwise-distinct
peers regardless of RNG state. The overflow peer uses a seed past the
fill range so it is guaranteed untracked. This removes the flake
entirely while keeping the test a real regression guard (verified it
still fails when the cap is disabled).

Claude-Session: https://claude.ai/code/session_01CVyrLHuiXpFCC21by4Pzrj
@iduartgomez iduartgomez marked this pull request as draft June 27, 2026 14:27
@iduartgomez iduartgomez marked this pull request as ready for review June 30, 2026 08:33
@iduartgomez iduartgomez enabled auto-merge June 30, 2026 08:33
@iduartgomez iduartgomez added this pull request to the merge queue Jun 30, 2026
Merged via the queue into main with commit 87472f2 Jun 30, 2026
17 checks passed
@iduartgomez iduartgomez deleted the worktree-agent-a567f6891c2a52251 branch June 30, 2026 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant