Skip to content

fix: deflake delegate node-limit tests under parallel cargo test --lib#4564

Open
mvanhorn wants to merge 1 commit into
freenet:mainfrom
mvanhorn:fix/4418-deflake-delegate-node-limit-test
Open

fix: deflake delegate node-limit tests under parallel cargo test --lib#4564
mvanhorn wants to merge 1 commit into
freenet:mainfrom
mvanhorn:fix/4418-deflake-delegate-node-limit-test

Conversation

@mvanhorn

Copy link
Copy Markdown
Contributor

What

Deflakes test_create_delegate_non_attested_still_counts_toward_node_limit (and its sibling delegate-creation tests) under a threaded cargo test -p freenet --lib run.

Why

test_create_delegate_non_attested_still_counts_toward_node_limit snapshots the process-global CREATED_DELEGATES_COUNT atomic into before, creates a delegate, asserts after > before, then restores the counter with fetch_sub(1). Because the counter is a process-global static, other delegate-creation tests in the same binary run concurrently on other threads and call CREATED_DELEGATES_COUNT.fetch_add(1) inside create_delegate_sync, perturbing the before/after snapshot and racing the cleanup. The test passes in isolation but fails intermittently under the full parallel --lib run.

CI uses nextest per-process isolation, so each test gets its own process and never observes the race; only a threaded cargo test --lib exercises it.

How

Group the delegate-creation tests that read or mutate the shared CREATED_DELEGATES_COUNT global into a single named serial group via #[serial_test::serial(created_delegates_count)], so they cannot run concurrently with one another. The named group serializes only these tests against each other; it does not globally serialize the rest of the suite, keeping the wall-clock impact minimal. serial_test is already a workspace dev-dependency.

No production code changes — the fix is confined to the test module in crates/core/src/wasm_runtime/delegate_api.rs.

Verification

  • cargo build -p freenet --lib — passes
  • cargo test -p freenet --lib wasm_runtime::delegate_api — 43 passed, 0 failed
  • cargo fmt -p freenet --check — clean

Fixes #4418

test_create_delegate_non_attested_still_counts_toward_node_limit
snapshots the process-global CREATED_DELEGATES_COUNT atomic into `before`,
creates a delegate, and asserts `after > before` before restoring the
counter with fetch_sub(1). Under a threaded `cargo test -p freenet --lib`
run, sibling delegate-creation tests on other threads call
create_delegate_sync concurrently and fetch_add the same global, perturbing
the before/after snapshot and racing the cleanup. CI uses nextest
per-process isolation so it never sees the race.

Serialize the delegate-creation tests that read or mutate the shared
CREATED_DELEGATES_COUNT global into a single named serial group
(`#[serial_test::serial(created_delegates_count)]`) so they cannot run
concurrently with one another. The named group avoids globally serializing
the whole suite. serial_test is already a workspace dev-dependency. No
production code changes; the fix is confined to the test module.

Fixes freenet#4418
@github-actions

Copy link
Copy Markdown
Contributor

Now I have all the context needed. Let me compile the review.


Rule Review: No significant issues found

Rules checked: testing.md, code-style.md, git-workflow.md, contracts.md
Files reviewed: 1 (crates/core/src/wasm_runtime/delegate_api.rs)

Summary

The PR adds #[serial_test::serial(created_delegates_count)] to 8 async tests that all touch the CREATED_DELEGATES_COUNT global atomic (static AtomicUsize). When cargo test --lib runs tests in parallel (multiple threads, one process), these tests race on the shared counter and fail non-deterministically. Serializing them on a named token is the correct, established fix for this codebase — it is the exact same pattern already used in crates/core/src/server/client_api/permission_prompts.rs with #[serial_test::serial(sse_global_counter)].

Root-cause assessment: The fix is appropriate. serial_test::serial is not #[ignore], not a retry, not a timeout increase — it directly addresses the shared-state race that causes the flakiness.

Counter accumulation: MAX_CREATED_DELEGATES_PER_NODE = 1024. The 8 serialized tests create at most ~3–4 delegates (net, after the one cleanup in test_create_delegate_non_attested_still_counts_toward_node_limit), so the serialized order will never approach the limit. Tests that assert on the counter (creations_this_call) use a per-env Cell, not the global atomic, so those assertions are not affected by accumulated state.


Warnings

None.


Info

  • crates/core/src/wasm_runtime/delegate_api.rs:1014–1183 — The PR is labeled fix:, and testing.md states "CI will reject fix: PRs that don't add at least one new #[test] function." No new test function is added here. However, this is a test-infrastructure fix (parallelism isolation), not a production bug, and the nature of the bug — test interference under parallel execution — is not catchable by a standalone #[test] function. The #[serial_test::serial] annotation is itself the regression-prevention mechanism. This is the established codebase pattern for this class of problem (rule: testing.md § "fix: PRs MUST include regression tests").

Rule review against .claude/rules/. WARNING findings block merge.

@mvanhorn

Copy link
Copy Markdown
Contributor Author

The Rule Lint is failing because this fix: PR doesn't add a new #[test]. In this case it genuinely can't: the entire change adds #[serial_test::serial(created_delegates_count)] to 8 existing tests that share the global CREATED_DELEGATES_COUNT counter and were flaking under parallel cargo test --lib. There's no new production code path to cover, and a contrived extra test wouldn't exercise anything meaningful. Per the lint's guidance, could a maintainer apply the test-exempt label? I don't have label permissions. Happy to adjust if you'd prefer a different approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky: test_create_delegate_non_attested_still_counts_toward_node_limit fails under parallel cargo test --lib

1 participant