fix(swarm): avoid re-extracting completed outbound substreams by caniko · Pull Request #6427 · libp2p/rust-libp2p

caniko · 2026-05-07T06:48:55Z

What

Connection::poll can panic with "cannot extract twice" (swarm/src/connection.rs:708 on master) when an outbound SubstreamRequested is extracted before its Future::poll impl has stored a waker. This typically happens when the muxer reports poll_outbound = Ready on the same iteration in which the request was pushed. Without a stored waker, extract() cannot schedule the cleanup poll that would remove the resulting Done entry, so on the next outbound-ready iteration iter_mut().next() re-extracts the stale Done entry and panics.
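A minimal sketch of the failure shape, using a hypothetical simplified model rather than the real libp2p-swarm types: `Req` stands in for `SubstreamRequested` (timeout, upgrade and waker omitted) and a slice stands in for `FuturesUnordered`. Nothing in this model removes `Done` entries, which mimics the case where no waker was stored and no cleanup poll ever runs.

```rust
use std::panic;

/// Hypothetical, heavily simplified stand-in for `SubstreamRequested`;
/// not the libp2p-swarm type.
enum Req {
    Waiting { user_data: u32 },
    Done,
}

impl Req {
    /// Pre-fix behaviour: move the data out, leave `Done` behind,
    /// and panic if the entry was already extracted.
    fn extract(&mut self) -> u32 {
        match std::mem::replace(self, Req::Done) {
            Req::Waiting { user_data } => user_data,
            Req::Done => panic!("cannot extract twice"),
        }
    }
}

/// Pre-fix selection: take whatever entry iteration yields first.
/// Nothing here ever removes a `Done` entry (no cleanup poll in this model).
fn extract_next(requests: &mut [Req]) -> Option<u32> {
    requests.iter_mut().next().map(Req::extract)
}

/// Two outbound-ready iterations with no cleanup poll in between.
/// Returns `true` if the second extraction panicked.
fn second_extraction_panics() -> bool {
    let mut requests = vec![Req::Waiting { user_data: 1 }, Req::Waiting { user_data: 2 }];
    assert_eq!(extract_next(&mut requests), Some(1));
    // The head entry is now a stale `Done`; the next selection lands on it again.
    panic::catch_unwind(panic::AssertUnwindSafe(|| extract_next(&mut requests))).is_err()
}
```

In this model `second_extraction_panics()` returns `true`: the second `iter_mut().next()` yields the stale `Done` entry even though a `Waiting` entry sits behind it.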

Fix

Select a Waiting entry explicitly at the extraction site:

requested_substreams
    .iter_mut()
    .find(|r| matches!(r, SubstreamRequested::Waiting { .. }))

This makes the existing intent ("pick a waiting request") explicit and removes the dependence on FuturesUnordered's internal poll-ordering. Done entries are still cleaned up by the existing wake-driven path whenever the connection is polled again.
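The same hypothetical model with the selection changed to the `find(|r| matches!(r, ...Waiting { .. }))` filter from the fix. Again only a sketch: `Req` and the slice are illustrative stand-ins, not libp2p-swarm's `SubstreamRequested` or `FuturesUnordered`.

```rust
/// Hypothetical simplified stand-in for `SubstreamRequested`;
/// not the libp2p-swarm type.
enum Req {
    Waiting { user_data: u32 },
    Done,
}

impl Req {
    /// Still panics on `Done`, as before the fix.
    fn extract(&mut self) -> u32 {
        match std::mem::replace(self, Req::Done) {
            Req::Waiting { user_data } => user_data,
            Req::Done => panic!("cannot extract twice"),
        }
    }
}

/// Post-fix selection: skip stale `Done` entries explicitly instead of
/// assuming the first entry yielded by iteration is still `Waiting`.
fn extract_waiting(requests: &mut [Req]) -> Option<u32> {
    requests
        .iter_mut()
        .find(|r| matches!(r, Req::Waiting { .. }))
        .map(Req::extract)
}
```

Repeated calls extract each waiting request once and then return `None`, instead of panicking on the `Done` entries left behind.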

Test

Added a unit test that reproduces the failure directly on a FuturesUnordered<SubstreamRequested<…>> (push, extract() before any poll, then re-iterate). Without the fix it panics; with the fix the filter skips the Done entry.

Also verified against the downstream tournament mesh workload that originally surfaced the panic.

`Connection::poll` selects the next outbound-ready `SubstreamRequested` via `iter_mut().next()` and calls `.extract()` on it. `extract()` moves the data out, replaces the variant with `Done`, and then relies on `extracted_waker.wake()` to schedule the cleanup poll that removes the entry from `requested_substreams`. The waker is only populated by the `Future::poll` impl when the entry first observes a `Poll::Pending`.

If the entry is extracted before that first poll ever happens (which can occur when `FuturesUnordered`'s ordering puts a freshly pushed entry at the head of `iter_mut()`, or when the muxer returns `poll_outbound = Ready` on the same loop iteration in which the request was pushed), the waker is `None`, no wake is scheduled, and the `Done` entry persists. The next outbound-ready iteration's `iter_mut().next()` lands on that stale `Done` entry, and `extract()` panics.

The intent at this site has always been "find a request waiting for a substream"; making that explicit with `iter_mut().find(|r| matches!(r, SubstreamRequested::Waiting { .. }))` removes the dependence on `FuturesUnordered`'s scheduling.

Eager removal of the `Done` entry is not possible through `FuturesUnordered`'s public API while iterating it, so the existing wake-driven cleanup stays in place. The number of un-cleaned `Done` entries is bounded by the number of outstanding outbound requests, and they are drained naturally as later events trigger connection polls.

Adds a unit test reproducing the failure shape directly on a `FuturesUnordered<SubstreamRequested<(), DeniedUpgrade>>`.
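To illustrate why the cleanup depends on a prior poll, here is a sketch of the waker hand-off under stated assumptions: `Req` is a hypothetical cut-down `SubstreamRequested` that keeps only `user_data` and `extracted_waker`, and `CountingWaker` is a hypothetical test waker built on `std::task::Wake` that counts wake-ups. It models the mechanism described above; it is not the libp2p-swarm implementation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical cut-down `SubstreamRequested` (timeout and upgrade omitted).
enum Req {
    Waiting { user_data: u32, extracted_waker: Option<Waker> },
    Done,
}

impl Future for Req {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // All fields are Unpin, so Req is Unpin and get_mut is allowed.
        match self.get_mut() {
            // In this model, a Pending poll is the only place the waker slot is filled.
            Req::Waiting { extracted_waker, .. } => {
                *extracted_waker = Some(cx.waker().clone());
                Poll::Pending
            }
            // A `Done` entry completes, which lets the container drop it.
            Req::Done => Poll::Ready(()),
        }
    }
}

impl Req {
    fn extract(&mut self) -> u32 {
        match std::mem::replace(self, Req::Done) {
            Req::Waiting { user_data, extracted_waker } => {
                // No stored waker means nobody schedules the cleanup poll.
                if let Some(waker) = extracted_waker {
                    waker.wake();
                }
                user_data
            }
            Req::Done => panic!("cannot extract twice"),
        }
    }
}

/// Hypothetical test waker that counts how often it is woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns how many cleanup wake-ups `extract` scheduled.
fn wakes_after_extract(poll_first: bool) -> usize {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let mut req = Req::Waiting { user_data: 7, extracted_waker: None };
    if poll_first {
        assert!(Pin::new(&mut req).poll(&mut cx).is_pending());
    }
    assert_eq!(req.extract(), 7);
    counter.0.load(Ordering::SeqCst)
}
```

`wakes_after_extract(false)` returns 0 (extracted before any poll, so no cleanup is scheduled), while `wakes_after_extract(true)` returns 1.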

elenaf9 · 2026-05-12T07:10:08Z

Thank you for the PR @caniko. We appreciate all contributions, including ones that were created with the help of LLMs/AI tools.

That said, the sheer amount of text in this PR (descriptions, code comments) makes it very time-consuming to review a change that appears to be just a one-liner.
Could you please:

- summarize the key change and its motivation in a few lines, at a high level
- reduce the code docs to a reasonable level

caniko · 2026-05-12T11:24:55Z

Apologies @elenaf9. I usually leave these as drafts and improve the PR before asking for review; this PR seems to have skipped that step. Anyway, that is in the past.

I have tried to reduce the text and hope it is now good enough for review.

github-actions · 2026-05-13T00:33:14Z

It seems this issue might have been automatically generated. To help us address it effectively, please provide additional details.

We value the use of LLMs for code generation and welcome your contributions but please ensure your submission is of such quality that a maintainer will spend less time reviewing it than implementing it themselves. Verify the code functions correctly and meets our standards. If your change requires tests, kindly include them and ensure they pass.

If no further information is provided, the issue will be automatically closed in 7 days. Thank you for your understanding and for aiding us in maintaining quality contributions!

github-actions · 2026-05-14T00:34:25Z

It seems this issue might have been automatically generated. To help us address it effectively, please provide additional details.

We value the use of LLMs for code generation and welcome your contributions but please ensure your submission is of such quality that a maintainer will spend less time reviewing it than implementing it themselves. Verify the code functions correctly and meets our standards. If your change requires tests, kindly include them and ensure they pass.

If no further information is provided, the issue will be automatically closed in 7 days. Thank you for your understanding and for aiding us in maintaining quality contributions!

caniko · 2026-05-14T13:00:16Z

@elenaf9 I think there is something wrong with the bot

jxs

Hi, and thanks for this! Left some comments.

jxs · 2026-05-21T14:16:34Z

+            if let Some(requested_substream) = requested_substreams
+                .iter_mut()
+                .find(|r| matches!(r, SubstreamRequested::Waiting { .. }))
+            {


We could just make extract return an Option.
We could then write:

if let Some((user_data, timeout, upgrade)) = requested_substreams.iter_mut().next().and_then(|r| r.extract()) {

Searching here for a SubstreamRequested::Waiting only to match on it again inside extract feels redundant.

Done. extract() now returns Option, and the connection path uses find_map(SubstreamRequested::extract) to skip stale Done entries instead of panicking. I kept a separate waiting-request precheck before polling the muxer so we do not open an outbound stream unless there is a request to pair with it.
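A sketch contrasting the two selection shapes discussed above, again on a hypothetical simplified model (`Req` is not the real `SubstreamRequested`): `extract` returns `Option`, and the selection either looks only at the head entry via `next().and_then(...)` or walks past stale entries via `find_map`.

```rust
/// Hypothetical simplified stand-in for `SubstreamRequested`;
/// not the libp2p-swarm type.
enum Req {
    Waiting { user_data: u32 },
    Done,
}

impl Req {
    /// Post-review shape: a stale `Done` yields `None` instead of panicking.
    fn extract(&mut self) -> Option<u32> {
        match std::mem::replace(self, Req::Done) {
            Req::Waiting { user_data } => Some(user_data),
            Req::Done => None,
        }
    }
}

/// Head-only selection: a stale head entry simply yields `None`.
fn extract_head(requests: &mut [Req]) -> Option<u32> {
    requests.iter_mut().next().and_then(|r| r.extract())
}

/// `find_map` keeps going past stale `Done` entries until one extracts.
fn extract_first_waiting(requests: &mut [Req]) -> Option<u32> {
    requests.iter_mut().find_map(Req::extract)
}
```

In this model, with a stale `Done` at the head, the head-only version returns `None` even though a `Waiting` entry sits behind it, whereas `find_map` extracts that entry.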

jxs · 2026-05-21T14:18:24Z

+    // Regression test for `panic!("cannot extract twice")`: pushing two
+    // entries and calling `extract()` without ever polling them leaves
+    // a `Done` entry that `iter_mut().next()` would re-extract. The
+    // `find(Waiting)` filter must skip past it.
+    #[test]
+    fn iter_mut_skips_done_substream_requested_entries() {


This test manually creates the race-condition scenario by directly manipulating the internal state; it doesn't actually prove that the panic can happen in real-world usage.
Do you have an actual Minimal, Reproducible Example?

Done. I replaced the direct FuturesUnordered<SubstreamRequested<_>> state test with a Connection::poll-level test using a mock handler that queues outbound requests and a muxer that returns outbound streams immediately. This now covers the real connection polling path instead of calling extract() directly. One caveat: when I back-applied this two-request test to the current base, it did not reproduce the panic deterministically, so I am treating it as path coverage for the stale-Done handling rather than a standalone deterministic MRE for the original downstream contention failure.

caniko · 2026-05-25T04:58:56Z

Updated per review: extract() now returns Option, the connection path uses that to skip stale Done entries, and the regression coverage now goes through Connection::poll instead of directly manipulating SubstreamRequested. I also renamed the PR title to satisfy the semantic-title check.

Validation was run locally in an ad hoc Nix Rust shell, because the downstream project's flake is currently blocked by a stale rs-detritus input:

cargo test -p libp2p-swarm connection_poll_skips_done_substream_requested_entries
cargo test -p libp2p-swarm

jxs · 2026-05-25T13:18:03Z

+            if requested_substreams
+                .iter_mut()
+                .any(|r| matches!(r, SubstreamRequested::Waiting { .. }))
+            {


If extract now returns an Option, this becomes unnecessary, right? We can keep the previous code.

Good call, I dropped the extra precheck and kept the outbound polling path in the previous shape, with extract() handling stale entries via Option.

caniko changed the title ~~swarm: fix panic!("cannot extract twice") in Connection::poll~~ swarm: fix panic!("cannot extract twice") in Connection::poll May 7, 2026

elenaf9 added the kind/generated This issue might have been automatically generated label May 12, 2026

swarm: trim docs and changelog for cannot-extract-twice fix

4b11ee2

github-actions Bot added the need/author-input Needs input from the original author label May 13, 2026

elenaf9 removed the need/author-input Needs input from the original author label May 13, 2026

github-actions Bot added the need/author-input Needs input from the original author label May 14, 2026

jxs reviewed May 21, 2026


swarm: address cannot-extract-twice review

e563501

caniko changed the title ~~swarm: fix panic!("cannot extract twice") in Connection::poll~~ fix(swarm): avoid re-extracting completed outbound substreams May 25, 2026

jxs reviewed May 25, 2026


swarm: restore outbound request polling shape

f43eff8
