[Bug]: Python SDK HTTP 2 transport fails under high async sandbox concurrency with low default max_keepalive_connections #1418

@rohitrastogi

Description

Sandbox ID or Build ID

No response

Environment

e2b==2.21.1
httpx==0.28.1
httpcore==1.0.9
h2==4.3.0
Python 3.13.6
macOS arm64

Timestamp of the issue

2026-06-10 15:15 America/Los_Angeles

Frequency

Happens every time

Expected behavior

The async Python SDK should support many concurrent sandboxes in one event loop without any client-side HTTP/2 protocol errors.

Specifically, concurrent per-sandbox command/filesystem operations should NOT fail with h2 state-machine errors such as Invalid input ConnectionInputs.SEND_SETTINGS in state ConnectionState.CLOSED.

At least one of the following should probably hold: a locally closed HTTP/2 connection is never selected for a request, a connection is never closed while it is assigned to a request, or the SDK/client retries safely before surfacing a client error.
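
The third option (retrying before surfacing the error) could look roughly like the sketch below. It is not part of the SDK: `retry_on_closed_state` and `is_closed_state_error` are hypothetical helper names, and matching on the error text is an assumption based on the messages quoted in this issue.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Substring shared by the h2 closed-state errors quoted in this issue.
CLOSED_STATE_MARKER = "in state ConnectionState.CLOSED"


def is_closed_state_error(exc: BaseException) -> bool:
    """Return True if the exception text looks like an h2 closed-state error."""
    return CLOSED_STATE_MARKER in str(exc)


async def retry_on_closed_state(
    op: Callable[[], Awaitable[T]],
    attempts: int = 3,
    delay: float = 0.05,
) -> T:
    """Call op(), retrying only when it fails with a closed-state error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await op()
        except Exception as exc:
            # Re-raise unrelated errors immediately, and the last failure as-is.
            if not is_closed_state_error(exc) or attempt == attempts:
                raise
            await asyncio.sleep(delay * attempt)  # short linear backoff
    raise AssertionError("unreachable")
```

A caller would wrap a single operation, for example `await retry_on_closed_state(lambda: sandbox.files.make_dir("/filesystem/e2b-transport-debug", request_timeout=120))`. Retrying like this is only safe when the operation is idempotent or the request is known not to have been sent.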

Actual behavior

At high sandbox concurrency, the async Python SDK can fail during sandbox commands with HTTP/2 closed-state errors. I've encountered the following:

  • Invalid input ConnectionInputs.SEND_SETTINGS in state ConnectionState.CLOSED
  • Invalid input ConnectionInputs.SEND_HEADERS in state ConnectionState.CLOSED

These occur in operations such as sandbox.commands.run(...) and sandbox.files.make_dir(...).

In the OpenAI Agents SDK E2B Sandbox integration, this same underlying issue surfaces as higher-level failures such as:

  • WorkspaceStartError
  • WorkspaceArchiveWriteError
  • ExecTransportError

This appears to be specific to the Python client path. I wrote a similar script testing the JavaScript SDK and was not able to reproduce the issue, even at higher concurrency.

The Python SDK creates one shared async httpx.AsyncHTTPTransport per event loop with http2=True, max_connections=2000, and max_keepalive_connections=20. Under high sandbox concurrency, the shared pool holds connections to many per-sandbox envd origins. Raising only E2B_MAX_KEEPALIVE_CONNECTIONS eliminated the closed-state HTTP/2 errors.

Issue reproduction

  1. Spawn many concurrent async sandboxes in one Python event loop (try 64 sandboxes or more).
  2. For each sandbox, run this per-sandbox sequence with the default E2B_MAX_KEEPALIVE_CONNECTIONS (20):
sandbox = await AsyncSandbox.create(template=template)

await sandbox.commands.run(
    "mkdir -p /filesystem && test -d /filesystem && echo ok",
    timeout=120,
    request_timeout=120,
)

await sandbox.files.make_dir(
    "/filesystem/e2b-transport-debug",
    request_timeout=120,
)

await sandbox.commands.run(
    "stat /filesystem/e2b-transport-debug",
    timeout=120,
    request_timeout=120,
)

await sandbox.kill()

Locally, with the default keepalive setting, I see:

c=64, default keepalive=20:
  7 closed-state HTTP/2 errors
  3. Re-run the same workload with E2B_MAX_KEEPALIVE_CONNECTIONS=2000

With the env variable set, I see:

c=64:
  0 errors

c=250:
  0 closed-state HTTP/2 errors
  1 unrelated DNS error
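
The counts above can be gathered with a small harness along these lines. This is a sketch rather than the script used for the numbers in this issue: `run_concurrently` is a hypothetical helper, and classifying errors by message text is an assumption based on the errors quoted earlier.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

# Substring shared by the h2 closed-state errors quoted in this issue.
CLOSED_STATE_MARKER = "in state ConnectionState.CLOSED"


async def run_concurrently(
    workload: Callable[[int], Awaitable[None]],
    concurrency: int,
) -> Counter:
    """Run `concurrency` copies of workload at once and tally the outcomes."""
    results = await asyncio.gather(
        *(workload(i) for i in range(concurrency)),
        return_exceptions=True,  # collect failures instead of aborting the run
    )
    tally: Counter = Counter()
    for result in results:
        if not isinstance(result, BaseException):
            tally["ok"] += 1
        elif CLOSED_STATE_MARKER in str(result):
            tally["closed_state"] += 1
        else:
            tally["other_error"] += 1
    return tally
```

Here `workload(i)` would be an `async def` wrapping the per-sandbox sequence from step 2, run with something like `asyncio.run(run_concurrently(per_sandbox, 64))`.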

Additional context

Current async SDK defaults are:

limits = Limits(
    max_keepalive_connections=int(os.getenv("E2B_MAX_KEEPALIVE_CONNECTIONS", "20")),
    max_connections=int(os.getenv("E2B_MAX_CONNECTIONS", "2000")),
    keepalive_expiry=int(os.getenv("E2B_KEEPALIVE_EXPIRY", "300")),
)

with one async transport cached per event loop, constructed as:

transport = AsyncTransportWithLogger(
    limits=limits,
    proxy=config.proxy,
    http2=True,
)

Based on tracing httpcore, it looks like one request finishing can close the connection that a different in-flight request was just assigned to but hasn't started using yet. After a response closes, httpcore triggers a global connection pool cleanup. Because max_keepalive_connections is low relative to the number of sandbox origins, that cleanup is quite aggressive.
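
To make the suspected sequence concrete, here is a toy model of it. This is not httpcore's code: `ToyConnection`, `ToyPool`, and `simulate` are hypothetical names, and the cleanup rule (close the oldest idle connections beyond the keepalive limit, without checking whether a request has been assigned) is an assumption drawn from the tracing described above.

```python
from dataclasses import dataclass


@dataclass
class ToyConnection:
    origin: str
    assigned: bool = False  # a request has been handed this connection
    in_use: bool = False    # that request has started sending
    closed: bool = False

    def is_idle(self) -> bool:
        # Deliberately ignores `assigned`: that is the gap being illustrated.
        return not self.in_use and not self.closed


class ToyPool:
    def __init__(self, max_keepalive: int) -> None:
        self.max_keepalive = max_keepalive
        self.connections: list[ToyConnection] = []

    def cleanup(self) -> None:
        """Close idle connections beyond the keepalive limit, oldest first."""
        idle = [c for c in self.connections if c.is_idle()]
        for conn in idle[: max(len(idle) - self.max_keepalive, 0)]:
            conn.closed = True
            self.connections.remove(conn)

    def release(self, conn: ToyConnection) -> None:
        """A response finished: mark the connection idle, then run cleanup."""
        conn.in_use = False
        conn.assigned = False
        self.cleanup()

    def start_request(self, conn: ToyConnection) -> None:
        """The assigned request finally starts using its connection."""
        if conn.closed:
            raise RuntimeError(f"connection to {conn.origin} is in state CLOSED")
        conn.in_use = True


def simulate(max_keepalive: int) -> bool:
    """Return True if the assigned-but-not-started request hits a closed connection."""
    pool = ToyPool(max_keepalive)
    waiting = ToyConnection("sandbox-a")  # idle, about to be reused
    busy = [ToyConnection(f"sandbox-{n}", assigned=True, in_use=True) for n in "bc"]
    pool.connections = [waiting, *busy]

    waiting.assigned = True  # a request is assigned but has not started yet
    for conn in busy:        # other requests finish in the meantime
        pool.release(conn)

    try:
        pool.start_request(waiting)
    except RuntimeError:
        return True
    return False
```

In this model `simulate(2)` reports a failure because the cleanup triggered by the other responses closes the waiting connection, while `simulate(2000)` does not, which matches the effect of raising E2B_MAX_KEEPALIVE_CONNECTIONS.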

Monkeypatching the Python client to use HTTP/1.1 also avoided the issue, as did raising E2B_MAX_KEEPALIVE_CONNECTIONS. Perhaps the async transport could be made more configurable and/or use a higher keepalive default for high-concurrency workloads.
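
For anyone hitting this in the meantime, the environment-variable workaround can be applied from code. This sketch assumes the SDK reads the variables when it is imported, so they must be set before the first `import e2b`. `effective_limits` is a hypothetical helper that only mirrors the env parsing quoted above; it does not touch the SDK.

```python
import os

# Assumption: the SDK reads these variables at import time, so this line
# must run before the first `import e2b` in the process.
os.environ.setdefault("E2B_MAX_KEEPALIVE_CONNECTIONS", "2000")


def effective_limits() -> dict[str, int]:
    """Mirror the SDK's env parsing quoted above to show the resulting values."""
    return {
        "max_keepalive_connections": int(os.getenv("E2B_MAX_KEEPALIVE_CONNECTIONS", "20")),
        "max_connections": int(os.getenv("E2B_MAX_CONNECTIONS", "2000")),
        "keepalive_expiry": int(os.getenv("E2B_KEEPALIVE_EXPIRY", "300")),
    }
```

Using `setdefault` keeps any value already exported in the shell, so an explicit `E2B_MAX_KEEPALIVE_CONNECTIONS=...` on the command line still wins.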

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
