Hi E2B team — thanks for the platform, we really enjoy building on it! We run long-lived stdio MCP servers inside sandboxes and lean heavily on on_timeout=pause / auto_resume, and we think we've hit a streaming-reconnect bug. We put together a tiny repro to make it easy to poke at. 🙏
TL;DR: after reattaching to a backgrounded process with commands.connect(pid) on a resumed sandbox, on_stdout stops firing for output that follows a large (~8KB) response — even though the process keeps writing and send_stdin keeps working. A stream that only emits small responses is totally fine; a single ~8KB response is enough to wedge it.
Environment
- e2b Python SDK 2.20.2 (also present on 2.23.1)
- Python 3.12, public base template, region eu-west-1
- Sandboxes created with lifecycle={"on_timeout": "pause", "auto_resume": True}
- Background command via commands.run(cmd, background=True, stdin=True, timeout=0), reattached with commands.connect(pid, on_stdout=…, timeout=0)
Repro (SDK-only, public base template)
import asyncio
from e2b import AsyncSandbox

# Reads a line; "BIG" -> ~8KB response, anything else -> echo it back.
PROC = ("python3 -u -c \"import sys\n"
        "for line in sys.stdin:\n"
        " sys.stdout.write(('X'*8000 if line.startswith('BIG') else line.strip())+'\\n')\n"
        " sys.stdout.flush()\"")

async def main():
    sbx = await AsyncSandbox.create(
        template="base", timeout=20,
        lifecycle={"on_timeout": "pause", "auto_resume": True},
    )
    got: list[str] = []
    proc = await sbx.commands.run(PROC, background=True, stdin=True, timeout=0,
                                  on_stdout=lambda s: got.append(s))
    pid = proc.pid
    handle = proc
    for c in range(1, 13):
        await handle.disconnect()
        await sbx.pause()  # force a pause
        await asyncio.sleep(1)
        # Reattach to the same pid after the pause/resume cycle.
        handle = await sbx.commands.connect(pid, timeout=0,
                                            on_stdout=lambda s: got.append(s))
        before = len(got)
        await sbx.commands.send_stdin(pid, "BIG\n")  # large response
        for i in range(3):
            await sbx.commands.send_stdin(pid, f"s{c}-{i}\n")  # small responses
        await asyncio.sleep(3)
        delivered = len(got) - before
        print(f"cycle {c}: sent 4 (1 big + 3 small), got {delivered}"
              f"{' <-- STALL' if delivered < 4 else ''}")

asyncio.run(main())
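One caveat on the counting: the repro counts on_stdout callbacks, not lines. We have not verified how the SDK chunks output, so if a chunk can carry several lines or only part of one, a line-based count would be more robust. A minimal sketch of that, using a hypothetical LineCollector helper of our own (not part of the e2b SDK, and not what produced the numbers below):

```python
class LineCollector:
    """Accumulates stdout chunks and exposes only complete lines.

    Hypothetical helper for illustration; assumes chunks are str and that
    the process terminates each response with a newline.
    """

    def __init__(self) -> None:
        self._buffer = ""          # trailing partial line, if any
        self.lines: list[str] = []  # complete lines seen so far

    def feed(self, chunk: str) -> None:
        self._buffer += chunk
        # Everything before the last "\n" is complete; the remainder is kept
        # in the buffer until the rest of that line arrives.
        *complete, self._buffer = self._buffer.split("\n")
        self.lines.extend(complete)


# Simulated chunks: one line split across two callbacks, two lines in one.
collector = LineCollector()
collector.feed("XXXXX\ns1")
collector.feed("-0\ns1-1\n")
print(collector.lines)
```

In the repro this would be wired in as on_stdout=collector.feed, with delivered computed from len(collector.lines) instead of len(got).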
Observed (SDK 2.20.2, base)
cycle 1: sent 4 (1 big + 3 small), got 4
cycle 2: sent 4 (1 big + 3 small), got 4
cycle 3: sent 4 (1 big + 3 small), got 4
cycle 4: sent 4 (1 big + 3 small), got 1 <-- STALL (only the big came back)
cycle 5: sent 4 (1 big + 3 small), got 0 <-- STALL (stream stays wedged)
...
DONE: 9/12 stalled
The all-small control (replace "BIG\n" with another small line) gives 0/12 stalled, which is what points at the large response as the trigger.
What we'd expect
After commands.connect to a running pid on a resumed sandbox, on_stdout should keep delivering all subsequent process output for the life of the connection, regardless of an earlier large response.
A couple of related things we noticed
- Sometimes commands.connect itself hangs ~60s and then fails with TimeoutException: The sandbox is running but port is not open; request_timeout doesn't seem to bound it (feels related to #1128).
- A fresh commands.run (new process) on the same resumed sandbox always streams fine, so it looks specific to reconnecting to the existing pid.
Our workaround (for context)
We cap connection establishment with an external asyncio.wait_for, and when a reattached session goes quiet we drop it and start a fresh process with commands.run. It works, but we lose the warm process on each flaky reattach.
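For reference, the connect-capping half of that workaround looks roughly like the sketch below. It is simplified and SDK-agnostic: connect_or_restart, its parameters, and the fake coroutines in the demo are hypothetical names for illustration. In our code the two callables wrap commands.connect(pid, ...) and commands.run(...) respectively.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_or_restart(
    connect: Callable[[], Awaitable[T]],
    restart: Callable[[], Awaitable[T]],
    connect_timeout: float = 10.0,
) -> tuple[T, bool]:
    """Try to reattach within connect_timeout seconds, else start fresh.

    Returns (handle, reattached). Only the timeout is handled here; any
    other exception raised by connect() propagates to the caller.
    """
    try:
        # wait_for cancels the pending connect if the deadline passes.
        handle = await asyncio.wait_for(connect(), timeout=connect_timeout)
        return handle, True
    except asyncio.TimeoutError:
        # Reattach did not complete in time: give up the warm process.
        return await restart(), False


async def _demo() -> list[tuple[str, bool]]:
    # Stand-ins for the real SDK calls (assumptions, not e2b APIs).
    async def quick_connect() -> str:
        return "reattached"

    async def hanging_connect() -> str:
        await asyncio.sleep(5)  # simulates a connect that hangs
        return "reattached"

    async def fresh_run() -> str:
        return "fresh"

    return [
        await connect_or_restart(quick_connect, fresh_run, connect_timeout=0.1),
        await connect_or_restart(hanging_connect, fresh_run, connect_timeout=0.1),
    ]


results = asyncio.run(_demo())
print(results)  # [('reattached', True), ('fresh', False)]
```

The second element of the returned tuple tells the caller whether the warm process survived, which is how we notice how often the fallback path is taken.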
Possibly related: #1128, #857, #1031. Happy to share raw event traces or test a patch — thanks so much for taking a look!