fetch: encode FetchTasklet's cross-thread ownership in the type system#31745
fetch: encode FetchTasklet's cross-thread ownership in the type system#31745alii wants to merge 20 commits into
Conversation
…drop stored result body alias
Group the JS-thread-only fields into a JsState struct and the four mutex-guarded handoff fields (result, metadata, scheduled_response_buffer, body_size) into HttpHandoff, so each field's thread affinity is visible at the access site (self.js.* vs self.shared.*). Move the eight JS-side helpers onto JsState and name the callback's merge/staging steps as HttpHandoff::merge_result / stage_response_bytes. The mutex and every lock/unlock site are untouched; all FetchTasklet fields are now private. No behavior change.
…t borrows Replace the free-floating mutex + manually paired lock/unlock calls with bun_threading::Guarded<HttpHandoff>, so the guarded fields can only be reached through a lock guard and every early return unlocks via RAII at the exact same points as before. - Lock-held helpers (start_request_stream, on_body_received, check_server_identity, on_reject, to_body_value, to_response, on_resolve) now take (&mut Parts, &mut HttpHandoff) instead of &mut self, where Parts is a disjoint-field borrow bundle split off before taking the lock; bodies are unchanged respellings. - on_progress_update's cleanup closure consumes the guard by value so each of its call sites unconditionally unlocks first, exactly where the manual unlock was. - callback() and the start-streaming drain use explicit drop(guard) at each exit, at the same program points as the old unlocks. - clear_data/release_at_shutdown use Guarded::get_mut (no new lock) for their existing exclusive-access teardown writes. - jsc::Weak gains create_ptr (raw-ctx variant of create) for on_resolve, which can no longer form &mut FetchTasklet while field borrows live. Lock count, critical-section contents, and allocation placement are unchanged.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughSplits FetchTasklet into JS-thread ChangesFetchTasklet Architectural Refactoring
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/runtime/webcore/fetch/FetchTasklet.rs (1)
856-899:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftBalance the streaming-upload ref during shutdown.
release_at_shutdown()only accounts for the two baseline refs, but the extra sink ref taken on Line 917 is only released fromwrite_end_request()on Lines 2349, 2361, and 2376. Ifprocess.exit()interrupts a fetch while the request-body stream is still open,write_end_request()never runs, the refcount never reaches zero, andclear_sink()/clear_data()are skipped for that tasklet.Please make the shutdown path balance an outstanding sink ref too, or guarantee that the sink ref cannot still exist once
release_at_shutdown()is reachable.Also applies to: 901-922, 2343-2377
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e33071d9-ce72-415f-b06a-0ebc52c3a850
📒 Files selected for processing (4)
src/jsc/Weak.rssrc/runtime/dispatch.rssrc/runtime/webcore/fetch/FetchTasklet.rssrc/threading/guarded.rs
|
@robobun adopt |
|
🤖 Adopted this PR. CI status: @alii merged main (cbd7951); e6abfab adds claude[bot]'s two follow-ups (defensive streaming-refs take in the fourth exit path + The shutdown-leak fix: Tests ( Earlier adoption work: merge conflict with #31746 resolved; refactor spot-checks; full fetch/body suite validation against a merge-base debug build; all bot review threads addressed and resolved. |
There was a problem hiding this comment.
Automated review found no issues, but I'm deferring to a human reviewer given this restructures the cross-thread refcount and locking protocol for FetchTasklet — the kind of change where a second pair of eyes on the lock windows and ref-release sites is worth having.
Extended reasoning...
Overview
This PR refactors FetchTasklet (~700 lines of diff in that file alone) to encode its cross-thread ownership in the type system: the free mutex: Mutex becomes shared: Guarded<HttpHandoff>, fields are partitioned into JsState (JS-thread-only) and HttpHandoff (lock-protected), the refcount now starts at init_exact_refs(2) instead of init() + a later ref_() in queue(), and each release site is given a named wrapper (release_js_ref, release_http_ref, etc.). Supporting changes: ignore_data and a new is_http2 mirror become atomics, GuardedLock is made !Send/!Sync, and jsc::Weak gains an unsafe fn create_ptr for the split-borrow caller. dispatch.rs is a one-line rename to use the new release_js_ref.
Security risks
None identified. This is internal memory-ownership and concurrency plumbing for the fetch implementation; no auth, crypto, input parsing, or trust-boundary surfaces are touched. The new unsafe fn Weak::create_ptr widens the API surface slightly but carries a documented safety contract and the safe create now delegates to it.
Level of scrutiny
High. FetchTasklet is the cross-thread coordination point for every fetch() call — it juggles an intrusive thread-safe refcount, a JS↔HTTP handoff mutex, lifetime-erased self-referential borrows, and JSC GC-finalizer re-entry. The PR is described as behavior-preserving and the bulk is mechanical field re-homing, but several changes are semantically load-bearing rather than pure renames: moving the second baseline ref from queue() to init_exact_refs(2) (with the PR noting this relies on get() staying infallible after the Box::new), converting ignore_data to a relaxed atomic, mirroring is_http2 into a separate atomic for the lock-free skip_chunked_framing reader, and threading the lock guard by value through the on_progress_update cleanup closure. Each looks correct on inspection and the PR description argues each carefully, but these are exactly the spots where a subtle ordering or aliasing mistake would manifest as a rare UAF or deadlock under load.
Other factors
The PR is exceptionally well-documented (full ref table, lock-invariant doc, per-commit staging) and the existing fetch test suite passes with the same two pre-existing failures as main. The bug-hunting system found nothing. Still, given the size, the concentration of unsafe, raw pointers, and cross-thread refcounting, and that this is production-critical hot-path code, a human reviewer familiar with the FetchTasklet/HTTP-thread lifecycle should sign off — auto-approval is not appropriate here.
|
@robobun fix conflict |
…own) Three child-process fixtures pin the HTTP-thread shutdown protocol that dealloc_in_flight_for_exit / FetchTasklet::release_at_shutdown implement: exit mid-download (metadata accepted, body pending), exit before any response bytes (nothing queued), and exit mid streaming upload (ResumableSink attached). A ref-protocol mistake on these paths surfaces as a wrong exit code (crash/ASAN abort) or a hung exit; none of these states had coverage before.
process.exit() during a streaming request-body upload leaked the whole in-flight chain (FetchTasklet, Box<AsyncHTTP>, ResumableSink, ThreadSafeStreamBuffer — LSan-visible with BUN_DESTRUCT_VM_ON_EXIT=1): the sink ref and any queued drain-task ref are normally dropped on the JS thread, which is parked at exit, and the queued drain nodes are ManagedTask-tagged so release_queued_tasks_for_shutdown cannot release them. Track both (sink_ref_held, queued_drain_tasks) and balance them in release_at_shutdown; release the HTTP-side ThreadSafeStreamBuffer ref in dealloc_in_flight_for_exit the way InternalState::reset does. The fetch-exit-in-flight tests now run their fixtures with BUN_DESTRUCT_VM_ON_EXIT=1 and LeakSanitizer enabled (inert on non-ASAN builds), so an unbalanced ref on any of the three exit paths fails as exit 23 with the leaked chain in the report.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/web/fetch/fetch-exit-in-flight.test.ts`:
- Around line 43-48: Remove the stale ASAN stderr filtering: delete the
filteredStderr construction (the .split/.filter/.join block that removes lines
starting with "WARNING: ASAN interferes") and update any uses of filteredStderr
to use stderr directly (so assertions like expect(filteredStderr).toBe("")
become expect(stderr).toBe("")). This ensures the test relies on bunEnv-provided
stable stderr and will surface real diagnostics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9606bbf5-10b4-4b2c-a6e4-177431f5e148
📒 Files selected for processing (3)
src/http/HTTPThread.rssrc/runtime/webcore/fetch/FetchTasklet.rstest/js/web/fetch/fetch-exit-in-flight.test.ts
Bun subprocesses no longer emit the ASAN signal-handler startup warning, so assert on raw stderr; any stderr output now fails the test loudly.
A streaming upload's final callback can land between is_shutting_down flipping true and SHUTDOWN_REQUESTED being observed (global_exit runs worker termination in between); it then leaves in_flight, so release_at_shutdown never runs for it and the sink/drain refs kept the chain above zero. Mirror release_at_shutdown's sink_ref_held / queued_drain_tasks balancing before the two baseline releases. The JS thread is not parked yet in this window but never ticks the event loop again, so the JS-side release sites cannot race the loads.
A final callback that loses the has_schedule_callback CAS in the exit window returns before the shutdown branch and leaves in_flight, so neither release_at_shutdown nor callback's shutdown branch balanced the sink/drain refs for it — only the queued progress node survives, into release_queued_tasks_for_shutdown. Centralize the balancing in take_streaming_refs_for_exit (atomic swap take of sink_ref_held and queued_drain_tasks) and call it from all three exit paths; whichever reaches the tasklet first claims the refs, the rest see zero, so paths that can both run for one tasklet cannot double-release.
The failing mode includes the child's LeakSanitizer symbolizing a large leak report under the ASAN debug build, which exceeds the default per-test timeout on loaded CI; name the constant and justify the exception to the no-timeouts convention.
… too The fourth is_shutting_down exit path (a final progress node dispatched after the flag flips) released only the JS baseline ref; the entry had already left in_flight and a dequeued node never reaches the shutdown queue drain, so the sink/drain refs would strand there. The JS loop does not tick in that window today, so this is defensive symmetry — the take is an idempotent no-op when another path already claimed the refs. Also correct the GuardedLock comment: std::sync::MutexGuard is Sync where T: Sync; the raw-pointer PhantomData here is stricter (!Send and !Sync), which nothing depends on either way.
There was a problem hiding this comment.
All prior review feedback has been addressed and no new issues found, but given the ~1300-line restructuring of FetchTasklet's cross-thread locking/refcount discipline plus the new unsafe Weak::create_ptr and GuardedLock auto-trait change, this should get a human pass before merging.
Extended reasoning...
Overview
This PR restructures FetchTasklet (src/runtime/webcore/fetch/FetchTasklet.rs, +829/−498) to encode its cross-thread ownership in the type system: the free Mutex becomes Guarded<HttpHandoff>, JS-thread state is split into JsState, the refcount protocol is named (release_js_ref/release_http_ref/release_sink_ref/release_drain_task_ref, init_exact_refs(2)), and the HTTP→JS buffer handoff is factored into HttpHandoff::{merge_result, stage_response_bytes}. Supporting changes: GuardedLock gains a PhantomData<*const ()> to become !Send/!Sync, jsc::Weak gains an unsafe fn create_ptr for split-borrow callers, HTTPThread::dealloc_in_flight_for_exit now deinits the streaming request body, and dispatch.rs's shutdown FetchTasklet arm now drains streaming refs via the new take_streaming_refs_for_exit. One deliberate behavior change (a streaming-upload exit-time leak fix) plus three new LSan-backed exit fixtures.
Security risks
No direct attack surface change — this is internal lifetime/ownership plumbing for fetch(). The risk class is memory safety (UAF/double-free across the JS↔HTTP thread boundary) rather than injection/auth. The new unsafe fn Weak::create_ptr shifts a liveness proof from the type system to caller documentation; the !Send marker on GuardedLock is a soundness improvement. No user-controlled data handling changed.
Level of scrutiny
High. This is concurrency-critical, production-hot-path code (every fetch() flows through it) with hand-managed atomic refcounts, lock-window invariants that the PR claims are preserved "by construction", and four exit-time release paths whose double-release safety depends on an idempotent atomic swap. The PR description is excellent and the commits are staged for review, but verifying that every guard drop lands where the old manual unlock() did, and that the init_exact_refs(2) change is balanced on every path, requires a maintainer who knows the prior FetchTasklet locking discipline. This is well outside the "simple/mechanical" bar for auto-approval.
Other factors
All five of my earlier inline findings (the three exit-path ref-balance gaps, the MutexGuard Sync doc inaccuracy, and the test-timeout convention) were addressed in 0d2ddc2 / 2bb34e8 / e6abfab / a09b295 and the threads are resolved. The bug-hunting system found nothing on this revision. CI's only failure (bunx.test.ts) is an unrelated upstream registry issue per robobun, and the new fetch-exit-in-flight fixtures are green on every lane including ASAN. The remaining reason to defer is purely scope and subject matter, not any open concern.
The take protocol assumed the JS-side release sites cannot run once is_shutting_down flips, but the JS thread is still doing global_exit cleanup in that window: server socket teardown fires user abort listeners (RequestContext::on_abort has no shutdown gate) which can reach sink.cancel -> write_end_request -> release_sink_ref synchronously, and on_abort's microtask drain can resume a parked sink continuation to the same effect. If a final callback's exit-branch take won that race, both threads released the same sink ref and the JS-side plain deref freed the tasklet inline while the HTTP side still used it. Make the releases claim-checked: release_sink_ref derefs only if it swapped the marker from true, release_drain_task_ref only if the counter was nonzero (also removes the u32 underflow if a queued drain ever raced a take). Safe in the stolen case because the taker's deref_from_thread parks the box during shutdown instead of freeing it, so the marker RMW is always on live memory. Correct the SAFETY comments that stated the race could not happen.
WTF::ParkingLot lazily allocates a never-freed per-thread ThreadData the first time a thread parks, which LeakSanitizer reports when process.exit() interrupts an HTTP thread mid-park. This only shows on native macOS ASAN builds (darwin CI cross-compiles without ASAN), where it made all three fetch-exit-in-flight fixtures fail on a default Apple Silicon debug build. Verified the suppression masks nothing: the real FetchTasklet/ThreadSafeStreamBuffer/ResumableSink leak chain is still reported on pre-fix code with it in place.
What does this PR do?
Follow-up to the review discussion on #31676, where the concern was that "the memory lifetime of all the fetch stuff is very confusing right now." This is a behavior-preserving refactor of
FetchTaskletthat moves the cross-thread ownership rules out of comments and into the type system, so changes like #31676's buffer handoff can be reviewed by reading one small struct instead of reconstructing the locking discipline from comments scattered across 2,400 lines.1. The mutex now owns the data it guards. The free-floating
mutex: Mutexwhose protected set was documented only in comments ("Mutex guardsbufferagainst the HTTP thread") becomesshared: Guarded<HttpHandoff>, reusing the existingbun_threading::Guarded. The cross-thread state —result,metadata,scheduled_response_buffer,body_size— is unreachable without holding the lock. Lock windows are unchanged: every guard drop sits exactly where a manualunlock()used to (theon_progress_updatecleanup closure takes the guard by value so unlock-first ordering is preserved on every return path).2. The struct is partitioned by thread role. JS-thread-only state (promise, sink, response refs, abort plumbing, the waiting flags) moved into a named
JsState; HTTP→JS handoff state intoHttpHandoff; what remains directly onFetchTaskletis lock-free shared state (atomics) and the leases handed to the HTTP thread (response_buffer, theAsyncHTTPbox). All fields are now private. A smallParts<'t>split-borrow struct lets lock-held helpers keep access to JS-side state without aliasing&mut self— it compiles to nothing.3. The refcount protocol is named.
ref_countnow starts atinit_exact_refs(2)— one baseline ref per thread, matchingThreadSafeStreamBuffer— instead ofinit()plus a bareref_()later inqueue(). Each release site states which ref it drops:release_js_ref,release_http_ref,release_sink_ref,release_drain_task_ref, and the struct doc carries the full ref table plus the lock invariants (user JS can run while the lock is held —checkServerIdentity, microtask drains, sink re-entry — which is whyignore_datais now an atomic: the GC-finalizer path must stay lock-free).release_http_refdebug-asserts the lock is not held by the releasing thread, since the final release frees the allocation the mutex lives in.4. The buffer handoff is two named methods.
HttpHandoff::merge_result(sticky one-shotcan_stream, metadata-accepted-once, certificate-info carry-over) andHttpHandoff::stage_response_bytes(copy + reset for buffer reuse) replace the inline field dance incallback. The stored result no longer carries itsbodyalias — the alias is asserted on the incoming result and thenNoned, so the'static-widened copy holds no live borrow andis_http2is mirrored into an atomic for the one lock-free reader (skip_chunked_framing).Two support changes outside the fetch module:
GuardedLockis now\!Send(likestd::sync::MutexGuard) and also\!Sync(stricter thanMutexGuard, which isSync where T: Sync; nothing shares a guard across threads) — Darwinos_unfair_lock/ Windows SRWLOCK must be unlocked on the locking thread.jsc::Weak::create_ptris a newunsafe fn(with a documented safety contract) for the one caller that cannot form&mutwhile split borrows are live; the safecreatenow wraps it.Behavior
One deliberate change, found by writing the ref table; everything else is preserved by construction and by test.
Fixed:
process.exit()during a streaming upload leaked the whole in-flight chain (the unbalanced-sink-ref hole CodeRabbit flagged on the first revision).release_at_shutdownonly balanced the two baseline refs; a still-attached request-body sink's ref and anyresume_request_data_streamnode parked in the JS concurrent queue (whoseManagedTasktagrelease_queued_tasks_for_shutdowncannot release) kept the count above zero, so theFetchTasklet⇄Box<AsyncHTTP>⇄ sink ⇄ThreadSafeStreamBufferchain was never reclaimed — LSan reports all of it withBUN_DESTRUCT_VM_ON_EXIT=1. Nowsink_ref_held/queued_drain_tasksrecord those refs andrelease_at_shutdownbalances them (same JS-thread-parked argument as the existinghas_schedule_callbackread), anddealloc_in_flight_for_exitreleases the HTTP-side stream-buffer ref the wayInternalState::resetdoes on the normal path.test/js/web/fetch/fetch-exit-in-flight.test.ts—process.exit()with requests still in flight (dealloc_in_flight_for_exit→release_at_shutdown) had no coverage at all. Three child-process fixtures establish a known in-flight state (mid-download with body pending, pre-metadata, mid streaming upload with the sink attached), then exit with LeakSanitizer enabled (ASAN builds; inert elsewhere): an unbalanced ref fails as exit 23 with the leaked chain in the report, a UAF as a different exit code, a hung exit as the test timeout. The mid-upload test fails without this PR's shutdown fix and passes with it; the other two pin the already-balanced paths.fetch.test.ts,fetch-gzip.test.ts,fetch.stream.test.ts,body.test.ts,body-stream.test.ts,fetch-abort-queued.test.ts,abort-signal-leak.test.tson the patched debug build: 9,914 pass, 2 fail — and both failures reproduce identically on a main debug build in the same environment (one helper-serverConnectionRefused, one GC-timing 5s timeout), verified by rebuilding main into the same build dir and re-running the identical suite. Main fails a strict superset locally.Guarded<T>stores the same mutex and data that were already adjacent fields, the handoff methods are the same code moved, and the split-borrow struct is borrows only. No benchmarks were run since no hot-path code changed shape.Review guide
The commits are staged to be read in order — the big one (
partition FetchTasklet into JsState and HttpHandoff) is rename-only by construction; the behavior-relevant ones (start FetchTasklet refcount at 2,convert mutex to Guarded,make ignore_data atomic) are each small and isolated.