runtime: react to OS memory-pressure signals (opt-in)#31021
Conversation
|
Updated 12:00 PM PT - May 22nd, 2026
✅ @robobun, your commit 375c72c37fceb0730333658434e72d7ce5b66924 passed in 🧪 To try this PR locally: bunx bun-pr 31021That installs a local version of the PR into your bun-31021 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds an opt-in, process-wide OS memory-pressure watcher that triggers JS-thread GC, JSC heap footprint shrinking, and mimalloc cleanup in response to low-memory notifications. Installation is gated by a feature flag, watcher state is tracked in analytics, and three platform backends (Linux PSI polling, macOS libdispatch, Windows kernel notification) forward signals to a unified JS-thread response path. VM lifecycle hooks install and uninstall the watcher, and debug-only test seams expose simulation functions for integration testing. ChangesMemory Pressure Watcher
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
Found 8 issues this PR may fix:
🤖 Generated with Claude Code |
|
This PR may be a duplicate of: 1. #30403 - Same feature (OS memory-pressure signal handling), this is the original Zig implementation that this PR ports to Rust. Generated with Claude Code https://claude.ai/code |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/jsc/MemoryPressureWatcher.rs`:
- Around line 738-744: When arm(s) fails after partially initializing the
watcher, tear down the partially initialized state: clear the global STATE
reference, close the Windows notification handle and any libuv handles that were
initialized on the state, and free/destroy the state object before returning.
Concretely, in the failure branch where RegisterWaitForSingleObject/arm(s)
returns false, call whatever routine you use to clear STATE (e.g. set STATE to
null/clear atomic), CloseHandle(notification) (or the equivalent close for the
notification), uv_close or the appropriate close API for the libuv handles
stored on s (e.g. s->async, s->timer), and then free/destroy s (or call the
destructor/cleanup function used elsewhere) so no handles or memory leak remain.
Ensure these cleanup steps reference STATE, notification, and the libuv handle
fields on the watcher state so they match the existing teardown code paths.
- Around line 357-358: The global AtomicBool pending_critical on
MemoryPressureWatcher improperly shares per-event severity across multiple
dispatch enqueues; instead carry severity with each queued work item or replace
pending_critical with an atomic integer/severity field (e.g.,
AtomicU8/AtomicUsize representing the severity enum) and enqueue the actual
severity value so each queued callback reads its own severity; update the
dispatch handlers that write pending_critical and the JS-thread consumer that
reads it to use the per-item severity (or atomically swap/load the encoded
severity) so events are not overwritten or misclassified.
- Around line 252-258: The poll loop in MemoryPressureWatcher.rs currently
treats any negative return from unsafe { libc::poll(...) } as a permanent
failure and breaks the thread; instead, detect EINTR and retry the poll: after
the poll call in the loop (around the poll invocation and the n < 0 check),
fetch the last errno (e.g. via nix::errno::Errno::last() or libc's errno), and
if errno == libc::EINTR continue the loop (skipping the shutdown check early),
otherwise treat it as a real error and break; keep the existing
s.shutdown.load(Ordering::Relaxed) check and only exit on non-EINTR errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7d981506-0990-47aa-bf72-4ab8bd696c92
📒 Files selected for processing (1)
src/jsc/MemoryPressureWatcher.rs
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/jsc/MemoryPressureWatcher.rs (1)
744-750:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPartial initialization leak when first
arm()fails.When
arm(s)returns false at line 744, the function returns early leaving resources leaked:
STATEstill referencessnotificationhandle remains openwakeandrearmuv handles are initialized but never closedStateallocation is never freedClean up before returning:
Proposed fix
if !arm(s) { log!( "RegisterWaitForSingleObject failed (err={}); watcher disabled", GetLastError() ); + STATE.store(core::ptr::null_mut(), Ordering::Release); + unsafe { + // Close uv handles before destroying state. Since they were + // just init'd and never started, close is synchronous-ish but + // we still need the callback dance for correctness. + (*s).closing.store(2, Ordering::Release); + (*s).rearm.close(on_closed_timer); + (*s).wake.close(on_closed_async); + let _ = CloseHandle((*s).notification); + } return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/jsc/MemoryPressureWatcher.rs` around lines 744 - 750, When arm(s) returns false you must clean up all partially-initialized resources before returning: close the notification HANDLE, call uv_close on the wake and rearm uv handles (or schedule their close callbacks) so they are not leaked, free the allocated State instance and clear/reset the global STATE reference, and ensure any other OS resources tied to State are released; update the early-return path in the function that calls arm(s) (referencing arm(s), STATE, notification, wake, rearm, and the State allocation) to perform these cleanup steps before logging and returning.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/jsc/MemoryPressureWatcher.rs`:
- Around line 744-750: When arm(s) returns false you must clean up all
partially-initialized resources before returning: close the notification HANDLE,
call uv_close on the wake and rearm uv handles (or schedule their close
callbacks) so they are not leaked, free the allocated State instance and
clear/reset the global STATE reference, and ensure any other OS resources tied
to State are released; update the early-return path in the function that calls
arm(s) (referencing arm(s), STATE, notification, wake, rearm, and the State
allocation) to perform these cleanup steps before logging and returning.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 664d6074-268a-4bb5-b665-c5bb90055416
📒 Files selected for processing (1)
src/jsc/MemoryPressureWatcher.rs
There was a problem hiding this comment.
Thanks for the EINTR fix in e2400bc — nothing further from the bug hunter. That said, ~860 lines of new cross-platform unsafe FFI + cross-thread lifecycle code (libdispatch barriers, NT threadpool + libuv handle teardown, PSI thread) is more than I'm comfortable auto-approving even behind a flag, so leaving this for a human pass.
Extended reasoning...
Overview
Adds a new ~860-line src/jsc/MemoryPressureWatcher.rs with three platform backends (Linux PSI poll thread, macOS libdispatch source + semaphore drain barrier, Windows CreateMemoryResourceNotification + NT threadpool wait + libuv async/timer), wires install/uninstall into VirtualMachine::init/destroy, adds a feature flag, an analytics counter, debug-only Bun.unsafe test seams, and an integration test. Also a one-line <cassert> include fix in wtf-bindings.cpp and a Zeroable impl for uv_async_t.
Security risks
No user-controlled input reaches this code; the feature is gated on an env var and only the main-thread VM installs it. The real risk surface is memory safety: every backend manually heap-allocates State, stores it in a global AtomicPtr, hands raw pointers to OS-managed callback threads, and destroys it during VM teardown — all under unsafe. The macOS cancel-handler/semaphore barrier and the Windows dual-uv_close countdown are exactly the kind of teardown choreography where a missed ordering becomes a UAF. I didn't find a concrete bug (and the one I previously flagged on the Linux path is fixed), but this is correctness-by-careful-reasoning, not correctness-by-construction.
Level of scrutiny
High. Even though it's off by default, this is hand-rolled cross-thread lifetime management against three different OS callback models, integrated into VM init/destroy. The macOS and Windows backends are only compile-tested locally per the PR description; CI on those platforms is the first real exercise. This is well outside the "simple/mechanical/obvious" bar for auto-approval.
Other factors
- All prior review threads (mine + CodeRabbit's) are resolved; the two CodeRabbit design nits were intentionally matched to the Zig original (#30403) and the author's rationale is reasonable.
- Tests cover the response path via a debug seam and smoke-test install/uninstall, plus a dedicated red/green for the Darwin barrier — good coverage for what's deterministically testable, but real OS-signal paths remain manual-only.
- Build #55957 showed a failure on e2400bc and was retriggered in 7015936; worth confirming that's green before merge.
CI status71/73 test lanes green on the rebased build #56166 (sha 317f9c7). The only two failures are Buildkite "Expired" jobs — macOS aarch64 agents never picked them up, so no tests ran:
This is infrastructure flake, not a code failure:
Across three CI runs, every platform backend (Linux PSI, macOS libdispatch, Windows kernel32+libuv) and the Darwin uninstall-barrier test have passed at least once on their target OS. All review threads resolved. Ready for maintainer review/merge. |
7015936 to
317f9c7
Compare
Rust port of #30403. Adds a native memory-pressure watcher that, when the OS signals low memory, runs a sync JSC GC, schedules shrinkFootprintWhenIdle() (drops JIT code at the next safe point), forces mi_collect(true) to hand mimalloc free segments back to the OS, and bumps analytics::features::memory_pressure so any later crash report carries memory_pressure(N). Off by default behind BUN_FEATURE_FLAG_EXPERIMENTAL_MEMORY_PRESSURE_HANDLER=1. Detection: - Windows: CreateMemoryResourceNotification + RegisterWaitForSingleObject (NT threadpool, WT_EXECUTEONLYONCE); uv_async_send to JS thread; 30s uv_timer holdoff before re-arm (level-triggered). - macOS: dispatch_source_create(DISPATCH_SOURCE_TYPE_MEMORYPRESSURE) on a global utility queue; enqueue_task_concurrent to JS thread. Cancel handler + semaphore barrier so uninstall() blocks until in-flight event handlers drain. - Linux: PSI trigger on /proc/pressure/memory (some 150000 1000000), blocked on POLLPRI by a dedicated thread; 30s holdoff. Gracefully no-ops where the trigger write fails (no PSI / no CAP_SYS_RESOURCE). All three converge on respond() running on the JS thread. Also: add missing <cassert> to wtf-bindings.cpp (unified-build ordering left assert() undeclared).
317f9c7 to
54dacbe
Compare
simulate() calls respond() directly and doesn't need the real watcher installed. With the flag on, a level-triggered OS signal (Windows' LowMemoryResourceNotification on an already-low-memory runner) could run respond() during startup and make the counter 2 instead of the asserted 1.
What does this PR do?
Rust port of #30403.
Adds a Bun-native memory-pressure watcher that, when the OS signals low memory, runs a sync JSC GC, schedules
shrinkFootprintWhenIdle()(drops JIT code at the next safe point), forcesmi_collect(true)to hand mimalloc free segments back to the OS, and bumpsanalytics::features::memory_pressureso any later crash report carriesmemory_pressure(N).Off by default behind
BUN_FEATURE_FLAG_EXPERIMENTAL_MEMORY_PRESSURE_HANDLER=1so it can be A/B'd before becoming default-on. No JS-visible hook.Detection
CreateMemoryResourceNotification(LowMemoryResourceNotification)+RegisterWaitForSingleObject(NT threadpool,WT_EXECUTEONLYONCE); 30 suv_timerholdoff before re-arm since the handle is level-triggered. Same approach upstream WebKit recently took inMemoryPressureHandlerWin.uv_async_senddispatch_source_create(DISPATCH_SOURCE_TYPE_MEMORYPRESSURE, …, WARN | CRITICAL | PROC_LIMIT_*)on a global utility queue. Edge-triggered on transitions, no holdoff needed. Cancel handler + semaphore barrier souninstall()blocks until in-flight event handlers drain.enqueue_task_concurrent/proc/pressure/memory(some 150000 1000000), blocked onPOLLPRIby a parked thread; 30 s holdoff between fires. Gracefully no-ops where the trigger write fails (no PSI / noCAP_SYS_RESOURCE).enqueue_task_concurrentAll three converge on
respond()running on the JS thread.Placement
The Zig original lived in
src/aio/; in Rust this goes insrc/jsc/MemoryPressureWatcher.rssince it needsVirtualMachine,ConcurrentTask,VM::run_gc()/shrink_footprint(), andbun_analytics— all available inbun_jsc, the lowest tier with everything required.Why not
WTF::MemoryPressureHandler?In Bun's JSCOnly WebKit build (
PlatformJSCOnly.cmake),install()is the emptyGenericstub on Apple, sets a flag with no OS hook on Linux (it expects GTK/WPE's UIProcess to calltriggerMemoryPressureEventover IPC), and 60 s polling on Windows. AndreleaseMemory()is a no-op withoutsetLowMemoryHandler, so we'd be supplying the same Bun-side cleanup either way.How did you verify your code works?
bun run rust:check-all— 10/10 (all OS/arch/profile combos)bun bd test test/js/bun/util/memory-pressure.test.ts— 3 pass / 0 fail / 1 skip (macOS-only, on Linux)USE_SYSTEM_BUN=1 bun test …— 0 pass / 4 skip (no false positives)CAP_SYS_RESOURCE(PSI write → error, watcher logsPSI unavailableand skips).Tests use the debug-only
Bun.unsafe.simulateMemoryPressure()seam and aWeakRefto deterministically assert the sync GC ran (heap-size deltas were too noisy — JSC retains block capacity post-collection). Manual end-to-end repro instructions for each platform are in the test file's header comment.macOS and Windows backends are compile-tested locally (
cargo check -p bun_jsc --target aarch64-apple-darwin/--target x86_64-pc-windows-msvc); CI runners on those platforms will exercise the install-smoke + Darwin-barrier tests.Also includes a one-line fix:
src/jsc/bindings/wtf-bindings.cppusesassert()without<cassert>, which fails to compile under the current unified-build grouping on main.