
runtime: react to OS memory-pressure signals (opt-in) #31021

Open
robobun wants to merge 6 commits into main from farm/424eb8e5/memory-pressure-watcher-rust

Conversation

@robobun

@robobun robobun commented May 19, 2026

Collaborator

What does this PR do?

Rust port of #30403.

Adds a Bun-native memory-pressure watcher that, when the OS signals low memory, runs a sync JSC GC, schedules shrinkFootprintWhenIdle() (drops JIT code at the next safe point), forces mi_collect(true) to hand mimalloc free segments back to the OS, and bumps analytics::features::memory_pressure so any later crash report carries memory_pressure(N).
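As a rough illustration of that response sequence, here is a minimal sketch. The `ReclaimHooks` trait, its method names, and the `Recorder` fake are hypothetical stand-ins for the real JSC, mimalloc, and analytics calls; the step order simply follows the sentence above and may not match the order inside the PR's `respond()`.

```rust
/// Hypothetical hooks standing in for the real JSC / mimalloc / analytics calls.
pub trait ReclaimHooks {
    fn run_sync_gc(&mut self);
    fn shrink_footprint_when_idle(&mut self);
    fn mi_collect(&mut self, force: bool);
    fn bump_memory_pressure_counter(&mut self) -> u32;
}

/// Runs the four steps in the order the description lists them and
/// returns the updated counter value.
pub fn respond<H: ReclaimHooks>(hooks: &mut H) -> u32 {
    hooks.run_sync_gc();
    hooks.shrink_footprint_when_idle();
    hooks.mi_collect(true);
    hooks.bump_memory_pressure_counter()
}

/// Recording fake (illustrative only) so the ordering can be checked
/// without a real VM.
#[derive(Default)]
pub struct Recorder {
    pub calls: Vec<&'static str>,
    pub counter: u32,
}

impl ReclaimHooks for Recorder {
    fn run_sync_gc(&mut self) {
        self.calls.push("gc");
    }
    fn shrink_footprint_when_idle(&mut self) {
        self.calls.push("shrink");
    }
    fn mi_collect(&mut self, force: bool) {
        self.calls.push(if force { "mi_collect(true)" } else { "mi_collect(false)" });
    }
    fn bump_memory_pressure_counter(&mut self) -> u32 {
        self.calls.push("counter");
        self.counter += 1;
        self.counter
    }
}
```

Keeping the steps behind a trait is only a convenience for this sketch; it lets the sequence be exercised without a VM.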

Off by default behind BUN_FEATURE_FLAG_EXPERIMENTAL_MEMORY_PRESSURE_HANDLER=1 so it can be A/B'd before becoming default-on. No JS-visible hook.
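A minimal sketch of such an opt-in gate, assuming that only the literal value `1` enables the watcher (the description only shows `=1`; how Bun's real env-var parser treats other values is not stated here). The function names are hypothetical.

```rust
/// Name of the flag as given in the PR description.
pub const FLAG: &str = "BUN_FEATURE_FLAG_EXPERIMENTAL_MEMORY_PRESSURE_HANDLER";

/// Assumption: only the exact value "1" opts in; unset or anything else
/// leaves the watcher off, matching "off by default".
pub fn watcher_enabled(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Reads the flag from the process environment.
pub fn watcher_enabled_from_env() -> bool {
    watcher_enabled(std::env::var(FLAG).ok().as_deref())
}
```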

Detection

| Platform | Mechanism | Cross-thread dispatch |
| --- | --- | --- |
| Windows | CreateMemoryResourceNotification(LowMemoryResourceNotification) + RegisterWaitForSingleObject (NT threadpool, WT_EXECUTEONLYONCE); 30 s uv_timer holdoff before re-arm since the handle is level-triggered. Same approach upstream WebKit recently took in MemoryPressureHandlerWin. | uv_async_send |
| macOS | dispatch_source_create(DISPATCH_SOURCE_TYPE_MEMORYPRESSURE, …, WARN \| CRITICAL \| PROC_LIMIT_*) on a global utility queue. Edge-triggered on transitions, no holdoff needed. Cancel handler + semaphore barrier so uninstall() blocks until in-flight event handlers drain. | enqueue_task_concurrent |
| Linux | PSI trigger on /proc/pressure/memory (some 150000 1000000), blocked on POLLPRI by a parked thread; 30 s holdoff between fires. Gracefully no-ops where the trigger write fails (no PSI / no CAP_SYS_RESOURCE). | enqueue_task_concurrent |

All three converge on respond() running on the JS thread.
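The 30 s holdoff used by the level-triggered backends can be sketched as a small throttle. This is an illustrative model only: the `Holdoff` type and `should_fire` are hypothetical names, and the Windows backend described above delays re-arming with a timer rather than filtering events like this.

```rust
use std::time::{Duration, Instant};

/// 30 s window, as stated in the detection table.
pub const HOLDOFF: Duration = Duration::from_secs(30);

/// Hypothetical throttle: forwards an event only if the previous
/// forwarded event is at least `window` old.
pub struct Holdoff {
    window: Duration,
    last_fire: Option<Instant>,
}

impl Holdoff {
    pub fn new(window: Duration) -> Self {
        Self { window, last_fire: None }
    }

    /// Returns true if an event observed at `now` should reach respond().
    pub fn should_fire(&mut self, now: Instant) -> bool {
        match self.last_fire {
            // Still inside the holdoff window: drop the event.
            Some(prev) if now.saturating_duration_since(prev) < self.window => false,
            // First event, or the window has elapsed: forward and restart the window.
            _ => {
                self.last_fire = Some(now);
                true
            }
        }
    }
}
```

Passing `now` in explicitly keeps the throttle deterministic to test; a real backend would presumably read the clock itself.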

Placement

The Zig original lived in src/aio/; in Rust this goes in src/jsc/MemoryPressureWatcher.rs since it needs VirtualMachine, ConcurrentTask, VM::run_gc()/shrink_footprint(), and bun_analytics — all available in bun_jsc, the lowest tier with everything required.

Why not WTF::MemoryPressureHandler?

In Bun's JSCOnly WebKit build (PlatformJSCOnly.cmake), install() is the empty Generic stub on Apple, sets a flag with no OS hook on Linux (it expects GTK/WPE's UIProcess to call triggerMemoryPressureEvent over IPC), and uses 60 s polling on Windows. releaseMemory() is also a no-op without setLowMemoryHandler, so we'd be supplying the same Bun-side cleanup either way.

How did you verify your code works?

  • bun run rust:check-all — 10/10 (all OS/arch/profile combos)
  • bun bd test test/js/bun/util/memory-pressure.test.ts — 3 pass / 0 fail / 1 skip (macOS-only, on Linux)
  • USE_SYSTEM_BUN=1 bun test … — 0 pass / 4 skip (no false positives)
  • Manually confirmed the Linux install path runs and gracefully degrades without CAP_SYS_RESOURCE (PSI write → error, watcher logs PSI unavailable and skips).
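The graceful-degradation path in the last item can be sketched with the standard library alone. This is an assumption-laden outline, not the PR's code: `arm_psi_trigger` and `install` are hypothetical names, the log line is illustrative, and the POLLPRI wait that would follow a successful arm is omitted.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Trigger from the PR description: "some" stall of 150 ms within a 1 s window.
/// The trailing NUL mirrors the kernel PSI documentation's example, which
/// writes the string including its terminator.
pub const PSI_TRIGGER: &str = "some 150000 1000000\0";

/// Opens the PSI file read-write and registers the trigger. The returned
/// file would then be polled for POLLPRI (not shown here).
pub fn arm_psi_trigger(path: &Path) -> io::Result<File> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    file.write_all(PSI_TRIGGER.as_bytes())?;
    Ok(file)
}

/// Returns None instead of failing when PSI is missing or the write is
/// rejected (for example without CAP_SYS_RESOURCE).
pub fn install(path: &Path) -> Option<File> {
    match arm_psi_trigger(path) {
        Ok(file) => Some(file),
        Err(err) => {
            // Illustrative log line; the real watcher's wording may differ.
            eprintln!("[memorypressure] PSI unavailable: {}", err);
            None
        }
    }
}
```

On a Linux host the path would be `/proc/pressure/memory`; taking it as a parameter keeps the failure path easy to exercise.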

Tests use the debug-only Bun.unsafe.simulateMemoryPressure() seam and a WeakRef to deterministically assert the sync GC ran (heap-size deltas were too noisy — JSC retains block capacity post-collection). Manual end-to-end repro instructions for each platform are in the test file's header comment.

macOS and Windows backends are compile-tested locally (cargo check -p bun_jsc --target aarch64-apple-darwin / --target x86_64-pc-windows-msvc); CI runners on those platforms will exercise the install-smoke + Darwin-barrier tests.


Also includes a one-line fix: src/jsc/bindings/wtf-bindings.cpp uses assert() without <cassert>, which fails to compile under the current unified-build grouping on main.

@robobun

robobun commented May 19, 2026

Collaborator Author
Updated 12:00 PM PT - May 22nd, 2026

@robobun, your commit 375c72c37fceb0730333658434e72d7ce5b66924 passed in Build #56897! 🎉


🧪   To try this PR locally:

bunx bun-pr 31021

That installs a local version of the PR into your bun-31021 executable, so you can run:

bun-31021 --bun

@coderabbitai

coderabbitai Bot commented May 19, 2026

Contributor


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds an opt-in, process-wide OS memory-pressure watcher that triggers JS-thread GC, JSC heap footprint shrinking, and mimalloc cleanup in response to low-memory notifications. Installation is gated by a feature flag, watcher state is tracked in analytics, and three platform backends (Linux PSI polling, macOS libdispatch, Windows kernel notification) forward signals to a unified JS-thread response path. VM lifecycle hooks install and uninstall the watcher, and debug-only test seams expose simulation functions for integration testing.

Changes

Memory Pressure Watcher

| Layer | File(s) | Summary |
| --- | --- | --- |
| Feature flag and analytics instrumentation | src/bun_core/env_var.rs, src/analytics/lib.rs | Adds BUN_FEATURE_FLAG_EXPERIMENTAL_MEMORY_PRESSURE_HANDLER environment variable to gate watcher installation (off by default) and memory_pressure analytics counter at feature index 58 to track memory-pressure events. |
| MemoryPressureWatcher core module, API, and response logic | src/jsc/MemoryPressureWatcher.rs, src/jsc/lib.rs | Module documentation, public API functions (install/uninstall for production; simulate/test_uninstall_barrier for debug), core JS-thread respond() that logs severity, increments analytics counter, runs synchronous full GC, shrinks JSC footprint, and triggers mimalloc cleanup. Global STATE atomic pointer coordinates cross-thread shutdown. Noop backend for unsupported platforms. Module is exported from crate root. |
| Linux/Android PSI polling backend | src/jsc/MemoryPressureWatcher.rs | Dedicated polling thread opens /proc/pressure/memory, writes PSI trigger, polls for POLLPRI events with periodic timeouts, enqueues JS-thread callbacks via concurrent task queue, throttles repeat responses via HOLDOFF_MS, and uninstalls by setting shutdown flag, joining thread, and closing fd. |
| macOS libdispatch backend | src/jsc/MemoryPressureWatcher.rs | Installs DISPATCH_SOURCE_TYPE_MEMORYPRESSURE, computes critical/warn severity from event, enqueues JS callbacks on main dispatch queue. Uninstall cancels source and blocks via semaphore "drained" barrier until all in-flight handlers complete. Debug-only instrumentation (test_uninstall_barrier) uses extra dispatch sources and reset events to deterministically exercise the uninstall race window. |
| Windows kernel32/libuv backend | src/jsc/MemoryPressureWatcher.rs | Creates memory notification via CreateMemoryResourceNotification, registers threadpool wait via RegisterWaitForSingleObject, forwards events into JS thread using uv_async_send, performs respond() and re-arms with a libuv Timer respecting holdoff throttling. Defers state destruction until both uv handle close callbacks complete via atomic close counter. |
| VM lifecycle integration and FFI support | src/jsc/VirtualMachine.rs, src/libuv_sys/libuv.rs | Main-thread VM init calls install_on_event_loop(vm) during initialization and uninstall() during destroy for symmetric cleanup. Marks uv_async_t as Zeroable to permit safe zero-initialization for FFI use by Windows backend. |
| Debug test seams and host functions | src/runtime/api/UnsafeObject.rs | Registers two debug-only host functions: simulate_memory_pressure() directly invokes the watcher response path and returns the analytics counter; test_memory_pressure_uninstall_barrier() validates macOS uninstall barrier ordering. These are exposed to JS via Bun.unsafe.* for integration testing. |
| Integration test suite | test/js/bun/util/memory-pressure.test.ts | Concurrent test suite verifies watcher response increments analytics counter, that GC clears WeakRef targets, validates install/uninstall via [memorypressure]-scoped debug logs with OS-specific expectations (Linux: PSI installed or unavailable; macOS/Windows: installed; others: no install logs), and confirms macOS uninstall barrier blocks until in-flight handlers drain. |
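The macOS "drained" barrier summarized above is built on libdispatch (a cancel handler plus a semaphore). As a portable analogue only, the same guarantee (uninstall returns once no handler is still running, and no new handler starts afterwards) can be sketched with a `Mutex` and `Condvar`. The `DrainBarrier` type and its methods are hypothetical and do not mirror the PR's code.

```rust
use std::sync::{Condvar, Mutex};

#[derive(Default)]
struct BarrierState {
    in_flight: usize,
    cancelled: bool,
}

/// Std-only stand-in for the cancel-handler + semaphore barrier.
#[derive(Default)]
pub struct DrainBarrier {
    state: Mutex<BarrierState>,
    drained: Condvar,
}

impl DrainBarrier {
    /// Called at the start of an event handler. Returns false once
    /// uninstall() has begun, so late events are ignored.
    pub fn enter(&self) -> bool {
        let mut st = self.state.lock().unwrap();
        if st.cancelled {
            return false;
        }
        st.in_flight += 1;
        true
    }

    /// Called at the end of an event handler that entered successfully.
    pub fn exit(&self) {
        let mut st = self.state.lock().unwrap();
        st.in_flight -= 1;
        if st.in_flight == 0 {
            self.drained.notify_all();
        }
    }

    /// Blocks until every in-flight handler has called exit().
    pub fn uninstall(&self) {
        let mut st = self.state.lock().unwrap();
        st.cancelled = true;
        while st.in_flight > 0 {
            st = self.drained.wait(st).unwrap();
        }
    }
}
```

After `uninstall()` returns it would be safe to free whatever state the handlers touch, which is the property the Darwin barrier test is described as checking.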
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and specifically describes the main change: adding OS memory-pressure signal handling with opt-in control. |
| Description check | ✅ Passed | The description covers both required sections comprehensively: explains what the PR does (implementation details, detection mechanisms, placement, rationale) and verification methods (testing, manual verification, CI results). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Contributor

Found 8 issues this PR may fix:

  1. Bun appears to delay GC in low-pressure Fastify + mongodb workload, causing slow RSS/Timeout growth #30261 - Directly describes delayed GC in a 512 MiB container causing slow RSS growth to OOM; memory-pressure signals would trigger GC before hitting the cgroup limit
  2. Memory leak in Next.js SSR under bun --bun next start — JSC GC fails to reclaim heap after concurrent requests #29267 - JSC GC fails to reclaim heap between Next.js SSR request batches where Node's V8 GC succeeds; pressure-triggered GC could force reclamation
  3. Native RSS grows linearly under sustained AWS SDK v3 Kinesis GetRecords; identical Node 22 workload is flat #30415 - RSS grows linearly while JS heap stays stable under sustained AWS SDK Kinesis workload in containers; mimalloc cleanup under pressure could help
  4. Moving from Node to Bun spikes container CPU and memory usage until it crashes #17723 - GKE container memory spikes to limit (~1.2GB) and restarts when migrating from Node to Bun; OS memory-pressure signals could prevent hitting the cgroup ceiling
  5. Memory leak in bun + express + docker #19930 - Express in Docker shows significantly more retained objects than locally, suggesting GC doesn't run aggressively enough in container environments
  6. Memory leak since miggration from node 20 to bun 1.1.43 of a nextjs website #16339 - Next.js in Kubernetes (GKE) with linear memory growth in cgroup-limited container despite externalized Redis cache
  7. Memory leak on Nuxt SSR #14389 - Nuxt SSR in Docker with memory growing under load in a container environment
  8. Likely memoryleak inside bun runtime on service http requests #14065 - Long-running HTTP server in container where heap is stable but RSS grows, matching mimalloc page retention pattern

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #30261
Fixes #29267
Fixes #30415
Fixes #17723
Fixes #19930
Fixes #16339
Fixes #14389
Fixes #14065

🤖 Generated with Claude Code

@github-actions

Contributor

This PR may be a duplicate of:

  1. #30403 - Same feature (OS memory-pressure signal handling), this is the original Zig implementation that this PR ports to Rust.

Generated with Claude Code https://claude.ai/code

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/jsc/MemoryPressureWatcher.rs`:
- Around line 738-744: When arm(s) fails after partially initializing the
watcher, tear down the partially initialized state: clear the global STATE
reference, close the Windows notification handle and any libuv handles that were
initialized on the state, and free/destroy the state object before returning.
Concretely, in the failure branch where RegisterWaitForSingleObject/arm(s)
returns false, call whatever routine you use to clear STATE (e.g. set STATE to
null/clear atomic), CloseHandle(notification) (or the equivalent close for the
notification), uv_close or the appropriate close API for the libuv handles
stored on s (e.g. s->async, s->timer), and then free/destroy s (or call the
destructor/cleanup function used elsewhere) so no handles or memory leak remain.
Ensure these cleanup steps reference STATE, notification, and the libuv handle
fields on the watcher state so they match the existing teardown code paths.
- Around line 357-358: The global AtomicBool pending_critical on
MemoryPressureWatcher improperly shares per-event severity across multiple
dispatch enqueues; instead carry severity with each queued work item or replace
pending_critical with an atomic integer/severity field (e.g.,
AtomicU8/AtomicUsize representing the severity enum) and enqueue the actual
severity value so each queued callback reads its own severity; update the
dispatch handlers that write pending_critical and the JS-thread consumer that
reads it to use the per-item severity (or atomically swap/load the encoded
severity) so events are not overwritten or misclassified.
- Around line 252-258: The poll loop in MemoryPressureWatcher.rs currently
treats any negative return from unsafe { libc::poll(...) } as a permanent
failure and breaks the thread; instead, detect EINTR and retry the poll: after
the poll call in the loop (around the poll invocation and the n < 0 check),
fetch the last errno (e.g. via nix::errno::Errno::last() or libc's errno), and
if errno == libc::EINTR continue the loop (skipping the shutdown check early),
otherwise treat it as a real error and break; keep the existing
s.shutdown.load(Ordering::Relaxed) check and only exit on non-EINTR errors.
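For the second comment above (per-event severity), one possible shape of the "atomic severity field" option is sketched here. The `Severity` and `PendingSeverity` names are hypothetical; this variant coalesces queued events to the highest pending severity rather than carrying one value per work item, and it is not what the PR necessarily adopted.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
pub enum Severity {
    None = 0,
    Warn = 1,
    Critical = 2,
}

impl Severity {
    fn from_u8(raw: u8) -> Severity {
        match raw {
            0 => Severity::None,
            1 => Severity::Warn,
            _ => Severity::Critical,
        }
    }
}

/// Highest severity reported since the consumer last looked.
pub struct PendingSeverity(AtomicU8);

impl PendingSeverity {
    pub const fn new() -> Self {
        Self(AtomicU8::new(0))
    }

    /// Producer side (OS callback thread): never downgrades a pending
    /// Critical to Warn.
    pub fn record(&self, severity: Severity) {
        self.0.fetch_max(severity as u8, Ordering::AcqRel);
    }

    /// Consumer side (JS thread): reads and clears in one atomic step.
    pub fn take(&self) -> Severity {
        Severity::from_u8(self.0.swap(0, Ordering::AcqRel))
    }
}
```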
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7d981506-0990-47aa-bf72-4ab8bd696c92

📥 Commits

Reviewing files that changed from the base of the PR and between 8b36f3f and 727331e.

📒 Files selected for processing (1)
  • src/jsc/MemoryPressureWatcher.rs

Comment thread src/jsc/MemoryPressureWatcher.rs
Comment thread src/jsc/MemoryPressureWatcher.rs
Comment thread src/jsc/MemoryPressureWatcher.rs
Comment thread src/jsc/MemoryPressureWatcher.rs

@coderabbitai coderabbitai Bot left a comment

Contributor

♻️ Duplicate comments (1)
src/jsc/MemoryPressureWatcher.rs (1)

744-750: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Partial initialization leak when first arm() fails.

When arm(s) returns false at line 744, the function returns early leaving resources leaked:

  • STATE still references s
  • notification handle remains open
  • wake and rearm uv handles are initialized but never closed
  • State allocation is never freed

Clean up before returning:

Proposed fix
         if !arm(s) {
             log!(
                 "RegisterWaitForSingleObject failed (err={}); watcher disabled",
                 GetLastError()
             );
+            STATE.store(core::ptr::null_mut(), Ordering::Release);
+            unsafe {
+                // Close uv handles before destroying state. Since they were
+                // just init'd and never started, close is synchronous-ish but
+                // we still need the callback dance for correctness.
+                (*s).closing.store(2, Ordering::Release);
+                (*s).rearm.close(on_closed_timer);
+                (*s).wake.close(on_closed_async);
+                let _ = CloseHandle((*s).notification);
+            }
             return;
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/jsc/MemoryPressureWatcher.rs` around lines 744 - 750, When arm(s) returns
false you must clean up all partially-initialized resources before returning:
close the notification HANDLE, call uv_close on the wake and rearm uv handles
(or schedule their close callbacks) so they are not leaked, free the allocated
State instance and clear/reset the global STATE reference, and ensure any other
OS resources tied to State are released; update the early-return path in the
function that calls arm(s) (referencing arm(s), STATE, notification, wake,
rearm, and the State allocation) to perform these cleanup steps before logging
and returning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 664d6074-268a-4bb5-b665-c5bb90055416

📥 Commits

Reviewing files that changed from the base of the PR and between 727331e and e2400bc.

📒 Files selected for processing (1)
  • src/jsc/MemoryPressureWatcher.rs

@claude claude Bot left a comment

Contributor

Thanks for the EINTR fix in e2400bc — nothing further from the bug hunter. That said, ~860 lines of new cross-platform unsafe FFI + cross-thread lifecycle code (libdispatch barriers, NT threadpool + libuv handle teardown, PSI thread) is more than I'm comfortable auto-approving even behind a flag, so leaving this for a human pass.
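For context, the EINTR handling discussed in the earlier review thread amounts to retrying an interrupted poll instead of treating it as fatal. A minimal sketch, assuming the raw `poll` call is wrapped in a closure that returns `io::Result` (the `poll_retrying` helper is hypothetical, and the real loop's shutdown-flag check is omitted):

```rust
use std::io;

/// Calls `poll_once` until it returns something other than an
/// "interrupted" error (EINTR maps to ErrorKind::Interrupted on Unix).
pub fn poll_retrying<F>(mut poll_once: F) -> io::Result<i32>
where
    F: FnMut() -> io::Result<i32>,
{
    loop {
        match poll_once() {
            // A signal interrupted the wait: not a real failure, try again.
            Err(err) if err.kind() == io::ErrorKind::Interrupted => continue,
            // Success or a genuine error: hand it back to the caller.
            other => return other,
        }
    }
}
```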

Extended reasoning...

Overview

Adds a new ~860-line src/jsc/MemoryPressureWatcher.rs with three platform backends (Linux PSI poll thread, macOS libdispatch source + semaphore drain barrier, Windows CreateMemoryResourceNotification + NT threadpool wait + libuv async/timer), wires install/uninstall into VirtualMachine::init/destroy, adds a feature flag, an analytics counter, debug-only Bun.unsafe test seams, and an integration test. Also a one-line <cassert> include fix in wtf-bindings.cpp and a Zeroable impl for uv_async_t.

Security risks

No user-controlled input reaches this code; the feature is gated on an env var and only the main-thread VM installs it. The real risk surface is memory safety: every backend manually heap-allocates State, stores it in a global AtomicPtr, hands raw pointers to OS-managed callback threads, and destroys it during VM teardown — all under unsafe. The macOS cancel-handler/semaphore barrier and the Windows dual-uv_close countdown are exactly the kind of teardown choreography where a missed ordering becomes a UAF. I didn't find a concrete bug (and the one I previously flagged on the Linux path is fixed), but this is correctness-by-careful-reasoning, not correctness-by-construction.
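The "dual-uv_close countdown" mentioned here can be illustrated in isolation: state is destroyed only by whichever close callback runs last. The `CloseCountdown` type is a hypothetical stand-in; the proposed fix earlier in the thread stores 2 into a `closing` atomic for the same purpose.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Counts outstanding handle-close callbacks.
pub struct CloseCountdown {
    remaining: AtomicU8,
}

impl CloseCountdown {
    /// `handles` is the number of close callbacks expected (2 for an
    /// async handle plus a timer handle).
    pub fn new(handles: u8) -> Self {
        Self { remaining: AtomicU8::new(handles) }
    }

    /// Called from each close callback. Returns true for exactly one
    /// caller, the last one, which is then responsible for freeing state.
    pub fn handle_closed(&self) -> bool {
        // fetch_sub returns the previous value, so 1 means "I was last".
        self.remaining.fetch_sub(1, Ordering::AcqRel) == 1
    }
}
```

The ordering risk the review describes is what happens if state is freed before the second callback has run; the counter makes that decision a single atomic step.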

Level of scrutiny

High. Even though it's off by default, this is hand-rolled cross-thread lifetime management against three different OS callback models, integrated into VM init/destroy. The macOS and Windows backends are only compile-tested locally per the PR description; CI on those platforms is the first real exercise. This is well outside the "simple/mechanical/obvious" bar for auto-approval.

Other factors

  • All prior review threads (mine + CodeRabbit's) are resolved; the two CodeRabbit design nits were intentionally matched to the Zig original (#30403) and the author's rationale is reasonable.
  • Tests cover the response path via a debug seam and smoke-test install/uninstall, plus a dedicated red/green for the Darwin barrier — good coverage for what's deterministically testable, but real OS-signal paths remain manual-only.
  • Build #55957 showed a failure on e2400bc and was retriggered in 7015936; worth confirming that's green before merge.

@robobun

robobun commented May 19, 2026

Collaborator Author

CI status

71/73 test lanes green on the rebased build #56166 (sha 317f9c7). The only two failures are Buildkite "Expired" jobs — macOS aarch64 agents never picked them up, so no tests ran:

  • darwin-14-aarch64-test-bun → Expired
  • darwin-26-aarch64-test-bun → Expired

This is infrastructure flake, not a code failure:

  • darwin-14-x64-test-bun passed in this build (23 min) — exercises the same libdispatch backend and the macOS-specific uninstall-barrier test.
  • darwin-26-aarch64-test-bun passed in build #55979 (17 min) with identical code.
  • All Linux (debian/ubuntu/alpine × x64/aarch64/asan/musl) and Windows (2019-x64/11-aarch64) lanes pass.
  • darwin-14-aarch64 also expired in build #55957 — recurring macOS aarch64 agent availability.

Across three CI runs, every platform backend (Linux PSI, macOS libdispatch, Windows kernel32+libuv) and the Darwin uninstall-barrier test have passed at least once on their target OS. All review threads resolved. Ready for maintainer review/merge.

@robobun robobun force-pushed the farm/424eb8e5/memory-pressure-watcher-rust branch from 7015936 to 317f9c7 Compare May 19, 2026 17:47
robobun and others added 4 commits May 22, 2026 16:28
Rust port of #30403. Adds a native memory-pressure watcher that, when
the OS signals low memory, runs a sync JSC GC, schedules
shrinkFootprintWhenIdle() (drops JIT code at the next safe point),
forces mi_collect(true) to hand mimalloc free segments back to the OS,
and bumps analytics::features::memory_pressure so any later crash
report carries memory_pressure(N).

Off by default behind BUN_FEATURE_FLAG_EXPERIMENTAL_MEMORY_PRESSURE_HANDLER=1.

Detection:
- Windows: CreateMemoryResourceNotification + RegisterWaitForSingleObject
  (NT threadpool, WT_EXECUTEONLYONCE); uv_async_send to JS thread; 30s
  uv_timer holdoff before re-arm (level-triggered).
- macOS: dispatch_source_create(DISPATCH_SOURCE_TYPE_MEMORYPRESSURE) on a
  global utility queue; enqueue_task_concurrent to JS thread. Cancel
  handler + semaphore barrier so uninstall() blocks until in-flight
  event handlers drain.
- Linux: PSI trigger on /proc/pressure/memory (some 150000 1000000),
  blocked on POLLPRI by a dedicated thread; 30s holdoff. Gracefully
  no-ops where the trigger write fails (no PSI / no CAP_SYS_RESOURCE).

All three converge on respond() running on the JS thread.

Also: add missing <cassert> to wtf-bindings.cpp (unified-build ordering
left assert() undeclared).
@robobun robobun force-pushed the farm/424eb8e5/memory-pressure-watcher-rust branch from 317f9c7 to 54dacbe Compare May 22, 2026 16:37
Comment thread test/js/bun/util/memory-pressure.test.ts
simulate() calls respond() directly and doesn't need the real watcher
installed. With the flag on, a level-triggered OS signal (Windows'
LowMemoryResourceNotification on an already-low-memory runner) could
run respond() during startup and make the counter 2 instead of the
asserted 1.