GarbageCollectionController: fire up to 2 Full GCs before going to slow interval#30725
GarbageCollectionController: fire up to 2 Full GCs before going to slow interval#30725robobun wants to merge 3 commits into
Conversation
|
Updated 1:05 AM PT - Jun 10th, 2026
❌ @robobun, your commit 2248344 has 1 failures in
🧪 To try this PR locally: bunx bun-pr 30725That installs a local version of the PR into your bun-30725 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds GC environment variables, exposes a VM FFI for async full collection, rewrites GarbageCollectionController to detect idle state and perform capped async Full GCs for old-gen reclamation, and adds a test that validates heap shrinkage during idle. ChangesIdle Garbage Collection via Async Full Collection
Suggested reviewers:
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/bun_core/env_var.rs`:
- Around line 92-95: The documentation for BUN_GC_TIMER_INTERVAL on the
GarbageCollectionController is outdated: it says the controller fires a single
Full GC then drops to slow mode, but the current logic can request up to two
Full GCs before switching to the slow 30s interval. Update the comment above
new!(pub BUN_GC_TIMER_INTERVAL: unsigned, "BUN_GC_TIMER_INTERVAL", {}) to
describe the two-step behavior (fast repeating ticks, up to two Full GC requests
on non-growing ticks, then switch to the fixed 30s slow interval) and ensure the
wording references the 30 non-growing tick threshold and the possibility of two
Full GCs before slow mode.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 62b74ca3-dfe4-42d5-8fc1-caa5f606fc97
📒 Files selected for processing (6)
src/bun_core/env_var.rssrc/jsc/GarbageCollectionController.rssrc/jsc/VM.rssrc/jsc/bindings/bindings.cppsrc/jsc/bindings/headers.htest/js/bun/gc/gc-controller-idle-full.test.ts
|
Found 3 issues this PR may fix:
🤖 Generated with Claude Code |
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
There was a problem hiding this comment.
I didn't find any bugs, but this rewrites the GC controller's idle-detection state machine and changes the growth/shrink semantics in process_gc_timer_with_heap_size — core runtime heuristics that affect every process — so it warrants a human look.
Extended reasoning...
Overview
This PR ports #29280 to the Rust source tree. It adds an idle_full_gcs_fired counter to GarbageCollectionController, substantially rewrites on_gc_repeating_timer to introduce a "reduction mode" that fires up to 2 explicit collectAsync(CollectionScope::Full) calls at the fast→slow transition, and changes process_gc_timer_with_heap_size so that only heap growth (not any change) signals activity. It also adds a trivial C++ binding (JSC__VM__collectAsyncFull), three env_var declarations, switches the controller's env reads from vm.env_loader_opt() to bun_core::env_var, and includes an integration test.
Security risks
None. No untrusted input handling, auth, or data exposure paths are touched. The new env vars are debug/tuning knobs read at init and follow the existing pattern.
Level of scrutiny
High. The GC repeating timer runs in every Bun process and the state machine here has several interacting pieces: the 30-tick stable counter, the new idle_full_gcs_fired cap, the 1 MB freed threshold, and the changed != prev → > prev activity test (which means heap shrinkage no longer kicks the controller back to fast mode or reschedules the one-shot timer). The repeating-timer callback also no longer updates gc_last_heap_size via perform_gc(), which the diff acknowledges and compensates for in the Pending arm. These are all reasonable and well-commented, but they're heuristic design choices with process-wide perf implications that a human familiar with the JSC GC model should sign off on.
Other factors
- The design mirrors a previously-landed Zig PR (#29280) and V8's MemoryReducer convergence rule, which lends confidence.
- A regression test is included and the PR description shows before/after verification.
- The robobun CI failures are pre-existing lint warnings in unrelated files (
hw_timer.rs,Blob.rs,server/mod.rs, etc.), not introduced here. - coderabbit's only finding (doc wording) was addressed in ecab4ad.
- github-actions flagged two potentially overlapping open PRs (#28696, #27366) targeting the older Zig source — worth a human confirming this supersedes them.
|
CI status (latest: build #61679 at 2248344, maintainer's merge of current main — now settled): Zero test failures. Every red job is agent infrastructure:
Earlier unrelated breaks are resolved on main:
|
ecab4ad to
9f2312b
Compare
There was a problem hiding this comment.
I didn't find any bugs, but this reworks the GC controller's scheduling state machine (idle Full-GC reduction mode, !=→> heap-comparison semantics, perform_gc()→collect_async() in the repeating timer, BUN_GC_TIMER_DISABLE presence→truthy check) — runtime-wide GC behavior is worth a human look.
Extended reasoning...
Overview
This PR ports Zig PR #29280 to the Rust GarbageCollectionController, adding an idle "reduction mode" that fires up to 2 explicit collectAsync(CollectionScope::Full) calls before transitioning to the 30s slow interval. It touches src/jsc/GarbageCollectionController.rs (state machine, new idle_full_gcs_fired field, rewritten on_gc_repeating_timer), src/jsc/VM.rs + src/jsc/bindings/bindings.cpp + headers.h (new JSC__VM__collectAsyncFull binding), src/bun_core/env_var.rs (three new env-var declarations), and adds an integration test.
Security risks
None identified. The change is internal GC scheduling; no user input parsing, auth, crypto, or data exposure surface. The new env vars are debug/tuning knobs read via the existing typed env_var machinery.
Level of scrutiny
High. This is process-wide GC scheduling that runs in every Bun process. Beyond the headline feature, several subtle semantic shifts ride along:
process_gc_timer_with_heap_size:this_heap_size != prev→this_heap_size > previn bothRunOnNextTickandPendingarms, plus a new< prevbranch that lowersgc_last_heap_sizewithout rescheduling. This changes the existing allocation-driven path, not just the new idle path.on_gc_repeating_timer: previously calledthis.perform_gc()(which updatesgc_last_heap_size); now callsvm.collect_async()directly, sogc_last_heap_sizeis no longer updated from the repeating timer. The PR's new< prevbranch inPendingcompensates, but the interaction between the two timers andgc_last_heap_sizevsgc_last_heap_size_on_repeating_timeris delicate.BUN_GC_TIMER_DISABLE: wasenv.has(...)(presence check, soBUN_GC_TIMER_DISABLE=0disabled the timer); now a truthy boolean check (so=0no longer disables). Minor but a user-visible behavior change.- The reduction-mode state machine has multiple reset points for
idle_full_gcs_firedandheap_size_didnt_change_for_repeating_timer_ticks_countacross two callbacks; the comments are thorough but the cross-callback invariants are non-obvious.
Other factors
The new test passed on all CI lanes including ASAN, and CodeRabbit's one doc nit was addressed. The PR is well-reasoned and references V8's MemoryReducer convergence rule. However, GC timing changes can have hard-to-spot regressions (latency spikes, memory not reclaimed under specific allocation patterns) that a single fixture test won't catch, and the collateral changes to the existing allocation-driven path warrant a maintainer's eye.
There was a problem hiding this comment.
I didn't find any bugs, but this rewrites the process-wide GC scheduling state machine (reduction mode, the !=→> activity heuristic, env-var sourcing) in ways that affect every Bun process, so it's worth a human look before merging.
Extended reasoning...
Overview
This PR ports #29280 to the Rust tree: it rewrites on_gc_repeating_timer to fire up to two explicit collectAsync(CollectionScope::Full) calls at the fast→slow transition, adds a new idle_full_gcs_fired reduction-mode state, changes process_gc_timer_with_heap_size to treat only heap growth (not any change) as activity, switches init() to read BUN_GC_TIMER_* / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS via bun_core::env_var instead of vm.env_loader_opt(), adds the JSC__VM__collectAsyncFull C++/Rust binding, declares the three env vars in env_var.rs, and adds an integration test.
Security risks
None. The change is internal GC timing logic; the only user-facing inputs are pre-existing tuning env vars, and the new binding just forwards to heap.collectAsync(Full) under the JSLock.
Level of scrutiny
High. This is runtime-wide GC scheduling that runs in every Bun process. The repeating-timer callback is essentially rewritten with a new state machine, and the != → > change in process_gc_timer_with_heap_size alters when the allocation-driven path reschedules / lowers the baseline. A subtle mistake here could regress either CPU (over-collecting) or RSS (under-collecting) across the board. While it mirrors an already-landed Zig PR and has a passing end-to-end test, the port-specific deviations (env-var sourcing, the one-shot-eval test workaround) and the cross-path interaction between idle_full_gcs_fired resets in process_gc_timer_with_heap_size vs the reduction-mode branch in on_gc_repeating_timer are the kind of heuristic tuning a human reviewer should sanity-check.
Other factors
The bug-hunting system found no issues, CodeRabbit's only finding (a doc-comment wording nit) was addressed in ecab4ad, the new test passed on all CI lanes including ASAN, and the one remaining CI failure is a known-flaky S3 leak unrelated to this diff. No CODEOWNERS entry covers these paths. Given the scope and the process-wide blast radius, deferring rather than approving.
There was a problem hiding this comment.
No bugs found, but this rewrites the GC controller's idle state machine and changes growth-vs-change semantics in process_gc_timer_with_heap_size — core runtime behavior that affects every process, so worth a human look even though it's a port of #29280.
Extended reasoning...
Overview
This PR modifies Bun's GarbageCollectionController to fire up to 2 explicit async Full GCs at the fast→slow idle transition, addressing old-gen garbage that JSC's Eden-only collections leave behind. It's a Rust port of #29280 (Zig). Files touched: GarbageCollectionController.rs (substantial state-machine rewrite in on_gc_repeating_timer plus new idle_full_gcs_fired field and reduction-mode logic), VM.rs / bindings.cpp / headers.h (new collectAsyncFull FFI binding — straightforward), env_var.rs (registers three GC tuning env vars), and a new subprocess test.
Security risks
None. This is purely GC scheduling — no auth, crypto, permissions, untrusted input parsing, or network surface. The new env vars are debug/tuning knobs read from the process environment.
Level of scrutiny
Moderate-to-high. GC scheduling is performance-critical runtime code that runs in every Bun process. Beyond the headline change, the PR also:
- Changes
process_gc_timer_with_heap_sizefrom!=-triggered to>-only (growth) for bothRunOnNextTickandPendingstates, with a new shrink branch that lowers the baseline without rescheduling. This is well-commented but is a semantic change to the existing allocation-driven path, not just an idle addition. - Replaces
perform_gc()at the top ofon_gc_repeating_timerwith a directblock_bytes_allocated()read + trailingcollect_async(), sogc_last_heap_sizeis no longer updated from the repeating timer (the comment in the newPendingshrink branch references this). - Switches
BUN_GC_TIMER_DISABLEfrom presence-check (env.has) to boolean parse, soBUN_GC_TIMER_DISABLE=0now means enabled where previously any value disabled. Likely intentional, but a behavior change.
These all look correct and are carefully justified in code comments, but the interaction between the two timer paths and the new reduction state warrants a maintainer's eye.
Other factors
CI is green on all lanes (the only red is an expired darwin agent, per robobun's status comment), the new test passed everywhere including ASAN, and this mirrors an already-landed Zig change. CodeRabbit's only finding (doc-comment wording) was addressed in ecab4ad. No outstanding human reviewer comments. I'm deferring solely on the complexity/criticality axis, not because anything looks wrong.
|
@robobun Performance implications of this in a release build? |
|
Measured on release builds (
Interpretation:
|
|
Can you share your benchmarking code? Maybe make a new repo like robobun/bun-30725-bench so that I can SSH into a powerful linux box, clone it, run a single script, and get more useful numbers. please! |
|
Benchmark harness is up: https://github.com/robobun/bun-30725-bench git clone https://github.com/robobun/bun-30725-bench
cd bun-30725-bench
./run.sh
The driver samples the target's Smoke-tested end-to-end here (shortened windows, capped container); the numbers in my previous comment came from the same scripts. |
|
Expanded https://github.com/robobun/bun-30725-bench to 53 scenarios (same one-script runner,
All 53 smoke-tested end-to-end. One note: I couldn't find |
|
What are the results of those benchmarks in your container? |
|
Ran all 53 scenarios in this container (full tables: sample-results/container-2026-06-02). Binaries from the same tree (main @ 61bd997) ± the PR diff. Settings: RUNS=1, IDLE=60 s, LOAD=8 s, CONC=16. Caveat: core-capped noisy container — single-run RPS/RTT swings ±20-30% in both directions; CPU-during-idle and RSS columns are the reliable ones. Memory (the point of the change):
One nuance visible in the raw data: the plain-object 40 MB burst dropped the JS heap to 0.21 MB but OS page return lagged in that particular run (RSS ~flat at 60 s; my earlier 40 s-idle run of the same shape returned to 35 MB). Buffer-backed and repeated-burst variants returned pages fully. Page decommit timing varies; heapMB is the direct effect of the Full GC. CPU:
Throughput/e2e: 25 oneshots landed within noise. Four rows that looked slow in the single-run table (json-roundtrip +17%, url-churn +34%, alloc-promises, spawn-chain) were re-run 3× interleaved: medians flipped to parity or PR-faster (json-roundtrip: base 6418 ms vs PR 5649 ms) — the originals ran an hour apart under different container load. Server RTT p50/p99 before/after idle: parity within the same noise band, swings in both directions. Summary: memory clearly down where there's garbage to reclaim, CPU flat modulo the bounded one-time mark, RTT/e2e unchanged — matching the smaller earlier run. For trustworthy RPS/RTT numbers this wants your quiet box: |
…ow interval Rust port of #29280. The repeating timer's collectAsync() lets JSC pick Eden vs Full. At idle JSC keeps picking Eden because Heap::updateAllocationLimits ratchets m_maxHeapSize on every Eden GC, so the 1/3 Full-promotion ratio decays instead of crossing. After 30 stable ticks we go to the 30s interval having never collected old-gen, and age-based CodeBlock jettison (TTL 5-60s, only evaluated during GC marking) never fires. At the fast->slow transition, fire one explicit collectAsync(CollectionScope::Full). If it freed >1 MB, fire one more on the next tick. Then go to slow. Max 2 Full GCs per idle activation, then silent - same convergence rule as V8's MemoryReducer. If allocation resumes mid-reduction, drop back to fast. No change to allocation-driven GC frequency; cost is at most 2 async Full GCs per active->idle transition. Also routes the BUN_GC_TIMER_* env reads through bun_core::env_var (process-env backed) instead of vm.transpiler.env, which is still null when init() runs from ensure_waker(). New binding: JSC__VM__collectAsyncFull -> heap.collectAsync(CollectionScope::Full).
908b782 to
b040223
Compare
Rust port of #29280.
The repeating timer's
collectAsync()lets JSC pick Eden vs Full. At idle JSC keeps picking Eden becauseHeap::updateAllocationLimitsratchetsm_maxHeapSizeon every Eden GC, so the 1/3 Full-promotion ratio decays instead of crossing. After 30 stable ticks we go to the 30s interval having never collected old-gen, and age-based CodeBlock jettison (TTL 5-60s, only evaluated during GC marking) never fires.Benchmark from #29280 on Claude Code:
fullGC()at steady state reclaims ~103 MB;gcAndSweep~141 MB — that is the dead old-gen + jettison-eligible JIT code this timer was leaving on the table.Change: at the fast→slow transition, fire one explicit
collectAsync(CollectionScope::Full). If it freed >1 MB, fire one more on the next tick. Then go to slow. Max 2 Full GCs per idle activation, then silent — same convergence rule as V8'sMemoryReducer(kRun→kDonewhenstarted_gcs >= 2 || freed <= 1MB). If allocation resumes mid-reduction, drop back to fast.No change to allocation-driven GC frequency; cost is at most 2 async Full GCs per active→idle transition.
New binding:
JSC__VM__collectAsyncFull→heap.collectAsync(JSC::CollectionScope::Full).Port-specific changes vs #29280:
BUN_GC_TIMER_*/BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESSreads go throughbun_core::env_var(process-environment backed) rather thanvm.env_loader_opt(). In the Rust portinit()runs fromensure_waker()beforeTranspiler::inithas populatedvm.transpiler.env, so the loader path always returnedNoneand these knobs were silently ignored. The Zig PR made the equivalent change.tempDirinstead ofbun -e. The Rust port'sis_one_shot_eval_invocation()setsnumberOfGCMarkers=1for-e, which stalls the concurrent collector while the mutator is parked inBun.sleep, so the async Full request never reaches its stop-the-world phase. Running as a file uses the normal multi-threaded GC configuration (same as a long-running server, which is what this change targets).Verification:
Heap drops from ~40 MB to ~1 MB once the idle Full GC reclaims the promoted arrays.