GarbageCollectionController: fire up to 2 Full GCs before going to slow interval#29280
GarbageCollectionController: fire up to 2 Full GCs before going to slow interval#29280Jarred-Sumner wants to merge 15 commits into
Conversation
|
Updated 2:18 PM PT - May 12th, 2026
❌ @alii, your commit 3fb4b84 has 1 failures in
🧪 To try this PR locally: bunx bun-pr 29280That installs a local version of the PR into your bun-29280 --bun |
efe9f1b to
8ba592c
Compare
|
Found 2 issues this PR may fix:
🤖 Generated with Claude Code |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a new VM API/FFI Changes
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/bun.js/event_loop/GarbageCollectionController.zig (1)
142-155:⚠️ Potential issue | 🟠 MajorKeep the 30-tick Full-GC trigger disabled once the controller is already on the slow interval.
After
updateGCRepeatTimer(.slow)clears the stability counter, this branch starts counting again on the 30s cadence. Thirty slow ticks later, Line 150 fires anothercollectAsyncFull(), so one idle stretch can keep re-entering the reducer every ~15 minutes instead of stopping after the intended 1–2 convergence GCs. Gate this threshold ongc_repeating_timer_fastso it only runs during the fast→slow transition.Suggested guard
- if (prev_heap_size == current) { - this.heap_size_didnt_change_for_repeating_timer_ticks_count +|= 1; - if (this.heap_size_didnt_change_for_repeating_timer_ticks_count >= 30) { - // 30s of stable Eden GCs. collectAsync() never escalates to Full - // here because Heap::updateAllocationLimits ratchets m_maxHeapSize - // on every Eden, so the 1/3 promotion ratio decays instead of - // crossing. Fire an explicit Full so old-gen + age-based CodeBlock - // jettison run before we go to the 30s interval. - this.idle_full_gcs_fired = 1; - vm.collectAsyncFull(); - } - } else { + if (this.gc_repeating_timer_fast and prev_heap_size == current) { + this.heap_size_didnt_change_for_repeating_timer_ticks_count +|= 1; + if (this.heap_size_didnt_change_for_repeating_timer_ticks_count >= 30) { + // 30s of stable Eden GCs. collectAsync() never escalates to Full + // here because Heap::updateAllocationLimits ratchets m_maxHeapSize + // on every Eden, so the 1/3 promotion ratio decays instead of + // crossing. Fire an explicit Full so old-gen + age-based CodeBlock + // jettison run before we go to the 30s interval. + this.idle_full_gcs_fired = 1; + vm.collectAsyncFull(); + } + } else if (prev_heap_size != current) { this.heap_size_didnt_change_for_repeating_timer_ticks_count = 0; this.updateGCRepeatTimer(.fast); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/bun.js/event_loop/GarbageCollectionController.zig` around lines 142 - 155, The 30-tick idle Full-GC should only run while the controller is on the fast repeating timer; modify the prev_heap_size == current branch in GarbageCollectionController to only increment/check heap_size_didnt_change_for_repeating_timer_ticks_count and trigger vm.collectAsyncFull() when gc_repeating_timer_fast is true (i.e., gate the entire 30-tick/idle_full_gcs_fired logic on this.gc_repeating_timer_fast). Ensure updateGCRepeatTimer(.slow) still resets heap_size_didnt_change_for_repeating_timer_ticks_count as it does now so counting stops once we switch to slow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/bun.js/event_loop/GarbageCollectionController.zig`:
- Around line 142-155: The 30-tick idle Full-GC should only run while the
controller is on the fast repeating timer; modify the prev_heap_size == current
branch in GarbageCollectionController to only increment/check
heap_size_didnt_change_for_repeating_timer_ticks_count and trigger
vm.collectAsyncFull() when gc_repeating_timer_fast is true (i.e., gate the
entire 30-tick/idle_full_gcs_fired logic on this.gc_repeating_timer_fast).
Ensure updateGCRepeatTimer(.slow) still resets
heap_size_didnt_change_for_repeating_timer_ticks_count as it does now so
counting stops once we switch to slow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4144b5ae-aef1-4bdd-b25c-00741a9de1eb
📥 Commits
Reviewing files that changed from the base of the PR and between ccbaed9 and efe9f1b3925dc2983078f777cca8d4800d4107ed.
📒 Files selected for processing (4)
src/bun.js/bindings/VM.zigsrc/bun.js/bindings/bindings.cppsrc/bun.js/bindings/headers.hsrc/bun.js/event_loop/GarbageCollectionController.zig
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/event_loop/GarbageCollectionController.zig`:
- Line 27: The field idle_full_gcs_fired in the GarbageCollectionController
struct is internal mutable state and must be made private by renaming it with
the Zig private-field prefix; change the declaration to use `#idle_full_gcs_fired`
and update all internal references inside GarbageCollectionController (any
reads/writes that currently use idle_full_gcs_fired) to use `#idle_full_gcs_fired`
so the field follows the private-field convention.
- Around line 122-157: Reset the idle-reduction state as soon as any non-idle
path observes activity: clear idle_full_gcs_fired and
heap_size_didnt_change_for_repeating_timer_ticks_count (e.g., set both to 0)
whenever you switch the controller back to .fast, schedule a GC, or call
performGC() from outside the idle-reducer path (places that call
updateGCRepeatTimer(.fast), processGCTimerWithHeapSize(), or performGC()). To
make this robust, add a small helper (e.g., cancelIdleReduction()) that zeroes
idle_full_gcs_fired and heap_size_didnt_change_for_repeating_timer_ticks_count
and invoke it from those branches so resumed allocation won't be handled by the
idle-reduction logic on the next tick.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a9399947-784d-4f27-85c8-7f8ae01d9f95
📥 Commits
Reviewing files that changed from the base of the PR and between efe9f1b3925dc2983078f777cca8d4800d4107ed and 8ba592c.
📒 Files selected for processing (4)
src/bun.js/bindings/VM.zigsrc/bun.js/bindings/bindings.cppsrc/bun.js/bindings/headers.hsrc/bun.js/event_loop/GarbageCollectionController.zig
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/event_loop/GarbageCollectionController.zig`:
- Around line 118-122: In the repeating-timer path (the code using
bunVM().jsc_vm.blockBytesAllocated()), do not update gc_last_heap_size here—only
update gc_last_heap_size_on_repeating_timer and use prev_heap_size from
gc_last_heap_size_on_repeating_timer for comparisons; remove or revert the
assignment to gc_last_heap_size so processGCTimerWithHeapSize() (and its
.run_on_next_tick/.pending/.scheduled fast-path check this_heap_size > prev * 2)
keeps ownership of gc_last_heap_size for the allocation-driven path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eaefa315-359d-4f8f-9437-a317549987ee
📒 Files selected for processing (1)
src/bun.js/event_loop/GarbageCollectionController.zig
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/bun.js/event_loop/GarbageCollectionController.zig (1)
118-122:⚠️ Potential issue | 🟠 MajorKeep
gc_last_heap_sizeowned by the allocation-driven path.Line 122 overwrites the baseline that
processGCTimerWithHeapSize()uses for.run_on_next_tick,.pending, and.scheduled. That makes the> prev * 2escalation relative to the last repeating-timer sample instead of the last allocation-driven observation, so high-frequencyprocessGCTimer()callers can miss or delay fast-path GC.Suggested fix
const current = vm.blockBytesAllocated(); this.gc_last_heap_size_on_repeating_timer = current; - this.gc_last_heap_size = current;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/bun.js/event_loop/GarbageCollectionController.zig` around lines 118 - 122, The repeating-timer path is incorrectly clobbering the allocation-driven baseline by setting this.gc_last_heap_size = current; stop updating gc_last_heap_size inside the repeating-timer code (only update this.gc_last_heap_size_on_repeating_timer) so processGCTimerWithHeapSize()/processGCTimer() comparisons like the "> prev * 2" escalation (used for .run_on_next_tick, .pending, .scheduled) remain relative to the last allocation-driven observation; keep the repeating-timer only assigning this.gc_last_heap_size_on_repeating_timer and let the allocation-driven path continue to own updates to this.gc_last_heap_size.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/event_loop/GarbageCollectionController.zig`:
- Around line 141-157: The equality check for stable heap size uses exact
equality (prev_heap_size == current) which misses cases where GC shrinks the
heap; change the condition in GarbageCollectionController.zig to treat
non-growth as stable by using a non-growth comparison (e.g., current <=
prev_heap_size or prev_heap_size >= current) for the block that increments
heap_size_didnt_change_for_repeating_timer_ticks_count and triggers
collectAsyncFull; keep the else branch calling updateGCRepeatTimer(.fast) when
the heap actually grows, and leave vm.collectAsync()/vm.collectAsyncFull() logic
unchanged so the idle Full-GC handoff is not starved by GC-induced decreases.
---
Duplicate comments:
In `@src/bun.js/event_loop/GarbageCollectionController.zig`:
- Around line 118-122: The repeating-timer path is incorrectly clobbering the
allocation-driven baseline by setting this.gc_last_heap_size = current; stop
updating gc_last_heap_size inside the repeating-timer code (only update
this.gc_last_heap_size_on_repeating_timer) so
processGCTimerWithHeapSize()/processGCTimer() comparisons like the "> prev * 2"
escalation (used for .run_on_next_tick, .pending, .scheduled) remain relative to
the last allocation-driven observation; keep the repeating-timer only assigning
this.gc_last_heap_size_on_repeating_timer and let the allocation-driven path
continue to own updates to this.gc_last_heap_size.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1f18c059-fc52-4639-ab2f-b84dfd26e73d
📒 Files selected for processing (1)
src/bun.js/event_loop/GarbageCollectionController.zig
…ransition collectAsync() with default scope lets JSC pick Eden vs Full, and at idle JSC keeps picking Eden: Heap::updateAllocationLimits ratchets m_maxHeapSize on every Eden (Heap.cpp:2534), so the 1/3 promotion ratio decays instead of crossing. 30 ticks of stable Eden GCs therefore never reclaim old-gen, and age-based CodeBlock jettison (which only runs during GC marking) never fires. Before switching to the 30s slow interval, fire an explicit Full; if it freed >1MB, fire one more (V8 MemoryReducer caps at 2 majors per idle activation, src/heap/memory-reducer.cc kRun→kDone). Then converge to slow. If allocation resumes mid-reduction, drop back to fast. New binding JSC__VM__collectAsyncFull → heap.collectAsync(CollectionScope::Full).
…ast-mode, reset count at converge - updateGCRepeatTimer(.fast) now unconditionally clears #idle_full_gcs_fired and the stable-tick counter (covers processGCTimerWithHeapSize observing activity while reduction is pending) - count>=30 → Full now gated on gc_repeating_timer_fast so reduction only enters at the fast→slow transition; in slow mode it just keeps doing Eden every 30s - converged branch resets the stable-tick counter explicitly so a no-op updateGCRepeatTimer(.slow) cannot leave it at 30+
processGCTimerWithHeapSize treated any heap-size change as activity and called updateGCRepeatTimer(.fast), which since 8820bc5 unconditionally resets #idle_full_gcs_fired. The async Full GC freeing memory therefore cancelled its own reduction mode before the next repeating tick could observe it, so the slow interval was never reached. Only heap growth is allocation activity; a decrease is the requested GC succeeding.
…as stable - updateGCRepeatTimer(.fast) resetting the stable-tick counter unconditionally meant any processGCTimer() heap-growth observation between repeating-timer ticks zeroed it, so apps with sub-second transient allocations (freed by Eden before the next 1s sample) could never accumulate 30 stable ticks. Revert to transition-only and reset state explicitly at the two onGCRepeatingTimer exit points that need it. - 'same or less' stability now actually checks <= instead of ==, so the Eden GC at the end of each tick freeing a few bytes no longer breaks the stable-tick streak. - gc_last_heap_size assignment in onGCRepeatingTimer is unchanged vs main (performGC() wrote it there too); not a regression.
|
@robobun adopt |
|
✅ Adopted — ready for review. Reproduced: with 40 MB of promoted-then-released arrays and an idle event loop, the repeating GC timer on Pushed on top of Jarred's commits:
All 12 review threads resolved; claude[bot]'s final pass found no remaining issues. Perf: release-build comparison vs CI: both complete runs after the main-merge (#52872: 283/286, #53505: 281/286) passed on all Linux (alpine/debian/ubuntu, incl. x64-asan) and all macOS targets incl. darwin-14-x64. |
onGCRepeatingTimer already has gc_last_heap_size_on_repeating_timer as its baseline; writing gc_last_heap_size here can raise the prev value that processGCTimerWithHeapSize compares against in .scheduled, suppressing the this_heap_size > prev * 2 emergency GC when the repeating timer happens to fire between the baseline capture and the growth check.
GarbageCollectionController.init() runs from ensureWaker() inside VirtualMachine.init(), before Run.boot/test_command call configureDefines() -> runEnvLoader() -> loadProcess(). At that point vm.transpiler.env is an empty DotEnv.Loader, so BUN_GC_TIMER_INTERVAL / BUN_GC_TIMER_DISABLE / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS were never seen. Read them via bun.getenvZ instead. The test drives the controller at a 20ms fast interval so the 30-stable-tick threshold is reached in ~600ms, then verifies ~40MB of promoted-then-released arrays are reclaimed at idle. Without collectAsyncFull(), JSC keeps picking Eden because the 1/3 promotion ratio never crosses once m_maxHeapSize has been set proportional to the live set, and the heap stays at ~40MB.
There was a problem hiding this comment.
All earlier review threads look resolved and I didn't find new issues in fcf2215, but this reworks the GC controller's fast/slow state machine and changes how BUN_GC_TIMER_* env vars are read — worth a human pass on the heuristics and the new test's stability before merging.
Extended reasoning...
Overview
This PR changes GarbageCollectionController.zig to fire up to 2 explicit collectAsync(CollectionScope::Full) calls at the fast→slow idle transition (modeled on V8's MemoryReducer), adds a new #idle_full_gcs_fired state field and reduction-mode branch in onGCRepeatingTimer, changes processGCTimerWithHeapSize to treat only heap growth (not any change) as activity, switches BUN_GC_TIMER_INTERVAL / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS / BUN_GC_TIMER_DISABLE reads from vm.transpiler.env to bun.getenvZ, adds a JSC__VM__collectAsyncFull binding (VM.zig / bindings.cpp / headers.h), and adds an integration test that spawns a child with BUN_GC_TIMER_INTERVAL=20 and asserts ~40 MB of promoted old-gen is reclaimed at idle.
Security risks
None identified. No user-controlled input reaches the new code paths; the new C++ binding is a thin wrapper around an existing JSC heap API and acquires the JSLock identically to the adjacent collectAsync binding. The env-var reads are debug/tuning knobs that already existed.
Level of scrutiny
High. This is the GC scheduling heuristic that runs in every Bun process. The PR went through five rounds of fixes for subtle state-machine interaction bugs (stale #idle_full_gcs_fired, counter-reset placement, gc_last_heap_size ownership, growth-vs-change semantics), which is a strong signal that the interactions between the repeating timer, the allocation-driven processGCTimer path, and the new reduction mode are non-obvious. The != prev → > prev change in .run_on_next_tick / .pending is a global behavior change to allocation-driven GC cadence, not just the idle path. The vm.transpiler.env → bun.getenvZ switch means these vars are now read only from the process environment, not from loaded .env files — the comment justifies this on init-ordering grounds, but it's a user-visible semantic change.
Other factors
All seven prior inline review comments (mine and CodeRabbit's) are resolved and the fixes look correct in the final diff. The new test is timing-based (2.5s sleep + polling at 100ms with a heapSize() < initial/4 threshold) — reasonable headroom is built in but it's the kind of test that can flake on heavily-loaded CI. Given the design-level choices here (2-Full-GC cap, 1 MB freed threshold, 30-tick stability window) and the breadth of runtime impact, a human should sign off rather than a bot.
…ronic test-http-should-emit-close Windows timeout)
With the repeating timer no longer writing gc_last_heap_size (d2332a4) and .pending ignoring decreases (6eb16c6), nothing lowered the allocation-driven baseline after an async GC freed memory. prev stayed pinned at the pre-shrink value, so the app could re-grow all the way back to it with no response from processGCTimerWithHeapSize, and the prev * 2 emergency threshold sat at the stale (much higher) baseline. Lower gc_last_heap_size when .pending observes a decrease, without rescheduling the timer or touching updateGCRepeatTimer (which would cancel idle reduction on a slow->fast transition).
…BUN_GC_TIMER_* in env_var updateGCRepeatTimer(.fast) only clears #idle_full_gcs_fired on a genuine slow->fast transition (af48886), so when processGCTimerWithHeapSize observed growth while already in fast mode, reduction mode survived into the next repeating-timer tick and could fire a spurious second Full GC (or briefly drop to .slow) during active allocation. Clear #idle_full_gcs_fired directly on growth without touching the stable-tick counter, so the over-eager-reset fix from af48886 stays intact. Also move BUN_GC_TIMER_INTERVAL / BUN_GC_TIMER_DISABLE / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS to bun.env_var — getenvZ is slated for sunset and this file already reads BUN_TRACK_LAST_FN_NAME via that path under the same pre-runEnvLoader timing constraint.
If processGCTimerWithHeapSize clears #idle_full_gcs_fired between tick 30 and tick 31 (allocation resumed), the next repeating-timer tick skips the reduction branch but the counter is still sitting at 30, so the >= 30 gate re-fires the first-Full-GC path every tick until heap convergence makes current > prev_heap_size. The < 2 cap lives only in the reduction branch, which this loop never visits. Reset the counter at the point we commit to the Full GC; reduction mode is driven by #idle_full_gcs_fired thereafter, and re-entry requires a fresh 30-tick window as intended.
There was a problem hiding this comment.
All review threads are resolved and I didn't find anything new, but this reworks the GC controller's fast/slow state machine and changes the allocation-driven != → > semantics in processGCTimerWithHeapSize — runs in every process and took several iterations to converge, so it's worth a human sign-off.
Extended reasoning...
Overview
This PR changes GarbageCollectionController to fire up to 2 explicit async Full GCs at the fast→slow transition (modeled on V8's MemoryReducer), so old-gen + JIT CodeBlock jettison actually run before the controller drops to the 30s idle interval. It touches:
src/jsc/GarbageCollectionController.zig— new#idle_full_gcs_firedstate, reduction-mode branch inonGCRepeatingTimer, and semantic changes toprocessGCTimerWithHeapSize(!=→>in.run_on_next_tick/.pending, plus a shrink-tracking branch in.pending).src/jsc/VM.zig,src/jsc/bindings/bindings.cpp,src/jsc/bindings/headers.h— newcollectAsyncFull()binding →heap.collectAsync(JSC::CollectionScope::Full).src/bun_core/env_var.zig+init()— registersBUN_GC_TIMER_INTERVAL/BUN_GC_TIMER_DISABLE/BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESSand reads them viabun.env_var(fixing a pre-existing bug where these were read fromvm.transpiler.envbefore it was populated, so they were silently ignored).test/js/bun/gc/gc-controller-idle-full.test.ts— drives the controller at a 20ms interval and verifies ~40 MB of promoted-then-released old-gen is reclaimed at idle.
Security risks
None. No untrusted input is parsed; the new env vars go through the existing bun.env_var typed accessors with safe fallbacks (std.math.cast → 1000 on overflow). The new C++ binding takes a JSLockHolder exactly like the adjacent collectAsync.
Level of scrutiny
High. The GC repeating timer runs in every Bun process and its cadence directly affects both idle CPU and steady-state RSS. The state machine here has five interacting variables (#idle_full_gcs_fired, the stable-tick counter, gc_repeating_timer_fast, and the two heap-size baselines) across two entry points (the 1s/30s repeating timer and the per-event-loop processGCTimer). Over the PR's lifetime, eight distinct edge-case bugs were found and fixed (perpetual Full GC every 60s, GC's own shrink cancelling reduction mode, stale baseline after shrink, allocation-resume not cancelling reduction while already in fast mode, the < 2 cap being bypassed, etc.) — each fix interacting with the previous one. The final state looks correct to me and the bug-hunting pass found nothing, but the number of iterations it took to converge is itself a signal that this deserves a maintainer's eyes.
Other factors
- All nine prior inline review threads (mine and CodeRabbit's) are resolved; the latest commit
65317cfaddresses my last comment about resetting the stable-tick counter when entering reduction mode. - The
!=→>change in.run_on_next_tick/.pendingis a behavioral change to the pre-existing allocation-driven GC path (heap shrinks no longer reschedule the 16ms timer), not just an additive idle feature — that's the part most worth a second opinion. - A regression test is included and per robobun passes on all CI platforms; the only red is the chronic Windows
test-http-should-emit-closeflake unrelated to this PR. - The env-var read fix is a nice drive-by but also means
BUN_GC_TIMER_DISABLEwill start actually working for users who set it.
On the darwin-14-x64 runner the 2.5s pure-idle window was tight enough that under load the repeating timer (nominally 20ms) didn't always fit 30 ticks before the poll phase began — test passed 1/4 there vs. clean on Linux/Windows/darwin-aarch64. Bump the idle window to 5s so even at ~5x effective timer slowdown there is room for the 30-stable-tick threshold.
There was a problem hiding this comment.
All prior review threads are resolved and the current pass found no new issues, but this rewrites the runtime-wide GC scheduling state machine and took ~8 review rounds to converge — worth a human pass over the final onGCRepeatingTimer / processGCTimerWithHeapSize interaction before merge.
Extended reasoning...
Overview
This PR changes GarbageCollectionController so that after 30 stable fast-mode ticks it fires up to 2 explicit collectAsync(CollectionScope::Full) calls (V8 MemoryReducer-style) before dropping to the 30s slow interval, instead of relying on JSC's Eden→Full promotion heuristic which never crosses at idle. It adds a new JSC__VM__collectAsyncFull binding (bindings.cpp / headers.h / VM.zig), reworks onGCRepeatingTimer and processGCTimerWithHeapSize around a new #idle_full_gcs_fired state, migrates the three BUN_GC_TIMER_* env vars from vm.transpiler.env (which was unpopulated at init() time, so they were silently ignored) to bun.env_var, and adds an integration test that drives the controller at a 20ms interval and asserts ~40 MB of promoted old-gen is reclaimed at idle.
Security risks
None. No untrusted input, no auth/crypto/permissions surface; the only external input is the three BUN_GC_* env vars, which now go through the typed env_var registry with std.math.cast bounds.
Level of scrutiny
High. GarbageCollectionController runs in every Bun process and its timer cadence directly shapes both idle CPU and steady-state RSS. The state machine here couples five pieces of state (#idle_full_gcs_fired, heap_size_didnt_change_for_repeating_timer_ticks_count, gc_repeating_timer_fast, gc_last_heap_size, gc_last_heap_size_on_repeating_timer) across two entry points (the 1s/30s repeating timer and the per-event-loop-tick processGCTimer), and the review history shows every adjustment to one invariant exposed a new edge in another (cancel-on-growth vs. starve-on-transient-allocation; baseline ownership vs. shrink-tracking; counter reset placement vs. the < 2 cap). The final code looks correct to me and every thread is resolved, but a heuristic this load-bearing deserves a human read of the converged version rather than a bot sign-off.
Other factors
- All 9 inline review threads (mine + CodeRabbit) are marked resolved, and I verified the final diff incorporates each fix — including the most recent one (resetting the stable-tick counter at the first-Full-GC entry so the allocation-path
#idle_full_gcs_fired = 0clear can't re-trigger the branch and bypass the< 2cap). - robobun reproduced the underlying issue on
mainand reports the new test passes on all CI platforms; remaining CI red is the known unrelated Windowstest-http-should-emit-closeflake. - The env-var migration is a real behavioral fix (these vars were previously dead) and follows the established
env_var.zigpattern. - No CODEOWNERS cover these paths.
…red; 219 test jobs never ran)
There was a problem hiding this comment.
No new issues found — all prior threads are resolved and the state machine now looks consistent. Deferring to a human for the final sign-off since this rewrites the core GC-timer scheduling that runs in every Bun process, and the eight rounds of fixes here show how easy it is to get the two-path interaction subtly wrong.
Extended reasoning...
Overview
This PR changes GarbageCollectionController to fire up to two explicit collectAsync(CollectionScope::Full) calls at the fast→slow idle transition (modeled on V8's MemoryReducer), so old-gen garbage and age-eligible JIT CodeBlocks are reclaimed before the controller drops to the 30s slow interval. It also: adds a new JSC__VM__collectAsyncFull binding (bindings.cpp / headers.h / VM.zig); decouples the repeating-timer heap baseline from the allocation-driven baseline; changes processGCTimerWithHeapSize to treat only heap growth as activity and to track GC-driven shrinks in .pending; migrates BUN_GC_TIMER_INTERVAL / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS / BUN_GC_TIMER_DISABLE from vm.transpiler.env (which was unpopulated at init() time) to bun.env_var; and adds an integration test that drives the controller at a 20 ms interval and asserts the ~40 MB promoted heap drops below 25%.
Security risks
None identified. No untrusted input parsing, no auth/crypto/permissions surface. The new C++ binding is a thin wrapper around an existing JSC heap API and takes no user-controlled data. The env-var reads go through the existing typed bun.env_var accessors.
Level of scrutiny
High. This is the GC scheduling state machine that runs on a repeating timer in every Bun process and interacts with the allocation-driven processGCTimer path through several shared state variables (gc_last_heap_size, gc_last_heap_size_on_repeating_timer, heap_size_didnt_change_for_repeating_timer_ticks_count, #idle_full_gcs_fired, gc_repeating_timer_fast). Errors here manifest as either retained memory in long-running servers or wasted CPU on spurious Full GCs — neither shows up as a test failure. The review history bears this out: eight successive state-machine issues were found and fixed across commits 8820bc5 → 6eb16c6 → 09bd0d3 → af48886 → d2332a4 → a52ca18 → c38b227 → 65317cf, several of which were regressions of earlier fixes. The current revision looks internally consistent to me and the bug hunter found nothing new, but that iteration history is exactly why a human should do the final read-through.
Other factors
- The env-var migration is a real behavior change: the old
vm.transpiler.env.get(...)reads ran beforerunEnvLoader()and were effectively dead, soBUN_GC_TIMER_INTERVALetc. now actually take effect for the first time. Worth a conscious sign-off (andBUN_GC_TIMER_DISABLEis now truthy-checked rather than presence-checked). - The new test covers the headline behavior (idle Full GC fires and reclaims old-gen) but not the harder-to-test edge cases that drove most of the review churn (allocation resuming mid-reduction, fast→fast no-op transitions, baseline contamination).
- CI failures are the known Windows
fetch()-abort flake and unrelated S3/install/hot tests, per robobun's note.
Per review: the 512 was an arbitrary round number chosen to cross JSC's 256-byte minExtraMemory fast-path threshold (so reportExtraMemoryAllocated would drive didAllocate in release). Since this PR is scoped to the accounting/hygiene side — #29280 is the real fix for GC pacing — the floor is unnecessary inflation. visitor.reportExtraMemoryVisited in visitChildren is the hot path for this PR's goal (accurate extraMemorySize); it has no size threshold, so every byte reported counts regardless of build mode. Releasing @sizeof(Self) straight through is the honest number: ~464 bytes in debug/ASAN, ~96 in release. Lower the test threshold accordingly: 100,000 stays comfortably above the ~20 KB noise floor we measured without the fix and under the ~190 KB release-mode signal.
There was a problem hiding this comment.
No remaining issues from my side — all prior threads are resolved and the final state machine looks coherent — but since this rewrites the runtime GC scheduling that every Bun process depends on (and took several rounds to converge), it's worth a human sign-off before merge.
Extended reasoning...
Overview
This PR reworks GarbageCollectionController.onGCRepeatingTimer to fire up to two explicit collectAsync(CollectionScope::Full) calls at the fast→slow transition (modeled on V8's MemoryReducer), so old-gen garbage and age-eligible CodeBlocks are reclaimed before the controller drops to the 30 s interval. It adds a new #idle_full_gcs_fired state field, a JSC__VM__collectAsyncFull binding (Zig + C++ + headers), changes processGCTimerWithHeapSize to treat only heap growth as activity (and to lower gc_last_heap_size on GC-driven shrinks), migrates the three BUN_GC_TIMER_* env vars from vm.transpiler.env to bun.env_var (fixing them being read before runEnvLoader() populated the transpiler env), and adds an integration test that drives the controller at a 20 ms interval and asserts ~40 MB of promoted-then-released arrays is reclaimed at idle.
Security risks
None. No untrusted input handling, auth, crypto, or privilege boundaries are touched. The new C++ binding is a thin wrapper around an existing JSC heap API and takes the JSLock exactly like the adjacent collectAsync binding. The env-var reads use the typed bun.env_var accessors and only affect GC timer cadence.
Level of scrutiny
High. This is the per-VM GC scheduling state machine — it runs in every Bun process and governs memory pressure vs. CPU overhead at idle. The change is not mechanical: the interaction between #idle_full_gcs_fired, heap_size_didnt_change_for_repeating_timer_ticks_count, gc_repeating_timer_fast, and the two heap baselines is subtle, and the PR went through roughly eight rounds of review-driven fixes (counter not reset on slow→slow no-op, allocation path not cancelling reduction, GC-driven shrink mistaken for activity, repeating-timer baseline contaminating the allocation-driven baseline, .pending not lowering its baseline after shrink, fast→fast no-op clearing the stable-tick counter, and the < 2 cap being bypassable). All of those threads are now resolved and reflected in the final diff, the new test passes on all platforms in CI, and the current bug-hunting pass found nothing — but the iteration history itself is evidence that this state machine is easy to get subtly wrong.
Other factors
The change also alters pre-existing behavior in processGCTimerWithHeapSize (the != → > semantics in .run_on_next_tick / .pending), which slightly changes allocation-driven GC cadence independent of the new idle-Full-GC feature. CI is green on Linux/macOS with only known-flaky Windows failures unrelated to this diff. Given the runtime-wide blast radius and the design judgment involved (e.g., the 30-tick / 1 MB / max-2 thresholds), a human reviewer should sign off rather than auto-approving.
|
Is there a bench showing no CPU / perf regression? |
|
Release builds of 1. HTTP throughput with per-request allocation —
Within the ~6 % run-to-run noise of either binary alone. 2. Tight allocation loop yielding to the event loop — 200k object allocs,
No measurable difference. 3. Idle CPU — allocate 40 MB,
The ~7 ms delta is the 1–2 async Full GCs at the transition — one-shot per idle activation, ≈0.016 % of the idle wall time. RSS drifts up on main and holds/drops on the PR, which is the point. scriptsServer: const server = Bun.serve({
port: 0,
fetch(req) {
const url = new URL(req.url);
const data = { path: url.pathname, ts: Date.now(),
items: Array.from({ length: 20 }, (_, i) => ({ id: i, v: Math.random() })) };
return Response.json(data);
},
});
console.log(server.port);Alloc loop: const N = 200_000; let sink = 0;
const t0 = performance.now();
for (let i = 0; i < N; i++) {
const o = { a: i, b: [i, i + 1, i + 2], c: "x".repeat(16) };
sink += o.a;
if ((i & 63) === 0) await Promise.resolve();
}
console.log({ ms: (performance.now() - t0).toFixed(1), sink });Idle: import { readFileSync } from "node:fs";
const cpuMs = () => { const f = readFileSync("/proc/self/stat","utf8").split(" ");
return ((+f[13]+ +f[14])*1000)/100; };
let data = []; for (let i=0;i<5000;i++) data.push(new Array(1000).fill(i));
Bun.gc(true); Bun.gc(true); data = null;
await Bun.sleep(100);
const b = cpuMs(), r0 = process.memoryUsage.rss();
await Bun.sleep(45_000);
console.log({ idle_cpu_ms: Math.round(cpuMs()-b),
rss_before_mb:(r0/1048576).toFixed(1), rss_after_mb:(process.memoryUsage.rss()/1048576).toFixed(1) }); |
|
Superseded by #30725 (Rust port on current main). |
…ow interval Rust port of #29280. The repeating timer's collectAsync() lets JSC pick Eden vs Full. At idle JSC keeps picking Eden because Heap::updateAllocationLimits ratchets m_maxHeapSize on every Eden GC, so the 1/3 Full-promotion ratio decays instead of crossing. After 30 stable ticks we go to the 30s interval having never collected old-gen, and age-based CodeBlock jettison (TTL 5-60s, only evaluated during GC marking) never fires. At the fast->slow transition, fire one explicit collectAsync(CollectionScope::Full). If it freed >1 MB, fire one more on the next tick. Then go to slow. Max 2 Full GCs per idle activation, then silent - same convergence rule as V8's MemoryReducer. If allocation resumes mid-reduction, drop back to fast. No change to allocation-driven GC frequency; cost is at most 2 async Full GCs per active->idle transition. Also routes the BUN_GC_TIMER_* env reads through bun_core::env_var (process-env backed) instead of vm.transpiler.env, which is still null when init() runs from ensure_waker(). New binding: JSC__VM__collectAsyncFull -> heap.collectAsync(CollectionScope::Full).
…ow interval Rust port of #29280. The repeating timer's collectAsync() lets JSC pick Eden vs Full. At idle JSC keeps picking Eden because Heap::updateAllocationLimits ratchets m_maxHeapSize on every Eden GC, so the 1/3 Full-promotion ratio decays instead of crossing. After 30 stable ticks we go to the 30s interval having never collected old-gen, and age-based CodeBlock jettison (TTL 5-60s, only evaluated during GC marking) never fires. At the fast->slow transition, fire one explicit collectAsync(CollectionScope::Full). If it freed >1 MB, fire one more on the next tick. Then go to slow. Max 2 Full GCs per idle activation, then silent - same convergence rule as V8's MemoryReducer. If allocation resumes mid-reduction, drop back to fast. No change to allocation-driven GC frequency; cost is at most 2 async Full GCs per active->idle transition. Also routes the BUN_GC_TIMER_* env reads through bun_core::env_var (process-env backed) instead of vm.transpiler.env, which is still null when init() runs from ensure_waker(). New binding: JSC__VM__collectAsyncFull -> heap.collectAsync(CollectionScope::Full).
The repeating timer's `collectAsync()` lets JSC pick Eden vs Full. At idle JSC keeps picking Eden because `Heap::updateAllocationLimits` ratchets `m_maxHeapSize` on every Eden GC, so the 1/3 Full-promotion ratio decays instead of crossing. After 30 stable ticks we go to the 30s interval having never collected old-gen, and age-based CodeBlock jettison (TTL 5-60s, only evaluated during GC marking) never fires.
Benchmark on Claude Code: `fullGC()` at steady state reclaims ~103 MB; `gcAndSweep` ~141 MB — that is the dead old-gen + jettison-eligible JIT code this timer was leaving on the table.
Change: at the fast→slow transition, fire one explicit `collectAsync(CollectionScope::Full)`. If it freed >1 MB, fire one more on the next tick. Then go to slow. Max 2 Full GCs per idle activation, then silent — same convergence rule as V8's `MemoryReducer` (`kRun→kDone` when `started_gcs >= 2 || freed <= 1MB`). If allocation resumes mid-reduction, drop back to fast.
No change to allocation-driven GC frequency; cost is at most 2 async Full GCs per active→idle transition.
New binding: `JSC__VM__collectAsyncFull` → `heap.collectAsync(JSC::CollectionScope::Full)`.