
GarbageCollectionController: fire up to 2 Full GCs before going to slow interval #29280

Closed

Jarred-Sumner wants to merge 15 commits into main from jarred/idle-full-gc



@Jarred-Sumner

Collaborator

The repeating timer's `collectAsync()` lets JSC pick Eden vs Full. At idle, JSC keeps picking Eden because `Heap::updateAllocationLimits` ratchets `m_maxHeapSize` on every Eden GC, so the 1/3 Full-promotion ratio decays instead of crossing. After 30 stable ticks we switch to the 30s interval without ever having collected old-gen, so age-based CodeBlock jettison (TTL 5-60s, only evaluated during GC marking) never fires.

Benchmark on Claude Code at steady state: `fullGC()` reclaims ~103 MB and `gcAndSweep` ~141 MB. That is the dead old-gen and jettison-eligible JIT code this timer was leaving on the table.

Change: at the fast→slow transition, fire one explicit `collectAsync(CollectionScope::Full)`. If it freed >1 MB, fire one more on the next tick. Then go to slow. Max 2 Full GCs per idle activation, then silent — same convergence rule as V8's `MemoryReducer` (`kRun→kDone` when `started_gcs >= 2 || freed <= 1MB`). If allocation resumes mid-reduction, drop back to fast.
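The convergence rule above can be sketched as a small per-tick state machine. This is a hypothetical TypeScript model, not the Zig controller: `IdleReducerModel`, `tick()`, and the `Action` values are illustrative names, heap sizes are passed in by the caller, and "freed" is assumed to be measured against the heap sampled when the previous Full GC was requested.

```typescript
// Hypothetical model of the idle Full-GC reducer; not Bun's actual Zig code.
type Interval = "fast" | "slow";
type Action = "eden" | "full" | "none";

const STABLE_TICKS = 30; // non-growing fast ticks before reduction starts
const MAX_IDLE_FULL_GCS = 2; // cap per idle activation (V8 MemoryReducer-style)
const MIN_FREED_BYTES = 1024 * 1024; // a second Full only if the first freed > 1 MB

class IdleReducerModel {
  interval: Interval = "fast";
  stableTicks = 0;
  idleFullGcsFired = 0;
  heapAtLastFull = 0;
  lastHeap = 0;

  // One repeating-timer tick; `heap` is the sampled heap size in bytes.
  tick(heap: number): Action {
    const prev = this.lastHeap;
    this.lastHeap = heap;

    // Only growth counts as allocation activity: cancel reduction, go fast.
    if (heap > prev) {
      this.stableTicks = 0;
      this.idleFullGcsFired = 0;
      this.interval = "fast";
      return "eden";
    }

    // Reduction mode: repeat the Full GC only if the last one paid off.
    if (this.idleFullGcsFired > 0) {
      const freed = this.heapAtLastFull - heap;
      if (freed > MIN_FREED_BYTES && this.idleFullGcsFired < MAX_IDLE_FULL_GCS) {
        this.idleFullGcsFired += 1;
        this.heapAtLastFull = heap;
        return "full";
      }
      // Converged: switch to the slow interval and stay silent this tick.
      this.idleFullGcsFired = 0;
      this.stableTicks = 0;
      this.interval = "slow";
      return "none";
    }

    // Stable ticks are only counted while on the fast interval.
    if (this.interval === "fast") {
      this.stableTicks += 1;
      if (this.stableTicks >= STABLE_TICKS) {
        this.stableTicks = 0;
        this.idleFullGcsFired = 1;
        this.heapAtLastFull = heap;
        return "full";
      }
    }
    return "eden";
  }
}
```

Feeding the model one growth tick plus 30 stable ticks at 100 MB, then two ticks at 60 MB, yields exactly two `"full"` actions followed by `"none"` and the slow interval; any sample above the previous one resets it to fast.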

No change to allocation-driven GC frequency; cost is at most 2 async Full GCs per active→idle transition.

New binding: `JSC__VM__collectAsyncFull` → `heap.collectAsync(JSC::CollectionScope::Full)`.

@robobun

robobun commented Apr 13, 2026

Collaborator
Updated 2:18 PM PT - May 12th, 2026

@alii, your commit 3fb4b84 has 1 failure in Build #53818 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29280

That installs a local version of the PR into your bun-29280 executable, so you can run:

bun-29280 --bun

@github-actions

Contributor

Found 2 issues this PR may fix:

  1. Memory (RSS) in Bun Spawned Child Process Grows Slowly, Even When Idle #21560 - RSS grows slowly even when idle, heap stays flat, GC has no effect — exactly the pattern of Eden-only GC never triggering Full GC at idle
  2. Memory Not Freed After Running bun --hot Command #14734 - Memory not freed between hot-reload cycles; old-gen objects from prior reloads accumulate because only Eden GC runs during idle periods between saves

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #21560
Fixes #14734

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Apr 13, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new VM API/FFI collectAsyncFull() and its C binding; updates GarbageCollectionController to sample heap size, track idle full-GC escalation, change repeating-timer logic, and call collectAsync() or collectAsyncFull() based on heap trends and idle-stability ticks.

Changes

Cohort / File(s): Summary
  • VM Zig wrapper & headers (src/bun.js/bindings/VM.zig, src/bun.js/bindings/headers.h): Added pub fn collectAsyncFull(vm: *VM) void and the C/FFI declaration void JSC__VM__collectAsyncFull(JSC::VM* arg0);.
  • C++ bindings implementation (src/bun.js/bindings/bindings.cpp): Added void JSC__VM__collectAsyncFull(JSC::VM* vm) that acquires the VM lock and calls vm->heap.collectAsync(JSC::CollectionScope::Full).
  • GC controller logic (src/bun.js/event_loop/GarbageCollectionController.zig): Added idle_full_gcs_fired state; reset on fast-mode entry; early return when disabled; replaced performGC() sampling with vm.blockBytesAllocated() updates; introduced reduction-mode branch that triggers collectAsync() or collectAsyncFull() based on heap growth/decay and idle ticks; treat only heap increases as activity; ensure vm.collectAsync() is invoked on the normal repeating-timer path.
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main change: introducing Full GC invocation (up to 2) during the fast→slow transition in GarbageCollectionController, which is central to all file changes across the PR.
  • Description check: ✅ Passed. The PR description comprehensively covers what the PR does, provides benchmark evidence, explains the change rationale, and describes convergence behavior. However, the 'How did you verify your code works?' template section is missing.


@coderabbitai coderabbitai Bot left a comment

Contributor


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/bun.js/event_loop/GarbageCollectionController.zig (1)

142-155: ⚠️ Potential issue | 🟠 Major

Keep the 30-tick Full-GC trigger disabled once the controller is already on the slow interval.

After updateGCRepeatTimer(.slow) clears the stability counter, this branch starts counting again on the 30s cadence. Thirty slow ticks later, Line 150 fires another collectAsyncFull(), so one idle stretch can keep re-entering the reducer every ~15 minutes instead of stopping after the intended 1–2 convergence GCs. Gate this threshold on gc_repeating_timer_fast so it only runs during the fast→slow transition.

Suggested guard
-    if (prev_heap_size == current) {
-        this.heap_size_didnt_change_for_repeating_timer_ticks_count +|= 1;
-        if (this.heap_size_didnt_change_for_repeating_timer_ticks_count >= 30) {
-            // 30s of stable Eden GCs. collectAsync() never escalates to Full
-            // here because Heap::updateAllocationLimits ratchets m_maxHeapSize
-            // on every Eden, so the 1/3 promotion ratio decays instead of
-            // crossing. Fire an explicit Full so old-gen + age-based CodeBlock
-            // jettison run before we go to the 30s interval.
-            this.idle_full_gcs_fired = 1;
-            vm.collectAsyncFull();
-        }
-    } else {
+    if (this.gc_repeating_timer_fast and prev_heap_size == current) {
+        this.heap_size_didnt_change_for_repeating_timer_ticks_count +|= 1;
+        if (this.heap_size_didnt_change_for_repeating_timer_ticks_count >= 30) {
+            // 30s of stable Eden GCs. collectAsync() never escalates to Full
+            // here because Heap::updateAllocationLimits ratchets m_maxHeapSize
+            // on every Eden, so the 1/3 promotion ratio decays instead of
+            // crossing. Fire an explicit Full so old-gen + age-based CodeBlock
+            // jettison run before we go to the 30s interval.
+            this.idle_full_gcs_fired = 1;
+            vm.collectAsyncFull();
+        }
+    } else if (prev_heap_size != current) {
         this.heap_size_didnt_change_for_repeating_timer_ticks_count = 0;
         this.updateGCRepeatTimer(.fast);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/event_loop/GarbageCollectionController.zig` around lines 142 -
155, The 30-tick idle Full-GC should only run while the controller is on the
fast repeating timer; modify the prev_heap_size == current branch in
GarbageCollectionController to only increment/check
heap_size_didnt_change_for_repeating_timer_ticks_count and trigger
vm.collectAsyncFull() when gc_repeating_timer_fast is true (i.e., gate the
entire 30-tick/idle_full_gcs_fired logic on this.gc_repeating_timer_fast).
Ensure updateGCRepeatTimer(.slow) still resets
heap_size_didnt_change_for_repeating_timer_ticks_count as it does now so
counting stops once we switch to slow.
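The ~15-minute figure follows from 30 slow ticks × 30 s = 900 s. A hypothetical tick-counting sketch (not the controller's code; it assumes a 1 s fast and 30 s slow interval, a perfectly idle heap, and a single Full GC per reduction) shows how many idle Full GCs each variant would fire over an hour:

```typescript
// Hypothetical tick-count model of the 30-tick idle trigger; no real GC runs.
const FAST_INTERVAL_MS = 1_000;
const SLOW_INTERVAL_MS = 30_000;
const STABLE_TICK_THRESHOLD = 30;

// Counts idle Full GCs fired during `idleMs` of a perfectly stable heap.
function countIdleFullGcs(idleMs: number, gateOnFastTimer: boolean): number {
  let fast = true;
  let stableTicks = 0;
  let fullGcs = 0;
  let elapsedMs = 0;
  for (;;) {
    elapsedMs += fast ? FAST_INTERVAL_MS : SLOW_INTERVAL_MS;
    if (elapsedMs > idleMs) break;
    // Gated variant: on the slow timer only Eden runs; nothing is counted.
    if (gateOnFastTimer && !fast) continue;
    stableTicks += 1;
    if (stableTicks >= STABLE_TICK_THRESHOLD) {
      fullGcs += 1;
      stableTicks = 0; // updateGCRepeatTimer(.slow) clears the counter
      fast = false; // converge to the slow interval
    }
  }
  return fullGcs;
}

// Re-entry period of the ungated variant once it is on the slow timer.
const reentryMs = STABLE_TICK_THRESHOLD * SLOW_INTERVAL_MS; // 900_000 ms = 15 min
```

Over one idle hour, `countIdleFullGcs(3_600_000, false)` returns 4 while the gated variant returns 1.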

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4144b5ae-aef1-4bdd-b25c-00741a9de1eb

📥 Commits

Reviewing files that changed from the base of the PR and between ccbaed9 and efe9f1b3925dc2983078f777cca8d4800d4107ed.

📒 Files selected for processing (4)
  • src/bun.js/bindings/VM.zig
  • src/bun.js/bindings/bindings.cpp
  • src/bun.js/bindings/headers.h
  • src/bun.js/event_loop/GarbageCollectionController.zig

@github-actions

Contributor

This PR may be a duplicate of:

  1. Reduce idle CPU wakeups from GC repeating timer #28696 - Also modifies GarbageCollectionController idle timer logic to reduce unnecessary GC wakeups by skipping collectAsync when the heap hasn't grown and lowering the slow-mode transition threshold
  2. fix(gc): reduce idle CPU by transitioning GC timer to slow mode faster #27366 - Also modifies GarbageCollectionController to reduce idle CPU by transitioning the GC timer to slow mode faster (threshold from 30 to 5 ticks)

🤖 Generated with Claude Code

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/event_loop/GarbageCollectionController.zig`:
- Line 27: The field idle_full_gcs_fired in the GarbageCollectionController
struct is internal mutable state and must be made private by renaming it with
the Zig private-field prefix; change the declaration to use `#idle_full_gcs_fired`
and update all internal references inside GarbageCollectionController (any
reads/writes that currently use idle_full_gcs_fired) to use `#idle_full_gcs_fired`
so the field follows the private-field convention.
- Around line 122-157: Reset the idle-reduction state as soon as any non-idle
path observes activity: clear idle_full_gcs_fired and
heap_size_didnt_change_for_repeating_timer_ticks_count (e.g., set both to 0)
whenever you switch the controller back to .fast, schedule a GC, or call
performGC() from outside the idle-reducer path (places that call
updateGCRepeatTimer(.fast), processGCTimerWithHeapSize(), or performGC()). To
make this robust, add a small helper (e.g., cancelIdleReduction()) that zeroes
idle_full_gcs_fired and heap_size_didnt_change_for_repeating_timer_ticks_count
and invoke it from those branches so resumed allocation won't be handled by the
idle-reduction logic on the next tick.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a9399947-784d-4f27-85c8-7f8ae01d9f95

📥 Commits

Reviewing files that changed from the base of the PR and between efe9f1b3925dc2983078f777cca8d4800d4107ed and 8ba592c.

📒 Files selected for processing (4)
  • src/bun.js/bindings/VM.zig
  • src/bun.js/bindings/bindings.cpp
  • src/bun.js/bindings/headers.h
  • src/bun.js/event_loop/GarbageCollectionController.zig


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/event_loop/GarbageCollectionController.zig`:
- Around line 118-122: In the repeating-timer path (the code using
bunVM().jsc_vm.blockBytesAllocated()), do not update gc_last_heap_size here—only
update gc_last_heap_size_on_repeating_timer and use prev_heap_size from
gc_last_heap_size_on_repeating_timer for comparisons; remove or revert the
assignment to gc_last_heap_size so processGCTimerWithHeapSize() (and its
.run_on_next_tick/.pending/.scheduled fast-path check this_heap_size > prev * 2)
keeps ownership of gc_last_heap_size for the allocation-driven path.
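The ownership concern is easiest to see with numbers. The sketch below is hypothetical: `BaselineModel` and `scenario` are illustrative stand-ins for `gc_last_heap_size`, `gc_last_heap_size_on_repeating_timer`, and the `this_heap_size > prev * 2` escalation, and it assumes the repeating timer fires once between the baseline capture and the growth check.

```typescript
// Hypothetical model of the two heap-size baselines; not Bun's Zig code.
class BaselineModel {
  gcLastHeapSize: number; // allocation-driven baseline
  gcLastHeapSizeOnRepeatingTimer: number; // repeating-timer baseline
  clobberFromRepeatingTimer: boolean; // true = the behavior under review

  constructor(initialHeap: number, clobberFromRepeatingTimer: boolean) {
    this.gcLastHeapSize = initialHeap;
    this.gcLastHeapSizeOnRepeatingTimer = initialHeap;
    this.clobberFromRepeatingTimer = clobberFromRepeatingTimer;
  }

  onRepeatingTimer(current: number): void {
    this.gcLastHeapSizeOnRepeatingTimer = current;
    if (this.clobberFromRepeatingTimer) this.gcLastHeapSize = current;
  }

  // Allocation-driven escalation: the heap more than doubled since the baseline.
  needsEmergencyGc(current: number): boolean {
    return current > this.gcLastHeapSize * 2;
  }
}

const MB = 1024 * 1024;

// 10 MB baseline, timer samples at 15 MB mid-growth, heap then reaches 25 MB.
function scenario(clobber: boolean): boolean {
  const model = new BaselineModel(10 * MB, clobber);
  model.onRepeatingTimer(15 * MB);
  return model.needsEmergencyGc(25 * MB);
}
```

With separate baselines `scenario(false)` is true (25 MB > 2 × 10 MB); with the clobbering write `scenario(true)` is false because the baseline was raised to 15 MB.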

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eaefa315-359d-4f8f-9437-a317549987ee

📥 Commits

Reviewing files that changed from the base of the PR and between 8ba592c and 8820bc5.

📒 Files selected for processing (1)
  • src/bun.js/event_loop/GarbageCollectionController.zig


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/bun.js/event_loop/GarbageCollectionController.zig (1)

118-122: ⚠️ Potential issue | 🟠 Major

Keep gc_last_heap_size owned by the allocation-driven path.

Line 122 overwrites the baseline that processGCTimerWithHeapSize() uses for .run_on_next_tick, .pending, and .scheduled. That makes the > prev * 2 escalation relative to the last repeating-timer sample instead of the last allocation-driven observation, so high-frequency processGCTimer() callers can miss or delay fast-path GC.

Suggested fix
     const current = vm.blockBytesAllocated();
     this.gc_last_heap_size_on_repeating_timer = current;
-    this.gc_last_heap_size = current;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/event_loop/GarbageCollectionController.zig` around lines 118 -
122, The repeating-timer path is incorrectly clobbering the allocation-driven
baseline by setting this.gc_last_heap_size = current; stop updating
gc_last_heap_size inside the repeating-timer code (only update
this.gc_last_heap_size_on_repeating_timer) so
processGCTimerWithHeapSize()/processGCTimer() comparisons like the "> prev * 2"
escalation (used for .run_on_next_tick, .pending, .scheduled) remain relative to
the last allocation-driven observation; keep the repeating-timer only assigning
this.gc_last_heap_size_on_repeating_timer and let the allocation-driven path
continue to own updates to this.gc_last_heap_size.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/event_loop/GarbageCollectionController.zig`:
- Around line 141-157: The equality check for stable heap size uses exact
equality (prev_heap_size == current) which misses cases where GC shrinks the
heap; change the condition in GarbageCollectionController.zig to treat
non-growth as stable by using a non-growth comparison (e.g., current <=
prev_heap_size or prev_heap_size >= current) for the block that increments
heap_size_didnt_change_for_repeating_timer_ticks_count and triggers
collectAsyncFull; keep the else branch calling updateGCRepeatTimer(.fast) when
the heap actually grows, and leave vm.collectAsync()/vm.collectAsyncFull() logic
unchanged so the idle Full-GC handoff is not starved by GC-induced decreases.
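To illustrate the `==` vs `<=` point, here is a hypothetical sketch (all names are illustrative) that measures the longest run of "stable" samples on an idle heap where each tick's Eden GC is assumed to free 4 KB:

```typescript
// Hypothetical comparison of two "heap is stable" predicates.
type StablePredicate = (prev: number, cur: number) => boolean;

const exactlyEqual: StablePredicate = (prev, cur) => cur === prev;
const noGrowth: StablePredicate = (prev, cur) => cur <= prev;

// Longest run of consecutive samples that the predicate calls stable.
function longestStableStreak(samples: number[], isStable: StablePredicate): number {
  let best = 0;
  let run = 0;
  let prev: number | undefined;
  for (const cur of samples) {
    if (prev !== undefined) {
      run = isStable(prev, cur) ? run + 1 : 0;
      best = Math.max(best, run);
    }
    prev = cur;
  }
  return best;
}

// Assumed idle heap: 50 MB, shrinking by 4 KB per tick as Eden frees a little.
const idleSamples = Array.from({ length: 40 }, (_, i) => 50 * 1024 * 1024 - i * 4096);
```

With exact equality the streak never leaves 0, so the 30-tick threshold is unreachable; with `<=` the 40 samples give a 39-tick streak.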


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1f18c059-fc52-4639-ab2f-b84dfd26e73d

📥 Commits

Reviewing files that changed from the base of the PR and between 8820bc5 and 24068ef.

📒 Files selected for processing (1)
  • src/bun.js/event_loop/GarbageCollectionController.zig

…ransition

collectAsync() with default scope lets JSC pick Eden vs Full, and at idle JSC
keeps picking Eden: Heap::updateAllocationLimits ratchets m_maxHeapSize on every
Eden (Heap.cpp:2534), so the 1/3 promotion ratio decays instead of crossing.
30 ticks of stable Eden GCs therefore never reclaim old-gen, and age-based
CodeBlock jettison (which only runs during GC marking) never fires.

Before switching to the 30s slow interval, fire an explicit Full; if it freed
>1MB, fire one more (V8 MemoryReducer caps at 2 majors per idle activation,
src/heap/memory-reducer.cc kRun→kDone). Then converge to slow. If allocation
resumes mid-reduction, drop back to fast.

New binding JSC__VM__collectAsyncFull → heap.collectAsync(CollectionScope::Full).
…ast-mode, reset count at converge

- updateGCRepeatTimer(.fast) now unconditionally clears #idle_full_gcs_fired and
  the stable-tick counter (covers processGCTimerWithHeapSize observing activity
  while reduction is pending)
- count>=30 → Full now gated on gc_repeating_timer_fast so reduction only enters
  at the fast→slow transition; in slow mode it just keeps doing Eden every 30s
- converged branch resets the stable-tick counter explicitly so a no-op
  updateGCRepeatTimer(.slow) cannot leave it at 30+
processGCTimerWithHeapSize treated any heap-size change as activity and
called updateGCRepeatTimer(.fast), which since 8820bc5 unconditionally
resets #idle_full_gcs_fired. The async Full GC freeing memory therefore
cancelled its own reduction mode before the next repeating tick could
observe it, so the slow interval was never reached. Only heap growth is
allocation activity; a decrease is the requested GC succeeding.
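A hypothetical sketch of why the comparison direction matters here: if any change counts as activity, the shrink produced by the requested Full GC itself takes the fast-mode path, which (per this commit) resets `#idle_full_gcs_fired`. `anyChange`, `growthOnly`, and `reductionSurvives` are illustrative names.

```typescript
// Hypothetical: does a sample seen by the allocation-driven path count as activity?
type ActivityTest = (prev: number, cur: number) => boolean;
const anyChange: ActivityTest = (prev, cur) => cur !== prev; // old behavior
const growthOnly: ActivityTest = (prev, cur) => cur > prev; // new behavior

// Reduction mode is armed (one Full GC fired); returns whether it survives
// the next allocation-driven observation of the heap size.
function reductionSurvives(isActivity: ActivityTest, before: number, after: number): boolean {
  let idleFullGcsFired = 1;
  // Treated as activity -> updateGCRepeatTimer(.fast) -> reduction state reset.
  if (isActivity(before, after)) idleFullGcsFired = 0;
  return idleFullGcsFired > 0;
}
```

A Full GC shrinking the heap from 100 to 60 cancels reduction under `anyChange` but not under `growthOnly`; genuine growth (100 to 120) still cancels it under both.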
…as stable

- updateGCRepeatTimer(.fast) resetting the stable-tick counter
  unconditionally meant any processGCTimer() heap-growth observation
  between repeating-timer ticks zeroed it, so apps with sub-second
  transient allocations (freed by Eden before the next 1s sample) could
  never accumulate 30 stable ticks. Revert to transition-only and reset
  state explicitly at the two onGCRepeatingTimer exit points that need it.
- 'same or less' stability now actually checks <= instead of ==, so the
  Eden GC at the end of each tick freeing a few bytes no longer breaks
  the stable-tick streak.
- gc_last_heap_size assignment in onGCRepeatingTimer is unchanged vs
  main (performGC() wrote it there too); not a regression.
@alii force-pushed the jarred/idle-full-gc branch from 673865f to af48886 on May 5, 2026 02:16
@alii

alii commented May 5, 2026

Member

@robobun adopt

@robobun

robobun commented May 5, 2026

Collaborator

✅ Adopted — ready for review.

Reproduced: with 40 MB of promoted-then-released arrays and an idle event loop, the repeating GC timer on main fires only EdenCollection (JSC's shouldDoFullCollection() ratio never crosses once m_maxHeapSize has been set proportional to the live set), so the heap stays at ~40 MB indefinitely. With this PR it drops to ~1 MB after the 30-tick idle Full GC.

Pushed on top of Jarred's commits:

  • d2332a4: drop the gc_last_heap_size = current write in onGCRepeatingTimer; processGCTimerWithHeapSize exclusively owns the allocation-driven baseline.
  • a52ca18: .pending lowers gc_last_heap_size on GC-driven shrink so re-growth is detected against the post-GC value.
  • c38b227: processGCTimerWithHeapSize clears #idle_full_gcs_fired directly on growth so allocation resuming between tick 30 and tick 31 cancels reduction mode even while updateGCRepeatTimer(.fast) is a fast→fast no-op; BUN_GC_TIMER_* registered in bun.env_var (they were previously read from vm.transpiler.env before runEnvLoader() populated it, so they were silently ignored).
  • 65317cf: reset the stable-tick counter when committing to the first Full GC so the < 2 cap can't be bypassed.
  • fcf2215 / 43b43ef: test/js/bun/gc/gc-controller-idle-full.test.ts drives the controller at a 20 ms fast interval; fails on main, passes on this branch.

All 12 review threads resolved; claude[bot]'s final pass found no remaining issues.

Perf: release-build comparison vs main@314ffe307 shows no regression in HTTP throughput (30.3k vs 30.1k req/s, within noise) or the per-event-loop-tick allocation path (23.89 vs 23.88 ms); idle CPU over a 45 s window rises by ~7 ms for the 1–2 async Full GCs at the fast→slow transition, with RSS holding steady or dropping instead of drifting up.

CI: both complete runs after the main-merge (#52872: 283/286, #53505: 281/286) passed on all Linux (alpine/debian/ubuntu, incl. x64-asan) and all macOS targets incl. darwin-14-x64. gc-controller-idle-full.test.ts is green on every platform in both runs. The only red is unrelated Windows flakes (test-http-should-emit-close-when-connection-is-aborted.ts — chronic, #30217 workaround was declined; hot.test.ts — also flaking on other PRs) plus an S3 UnknownError network flake on darwin-14-aarch64. The diff itself is green.

robobun added 2 commits May 5, 2026 05:23
onGCRepeatingTimer already has gc_last_heap_size_on_repeating_timer as its
baseline; writing gc_last_heap_size here can raise the prev value that
processGCTimerWithHeapSize compares against in .scheduled, suppressing the
this_heap_size > prev * 2 emergency GC when the repeating timer happens to
fire between the baseline capture and the growth check.
GarbageCollectionController.init() runs from ensureWaker() inside
VirtualMachine.init(), before Run.boot/test_command call configureDefines()
-> runEnvLoader() -> loadProcess(). At that point vm.transpiler.env is an
empty DotEnv.Loader, so BUN_GC_TIMER_INTERVAL / BUN_GC_TIMER_DISABLE /
BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS were never seen. Read them via
bun.getenvZ instead.

The test drives the controller at a 20ms fast interval so the 30-stable-tick
threshold is reached in ~600ms, then verifies ~40MB of promoted-then-released
arrays are reclaimed at idle. Without collectAsyncFull(), JSC keeps picking
Eden because the 1/3 promotion ratio never crosses once m_maxHeapSize has been
set proportional to the live set, and the heap stays at ~40MB.

@claude claude Bot left a comment

Contributor


All earlier review threads look resolved and I didn't find new issues in fcf2215, but this reworks the GC controller's fast/slow state machine and changes how BUN_GC_TIMER_* env vars are read — worth a human pass on the heuristics and the new test's stability before merging.

Extended reasoning...

Overview

This PR changes GarbageCollectionController.zig to fire up to 2 explicit collectAsync(CollectionScope::Full) calls at the fast→slow idle transition (modeled on V8's MemoryReducer), adds a new #idle_full_gcs_fired state field and reduction-mode branch in onGCRepeatingTimer, changes processGCTimerWithHeapSize to treat only heap growth (not any change) as activity, switches BUN_GC_TIMER_INTERVAL / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS / BUN_GC_TIMER_DISABLE reads from vm.transpiler.env to bun.getenvZ, adds a JSC__VM__collectAsyncFull binding (VM.zig / bindings.cpp / headers.h), and adds an integration test that spawns a child with BUN_GC_TIMER_INTERVAL=20 and asserts ~40 MB of promoted old-gen is reclaimed at idle.

Security risks

None identified. No user-controlled input reaches the new code paths; the new C++ binding is a thin wrapper around an existing JSC heap API and acquires the JSLock identically to the adjacent collectAsync binding. The env-var reads are debug/tuning knobs that already existed.

Level of scrutiny

High. This is the GC scheduling heuristic that runs in every Bun process. The PR went through five rounds of fixes for subtle state-machine interaction bugs (stale #idle_full_gcs_fired, counter-reset placement, gc_last_heap_size ownership, growth-vs-change semantics), which is a strong signal that the interactions between the repeating timer, the allocation-driven processGCTimer path, and the new reduction mode are non-obvious. The != prev → > prev change in .run_on_next_tick / .pending is a global behavior change to allocation-driven GC cadence, not just the idle path. The vm.transpiler.env → bun.getenvZ switch means these vars are now read only from the process environment, not from loaded .env files; the comment justifies this on init-ordering grounds, but it's a user-visible semantic change.

Other factors

All seven prior inline review comments (mine and CodeRabbit's) are resolved and the fixes look correct in the final diff. The new test is timing-based (2.5s sleep + polling at 100ms with a heapSize() < initial/4 threshold) — reasonable headroom is built in but it's the kind of test that can flake on heavily-loaded CI. Given the design-level choices here (2-Full-GC cap, 1 MB freed threshold, 30-tick stability window) and the breadth of runtime impact, a human should sign off rather than a bot.

…ronic test-http-should-emit-close Windows timeout)
With the repeating timer no longer writing gc_last_heap_size (d2332a4) and
.pending ignoring decreases (6eb16c6), nothing lowered the allocation-driven
baseline after an async GC freed memory. prev stayed pinned at the pre-shrink
value, so the app could re-grow all the way back to it with no response from
processGCTimerWithHeapSize, and the prev * 2 emergency threshold sat at the
stale (much higher) baseline.

Lower gc_last_heap_size when .pending observes a decrease, without
rescheduling the timer or touching updateGCRepeatTimer (which would cancel
idle reduction on a slow->fast transition).
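The stale-baseline problem in numbers, as a hypothetical sketch: `baselineAfterShrinkSample` and `respondToRegrowth` are illustrative names, only the shrink handling in .pending is modelled (growth handling is out of scope), and the scenario (100 MB baseline, async GC shrinks to 20 MB, app regrows to 60 MB) is made up for illustration.

```typescript
// Hypothetical model of the allocation-driven baseline in the .pending state.
// Assumption: only a shrink sample is modelled; growth is handled elsewhere.
function baselineAfterShrinkSample(baseline: number, sample: number, lowerOnShrink: boolean): number {
  // With the fix, a GC-driven decrease lowers the baseline (no reschedule).
  if (lowerOnShrink && sample < baseline) return sample;
  return baseline;
}

interface RegrowthResponse {
  noticed: boolean; // regrowth exceeds the baseline
  emergency: boolean; // regrowth exceeds 2x the baseline
}

function respondToRegrowth(baseline: number, regrown: number): RegrowthResponse {
  return { noticed: regrown > baseline, emergency: regrown > baseline * 2 };
}

const MB = 1024 * 1024;
const stale = respondToRegrowth(baselineAfterShrinkSample(100 * MB, 20 * MB, false), 60 * MB);
const lowered = respondToRegrowth(baselineAfterShrinkSample(100 * MB, 20 * MB, true), 60 * MB);
```

With the stale 100 MB baseline, regrowth to 60 MB goes unnoticed and the emergency threshold sits at 200 MB; with the lowered 20 MB baseline it is noticed and crosses the 40 MB emergency threshold.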
…BUN_GC_TIMER_* in env_var

updateGCRepeatTimer(.fast) only clears #idle_full_gcs_fired on a genuine
slow->fast transition (af48886), so when processGCTimerWithHeapSize observed
growth while already in fast mode, reduction mode survived into the next
repeating-timer tick and could fire a spurious second Full GC (or briefly
drop to .slow) during active allocation. Clear #idle_full_gcs_fired directly
on growth without touching the stable-tick counter, so the over-eager-reset
fix from af48886 stays intact.

Also move BUN_GC_TIMER_INTERVAL / BUN_GC_TIMER_DISABLE /
BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS to bun.env_var — getenvZ is slated for
sunset and this file already reads BUN_TRACK_LAST_FN_NAME via that path under
the same pre-runEnvLoader timing constraint.
If processGCTimerWithHeapSize clears #idle_full_gcs_fired between tick 30
and tick 31 (allocation resumed), the next repeating-timer tick skips the
reduction branch but the counter is still sitting at 30, so the >= 30 gate
re-fires the first-Full-GC path every tick until heap convergence makes
current > prev_heap_size. The < 2 cap lives only in the reduction branch,
which this loop never visits.

Reset the counter at the point we commit to the Full GC; reduction mode is
driven by #idle_full_gcs_fired thereafter, and re-entry requires a fresh
30-tick window as intended.
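A hypothetical replay of the tick-30/tick-31 race described in this commit. It assumes every repeating-timer sample is non-growing (the Full GCs keep shrinking the heap) and that, between ticks, the allocation-driven path sees growth and clears the reduction flag without touching the stable-tick counter; `fullGcsWhileAllocating` is an illustrative name.

```typescript
// Hypothetical replay of the cap-bypass race; not Bun's Zig code.
const THRESHOLD = 30;

function fullGcsWhileAllocating(ticks: number, resetCounterOnCommit: boolean): number {
  let stableTicks = THRESHOLD - 1; // one tick away from the first Full GC
  let idleFullGcsFired = 0;
  let fullGcs = 0;
  for (let t = 0; t < ticks; t++) {
    // Repeating-timer tick with a non-growing heap; reduction branch skipped
    // because the flag was cleared by the allocation-driven path.
    if (idleFullGcsFired === 0) {
      stableTicks += 1;
      if (stableTicks >= THRESHOLD) {
        fullGcs += 1;
        idleFullGcsFired = 1;
        if (resetCounterOnCommit) stableTicks = 0;
      }
    }
    // Allocation-driven path observes growth before the next tick.
    idleFullGcsFired = 0;
  }
  return fullGcs;
}
```

Without the reset, 10 ticks produce 10 Full GCs (one per tick); resetting the counter at commit time limits it to 1 until a fresh 30-tick window elapses.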

@claude claude Bot left a comment

Contributor


All review threads are resolved and I didn't find anything new, but this reworks the GC controller's fast/slow state machine and changes the allocation-driven != → > semantics in processGCTimerWithHeapSize. It runs in every process and took several iterations to converge, so it's worth a human sign-off.

Extended reasoning...

Overview

This PR changes GarbageCollectionController to fire up to 2 explicit async Full GCs at the fast→slow transition (modeled on V8's MemoryReducer), so old-gen + JIT CodeBlock jettison actually run before the controller drops to the 30s idle interval. It touches:

  • src/jsc/GarbageCollectionController.zig: new #idle_full_gcs_fired state, reduction-mode branch in onGCRepeatingTimer, and semantic changes to processGCTimerWithHeapSize (!= → > in .run_on_next_tick/.pending, plus a shrink-tracking branch in .pending).
  • src/jsc/VM.zig, src/jsc/bindings/bindings.cpp, src/jsc/bindings/headers.h: new collectAsyncFull() binding → heap.collectAsync(JSC::CollectionScope::Full).
  • src/bun_core/env_var.zig + init(): registers BUN_GC_TIMER_INTERVAL / BUN_GC_TIMER_DISABLE / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS and reads them via bun.env_var (fixing a pre-existing bug where these were read from vm.transpiler.env before it was populated, so they were silently ignored).
  • test/js/bun/gc/gc-controller-idle-full.test.ts: drives the controller at a 20ms interval and verifies ~40 MB of promoted-then-released old-gen is reclaimed at idle.

Security risks

None. No untrusted input is parsed; the new env vars go through the existing bun.env_var typed accessors with safe fallbacks (std.math.cast → 1000 on overflow). The new C++ binding takes a JSLockHolder exactly like the adjacent collectAsync.

Level of scrutiny

High. The GC repeating timer runs in every Bun process and its cadence directly affects both idle CPU and steady-state RSS. The state machine here has five interacting variables (#idle_full_gcs_fired, the stable-tick counter, gc_repeating_timer_fast, and the two heap-size baselines) across two entry points (the 1s/30s repeating timer and the per-event-loop processGCTimer). Over the PR's lifetime, eight distinct edge-case bugs were found and fixed (perpetual Full GC every 60s, GC's own shrink cancelling reduction mode, stale baseline after shrink, allocation-resume not cancelling reduction while already in fast mode, the < 2 cap being bypassed, etc.) — each fix interacting with the previous one. The final state looks correct to me and the bug-hunting pass found nothing, but the number of iterations it took to converge is itself a signal that this deserves a maintainer's eyes.

Other factors

  • All nine prior inline review threads (mine and CodeRabbit's) are resolved; the latest commit 65317cf addresses my last comment about resetting the stable-tick counter when entering reduction mode.
  • The != → > change in .run_on_next_tick / .pending is a behavioral change to the pre-existing allocation-driven GC path (heap shrinks no longer reschedule the 16ms timer), not just an additive idle feature; that's the part most worth a second opinion.
  • A regression test is included and per robobun passes on all CI platforms; the only red is the chronic Windows test-http-should-emit-close flake unrelated to this PR.
  • The env-var read fix is a nice drive-by but also means BUN_GC_TIMER_DISABLE will start actually working for users who set it.

On the darwin-14-x64 runner the 2.5s pure-idle window was tight enough
that under load the repeating timer (nominally 20ms) didn't always fit 30
ticks before the poll phase began — test passed 1/4 there vs. clean on
Linux/Windows/darwin-aarch64. Bump the idle window to 5s so even at ~5x
effective timer slowdown there is room for the 30-stable-tick threshold.
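The headroom argument is plain arithmetic. Here is a back-of-the-envelope TypeScript check. It assumes the idle phase only needs the 30 stable ticks at the test's 20 ms interval and that load stretches every tick by the same factor. It ignores the extra tick or two for the Full GCs themselves, and real scheduling jitter is less uniform.

```typescript
// Largest uniform timer slowdown under which `stableTicks` ticks still fit in the idle window.
function maxTolerableSlowdown(windowMs: number, intervalMs: number, stableTicks: number): number {
  return windowMs / (intervalMs * stableTicks);
}

console.log(maxTolerableSlowdown(2500, 20, 30).toFixed(2)); // "4.17" (old 2.5 s window)
console.log(maxTolerableSlowdown(5000, 20, 30).toFixed(2)); // "8.33" (new 5 s window)
```

So a ~5x slowdown (3 s for 30 ticks) overflows the old window but fits the new one with margin.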

@claude claude Bot left a comment

All prior review threads are resolved and the current pass found no new issues, but this rewrites the runtime-wide GC scheduling state machine and took ~8 review rounds to converge — worth a human pass over the final onGCRepeatingTimer / processGCTimerWithHeapSize interaction before merge.

Extended reasoning...

Overview

This PR changes GarbageCollectionController so that after 30 stable fast-mode ticks it fires up to 2 explicit collectAsync(CollectionScope::Full) calls (V8 MemoryReducer-style) before dropping to the 30s slow interval, instead of relying on JSC's Eden→Full promotion heuristic which never crosses at idle. It adds a new JSC__VM__collectAsyncFull binding (bindings.cpp / headers.h / VM.zig), reworks onGCRepeatingTimer and processGCTimerWithHeapSize around a new #idle_full_gcs_fired state, migrates the three BUN_GC_TIMER_* env vars from vm.transpiler.env (which was unpopulated at init() time, so they were silently ignored) to bun.env_var, and adds an integration test that drives the controller at a 20ms interval and asserts ~40 MB of promoted old-gen is reclaimed at idle.

Security risks

None. No untrusted input, no auth/crypto/permissions surface; the only external input is the three BUN_GC_* env vars, which now go through the typed env_var registry with std.math.cast bounds.

Level of scrutiny

High. GarbageCollectionController runs in every Bun process and its timer cadence directly shapes both idle CPU and steady-state RSS. The state machine here couples five pieces of state (#idle_full_gcs_fired, heap_size_didnt_change_for_repeating_timer_ticks_count, gc_repeating_timer_fast, gc_last_heap_size, gc_last_heap_size_on_repeating_timer) across two entry points (the 1s/30s repeating timer and the per-event-loop-tick processGCTimer), and the review history shows every adjustment to one invariant exposed a new edge in another (cancel-on-growth vs. starve-on-transient-allocation; baseline ownership vs. shrink-tracking; counter reset placement vs. the < 2 cap). The final code looks correct to me and every thread is resolved, but a heuristic this load-bearing deserves a human read of the converged version rather than a bot sign-off.

Other factors

  • All 9 inline review threads (mine + CodeRabbit) are marked resolved, and I verified the final diff incorporates each fix — including the most recent one (resetting the stable-tick counter at the first-Full-GC entry so the allocation-path #idle_full_gcs_fired = 0 clear can't re-trigger the branch and bypass the < 2 cap).
  • robobun reproduced the underlying issue on main and reports the new test passes on all CI platforms; remaining CI red is the known unrelated Windows test-http-should-emit-close flake.
  • The env-var migration is a real behavioral fix (these vars were previously dead) and follows the established env_var.zig pattern.
  • No CODEOWNERS cover these paths.

@claude claude Bot left a comment

No new issues found — all prior threads are resolved and the state machine now looks consistent. Deferring to a human for the final sign-off since this rewrites the core GC-timer scheduling that runs in every Bun process, and the eight rounds of fixes here show how easy it is to get the two-path interaction subtly wrong.

Extended reasoning...

Overview

This PR changes GarbageCollectionController to fire up to two explicit collectAsync(CollectionScope::Full) calls at the fast→slow idle transition (modeled on V8's MemoryReducer), so old-gen garbage and age-eligible JIT CodeBlocks are reclaimed before the controller drops to the 30s slow interval. It also: adds a new JSC__VM__collectAsyncFull binding (bindings.cpp / headers.h / VM.zig); decouples the repeating-timer heap baseline from the allocation-driven baseline; changes processGCTimerWithHeapSize to treat only heap growth as activity and to track GC-driven shrinks in .pending; migrates BUN_GC_TIMER_INTERVAL / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS / BUN_GC_TIMER_DISABLE from vm.transpiler.env (which was unpopulated at init() time) to bun.env_var; and adds an integration test that drives the controller at a 20 ms interval and asserts the ~40 MB promoted heap drops below 25%.

Security risks

None identified. No untrusted input parsing, no auth/crypto/permissions surface. The new C++ binding is a thin wrapper around an existing JSC heap API and takes no user-controlled data. The env-var reads go through the existing typed bun.env_var accessors.

Level of scrutiny

High. This is the GC scheduling state machine that runs on a repeating timer in every Bun process and interacts with the allocation-driven processGCTimer path through several shared state variables (gc_last_heap_size, gc_last_heap_size_on_repeating_timer, heap_size_didnt_change_for_repeating_timer_ticks_count, #idle_full_gcs_fired, gc_repeating_timer_fast). Errors here manifest as either retained memory in long-running servers or wasted CPU on spurious Full GCs; neither shows up as a test failure. The review history bears this out: eight successive state-machine issues were found and fixed across commits 8820bc5, 6eb16c6, 09bd0d3, af48886, d2332a4, a52ca18, c38b227, and 65317cf, several of which were regressions of earlier fixes. The current revision looks internally consistent to me and the bug hunter found nothing new, but that iteration history is exactly why a human should do the final read-through.

Other factors

  • The env-var migration is a real behavior change: the old vm.transpiler.env.get(...) reads ran before runEnvLoader() and were effectively dead, so BUN_GC_TIMER_INTERVAL etc. now actually take effect for the first time. Worth a conscious sign-off (and BUN_GC_TIMER_DISABLE is now truthy-checked rather than presence-checked).
  • The new test covers the headline behavior (idle Full GC fires and reclaims old-gen) but not the harder-to-test edge cases that drove most of the review churn (allocation resuming mid-reduction, fast→fast no-op transitions, baseline contamination).
  • CI failures are the known Windows fetch()-abort flake and unrelated S3/install/hot tests, per robobun's note.
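The two env-var behavior changes noted above can be shown in a small TypeScript sketch. First, BUN_GC_TIMER_INTERVAL falls back to 1000 when a checked cast fails. Second, BUN_GC_TIMER_DISABLE is truthy-checked instead of presence-checked. Everything here is an assumption for illustration: the helper names, the u32 range used for the checked cast, and the exact set of strings treated as truthy are hypothetical and are not taken from bun.env_var.

```typescript
// Hypothetical illustration; not the actual bun.env_var accessors.
type Env = Record<string, string | undefined>;

const DEFAULT_GC_TIMER_INTERVAL_MS = 1000;
const U32_MAX = 0xffff_ffff; // assumed target type for the checked cast

function gcTimerIntervalMs(env: Env): number {
  const raw = env.BUN_GC_TIMER_INTERVAL;
  // Unset or non-numeric: use the default.
  if (raw === undefined || !/^\d+$/.test(raw)) return DEFAULT_GC_TIMER_INTERVAL_MS;
  const n = Number(raw);
  // Mimics a checked cast that fails on overflow: out-of-range values use the default.
  return n > U32_MAX ? DEFAULT_GC_TIMER_INTERVAL_MS : n;
}

// Presence check: any value, even "0", counts as "disabled".
function disabledByPresence(env: Env): boolean {
  return env.BUN_GC_TIMER_DISABLE !== undefined;
}

// Truthiness check (assumed rule): "", "0", and "false" do not disable the timer.
function disabledByTruthiness(env: Env): boolean {
  const v = env.BUN_GC_TIMER_DISABLE;
  return v !== undefined && v !== "" && v !== "0" && v.toLowerCase() !== "false";
}
```

Under these assumptions, a user who set BUN_GC_TIMER_DISABLE=0 to mean "keep the timer" gets that behavior, whereas a presence check would have disabled it.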

robobun added a commit that referenced this pull request May 5, 2026
Per review: the 512 was an arbitrary round number chosen to cross JSC's
256-byte minExtraMemory fast-path threshold (so reportExtraMemoryAllocated
would drive didAllocate in release). Since this PR is scoped to the
accounting/hygiene side — #29280 is the real fix for GC pacing — the
floor is unnecessary inflation.

visitor.reportExtraMemoryVisited in visitChildren is the hot path for
this PR's goal (accurate extraMemorySize); it has no size threshold, so
every byte reported counts regardless of build mode. Releasing @sizeOf(Self)
straight through is the honest number: ~464 bytes in debug/ASAN, ~96 in
release.

Lower the test threshold accordingly: 100,000 stays comfortably above
the ~20 KB noise floor we measured without the fix and under the ~190 KB
release-mode signal.

@claude claude Bot left a comment

No remaining issues from my side — all prior threads are resolved and the final state machine looks coherent — but since this rewrites the runtime GC scheduling that every Bun process depends on (and took several rounds to converge), it's worth a human sign-off before merge.

Extended reasoning...

Overview

This PR reworks GarbageCollectionController.onGCRepeatingTimer to fire up to two explicit collectAsync(CollectionScope::Full) calls at the fast→slow transition (modeled on V8's MemoryReducer), so old-gen garbage and age-eligible CodeBlocks are reclaimed before the controller drops to the 30 s interval. It adds a new #idle_full_gcs_fired state field, a JSC__VM__collectAsyncFull binding (Zig + C++ + headers), changes processGCTimerWithHeapSize to treat only heap growth as activity (and to lower gc_last_heap_size on GC-driven shrinks), migrates the three BUN_GC_TIMER_* env vars from vm.transpiler.env to bun.env_var (fixing them being read before runEnvLoader() populated the transpiler env), and adds an integration test that drives the controller at a 20 ms interval and asserts ~40 MB of promoted-then-released arrays is reclaimed at idle.

Security risks

None. No untrusted input handling, auth, crypto, or privilege boundaries are touched. The new C++ binding is a thin wrapper around an existing JSC heap API and takes the JSLock exactly like the adjacent collectAsync binding. The env-var reads use the typed bun.env_var accessors and only affect GC timer cadence.

Level of scrutiny

High. This is the per-VM GC scheduling state machine — it runs in every Bun process and governs memory pressure vs. CPU overhead at idle. The change is not mechanical: the interaction between #idle_full_gcs_fired, heap_size_didnt_change_for_repeating_timer_ticks_count, gc_repeating_timer_fast, and the two heap baselines is subtle, and the PR went through roughly eight rounds of review-driven fixes (counter not reset on slow→slow no-op, allocation path not cancelling reduction, GC-driven shrink mistaken for activity, repeating-timer baseline contaminating the allocation-driven baseline, .pending not lowering its baseline after shrink, fast→fast no-op clearing the stable-tick counter, and the < 2 cap being bypassable). All of those threads are now resolved and reflected in the final diff, the new test passes on all platforms in CI, and the current bug-hunting pass found nothing — but the iteration history itself is evidence that this state machine is easy to get subtly wrong.

Other factors

The change also alters pre-existing behavior in processGCTimerWithHeapSize (the != → > semantics in .run_on_next_tick / .pending), which slightly changes allocation-driven GC cadence independent of the new idle-Full-GC feature. CI is green on Linux/macOS with only known-flaky Windows failures unrelated to this diff. Given the runtime-wide blast radius and the design judgment involved (e.g., the 30-tick / 1 MB / max-2 thresholds), a human reviewer should sign off rather than auto-approving.

@Jarred-Sumner

Is there a bench showing no CPU / perf regression?

@robobun

robobun commented May 12, 2026

Release builds of main@314ffe307 (this PR's merge-base) vs jarred/idle-full-gc@3fb4b840f — same host, rounds interleaved. Only the PR diff separates the two binaries.

1. HTTP throughput with per-request allocation: Bun.serve returning Response.json({20 random items}), ab -l -c 100 -n 100000, 6 interleaved rounds:

|      | req/s (mean) | req/s (min–max) | server CPU for 100k reqs |
| ---- | ------------ | --------------- | ------------------------ |
| main | 30,275       | 29,269 – 31,201 | 2,953 ms                 |
| PR   | 30,147       | 29,677 – 31,207 | 2,935 ms                 |
| Δ    | −0.4 %       |                 | −0.6 %                   |

Within the ~6 % run-to-run noise of either binary alone. processGCTimerWithHeapSize runs on every event-loop tick here; the != → > change and the new #idle_full_gcs_fired clear are invisible at this scale.
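The Δ row and the noise estimate can be re-derived from the table with a small TypeScript helper that uses only the numbers reported above. Here, "noise" is taken to be the min–max spread relative to the minimum. That is one reasonable reading of "run-to-run noise", not necessarily how the figure was computed.

```typescript
// Percent change from `base` to `value`.
const pctChange = (base: number, value: number): number => ((value - base) / base) * 100;
// Min–max spread as a percentage of the minimum.
const spreadPct = (min: number, max: number): number => ((max - min) / min) * 100;

console.log(pctChange(30_275, 30_147).toFixed(2)); // "-0.42" (req/s, mean)
console.log(pctChange(2_953, 2_935).toFixed(2)); // "-0.61" (server CPU, lower is better)
console.log(spreadPct(29_269, 31_201).toFixed(1)); // "6.6" (main)
console.log(spreadPct(29_677, 31_207).toFixed(1)); // "5.2" (PR)
```

Both deltas are an order of magnitude smaller than either binary's own spread.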

2. Tight allocation loop yielding to the event loop — 200k object allocs, await Promise.resolve() every 64 iters (~3,125 processGCTimer calls), 8 interleaved rounds:

|      | wall ms (mean) | min–max     |
| ---- | -------------- | ----------- |
| main | 23.89          | 23.2 – 24.9 |
| PR   | 23.88          | 23.1 – 24.4 |

No measurable difference.

3. Idle CPU — allocate 40 MB, Bun.gc(true) ×2 to promote to old-gen, release, then await Bun.sleep(45000) (covers the 30-tick fast→slow transition at the default 1 s interval), 3 interleaved rounds:

|      | CPU over 45 s idle | RSS change (before → after) |
| ---- | ------------------ | --------------------------- |
| main | 50 / 50 / 50 ms    | +1.0 / +1.0 / +1.0 MB       |
| PR   | 60 / 60 / 50 ms    | −1.0 / 0 / 0 MB             |

The ~7 ms delta is the 1–2 async Full GCs at the transition — one-shot per idle activation, ≈0.016 % of the idle wall time. RSS drifts up on main and holds/drops on the PR, which is the point.
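The ~7 ms and ≈0.016 % figures follow from the three rounds in the table. The TypeScript check below uses only those numbers. One caveat: the idle script reads /proc/self/stat in clock ticks and assumes 100 ticks per second, so each sample is quantized to 10 ms. A 50 → 60 ms difference is therefore a single tick.

```typescript
const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

const mainCpuMs = [50, 50, 50];
const prCpuMs = [60, 60, 50];
const idleWallMs = 45_000;

const deltaMs = mean(prCpuMs) - mean(mainCpuMs); // unrounded mean delta
const shareOfIdlePct = (deltaMs / idleWallMs) * 100;

console.log(deltaMs.toFixed(1)); // "6.7"
console.log(shareOfIdlePct.toFixed(3)); // "0.015"
console.log(((7 / idleWallMs) * 100).toFixed(3)); // "0.016" (using the rounded ~7 ms)
```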

scripts

Server:

```js
const server = Bun.serve({
  port: 0,
  fetch(req) {
    const url = new URL(req.url);
    const data = { path: url.pathname, ts: Date.now(),
      items: Array.from({ length: 20 }, (_, i) => ({ id: i, v: Math.random() })) };
    return Response.json(data);
  },
});
console.log(server.port);
```

Alloc loop:

```js
const N = 200_000; let sink = 0;
const t0 = performance.now();
for (let i = 0; i < N; i++) {
  const o = { a: i, b: [i, i + 1, i + 2], c: "x".repeat(16) };
  sink += o.a;
  if ((i & 63) === 0) await Promise.resolve();
}
console.log({ ms: (performance.now() - t0).toFixed(1), sink });
```

Idle:

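```js
import { readFileSync } from "node:fs";
// utime + stime (fields 14 and 15 of /proc/self/stat, Linux only) in ms; assumes 100 clock ticks/s.
const cpuMs = () => {
  const f = readFileSync("/proc/self/stat", "utf8").split(" ");
  return ((+f[13] + +f[14]) * 1000) / 100;
};
// ~40 MB of arrays, promoted to old-gen by two synchronous Bun.gc(true) calls, then released.
let data = [];
for (let i = 0; i < 5000; i++) data.push(new Array(1000).fill(i));
Bun.gc(true); Bun.gc(true); data = null;
await Bun.sleep(100);
const b = cpuMs(), r0 = process.memoryUsage.rss();
await Bun.sleep(45_000); // idle through the 30-tick fast→slow transition at the default 1 s interval
const mb = (bytes) => (bytes / 1048576).toFixed(1);
console.log({ idle_cpu_ms: Math.round(cpuMs() - b),
  rss_before_mb: mb(r0), rss_after_mb: mb(process.memoryUsage.rss()) });
```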

@robobun

robobun commented May 14, 2026

Superseded by #30725 (Rust port on current main).

@robobun robobun closed this May 14, 2026
robobun added a commit that referenced this pull request May 23, 2026
…ow interval

Rust port of #29280.

The repeating timer's collectAsync() lets JSC pick Eden vs Full. At idle
JSC keeps picking Eden because Heap::updateAllocationLimits ratchets
m_maxHeapSize on every Eden GC, so the 1/3 Full-promotion ratio decays
instead of crossing. After 30 stable ticks we go to the 30s interval
having never collected old-gen, and age-based CodeBlock jettison (TTL
5-60s, only evaluated during GC marking) never fires.

At the fast->slow transition, fire one explicit
collectAsync(CollectionScope::Full). If it freed >1 MB, fire one more on
the next tick. Then go to slow. Max 2 Full GCs per idle activation, then
silent - same convergence rule as V8's MemoryReducer. If allocation
resumes mid-reduction, drop back to fast.

No change to allocation-driven GC frequency; cost is at most 2 async
Full GCs per active->idle transition.

Also routes the BUN_GC_TIMER_* env reads through bun_core::env_var
(process-env backed) instead of vm.transpiler.env, which is still null
when init() runs from ensure_waker().

New binding: JSC__VM__collectAsyncFull -> heap.collectAsync(CollectionScope::Full).
robobun added a commit that referenced this pull request Jun 7, 2026
…ow interval

Rust port of #29280.
