GarbageCollectionController: fire up to 2 Full GCs before going to slow interval#30725

Open
robobun wants to merge 3 commits into main from farm/813505e7/gc-idle-full-rust

@robobun

@robobun robobun commented May 14, 2026

Collaborator

Rust port of #29280.

The repeating timer's collectAsync() lets JSC pick Eden vs Full. At idle JSC keeps picking Eden because Heap::updateAllocationLimits ratchets m_maxHeapSize on every Eden GC, so the 1/3 Full-promotion ratio decays instead of crossing. After 30 stable ticks we go to the 30s interval having never collected old-gen, and age-based CodeBlock jettison (TTL 5-60s, only evaluated during GC marking) never fires.

Benchmark from #29280 on Claude Code: fullGC() at steady state reclaims ~103 MB; gcAndSweep ~141 MB — that is the dead old-gen + jettison-eligible JIT code this timer was leaving on the table.

Change: at the fast→slow transition, fire one explicit collectAsync(CollectionScope::Full). If it freed >1 MB, fire one more on the next tick. Then go to slow. Max 2 Full GCs per idle activation, then silent — same convergence rule as V8's MemoryReducer (kRun→kDone when started_gcs >= 2 || freed <= 1MB). If allocation resumes mid-reduction, drop back to fast.
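The convergence rule above can be modeled as a small state machine. This is an illustrative sketch, not the controller's code: `IdleReducer`, `Action`, and the constant names are hypothetical, and it assumes each async Full GC has finished by the next tick. Only the 30-tick threshold, the cap of 2 Full GCs, and the 1 MB threshold come from the description.

```rust
/// Illustrative model of the idle "reduction mode" described above.
/// All names are hypothetical; they do not mirror the real controller.
const STABLE_TICKS_BEFORE_IDLE: u32 = 30;
const MAX_IDLE_FULL_GCS: u8 = 2;
const MIN_FREED_BYTES: usize = 1024 * 1024; // 1 MB

#[derive(Debug, PartialEq, Clone, Copy)]
enum Action {
    /// Stay on the fast interval and let JSC choose Eden vs Full.
    CollectAsync,
    /// Request an explicit Full collection.
    CollectAsyncFull,
    /// Switch the repeating timer to the slow interval.
    GoSlow,
}

#[derive(Default)]
struct IdleReducer {
    last_heap_size: usize,
    stable_ticks: u32,
    idle_full_gcs_fired: u8,
}

impl IdleReducer {
    /// Called once per fast-interval tick with the current heap size.
    fn on_tick(&mut self, heap_size: usize) -> Action {
        let prev = self.last_heap_size;
        self.last_heap_size = heap_size;

        // Growth means allocation resumed: leave reduction mode.
        if heap_size > prev {
            self.stable_ticks = 0;
            self.idle_full_gcs_fired = 0;
            return Action::CollectAsync;
        }

        if self.idle_full_gcs_fired > 0 {
            // A Full GC was requested on an earlier tick; see what it freed.
            let freed = prev - heap_size;
            if self.idle_full_gcs_fired < MAX_IDLE_FULL_GCS && freed > MIN_FREED_BYTES {
                self.idle_full_gcs_fired += 1;
                return Action::CollectAsyncFull;
            }
            return Action::GoSlow;
        }

        self.stable_ticks += 1;
        if self.stable_ticks >= STABLE_TICKS_BEFORE_IDLE {
            self.idle_full_gcs_fired = 1;
            return Action::CollectAsyncFull;
        }
        Action::CollectAsync
    }
}
```

In this model a heap that stays fully live yields one Full request followed by `GoSlow`, while a heap that drops by more than 1 MB after the first Full yields a second request before going slow.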

No change to allocation-driven GC frequency; cost is at most 2 async Full GCs per active→idle transition.

New binding: JSC__VM__collectAsyncFull → heap.collectAsync(JSC::CollectionScope::Full).

Port-specific changes vs #29280:

  • The BUN_GC_TIMER_* / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS reads go through bun_core::env_var (process-environment backed) rather than vm.env_loader_opt(). In the Rust port init() runs from ensure_waker() before Transpiler::init has populated vm.transpiler.env, so the loader path always returned None and these knobs were silently ignored. The Zig PR made the equivalent change.
  • The test runs the fixture as a file via tempDir instead of bun -e. The Rust port's is_one_shot_eval_invocation() sets numberOfGCMarkers=1 for -e, which stalls the concurrent collector while the mutator is parked in Bun.sleep, so the async Full request never reaches its stop-the-world phase. Running as a file uses the normal multi-threaded GC configuration (same as a long-running server, which is what this change targets).
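The first point, reading tuning knobs straight from the process environment so they work before any higher-level loader is populated, can be sketched with the standard library alone. This is not the `bun_core::env_var` API; `parse_unsigned`, `read_unsigned_env`, and `interval_ms` are hypothetical helpers used only for illustration.

```rust
use std::env;

/// Parse an unsigned knob value; None when absent or not a number.
fn parse_unsigned(raw: Option<&str>) -> Option<u32> {
    raw?.trim().parse::<u32>().ok()
}

/// Hypothetical helper: read a knob directly from the process environment,
/// which is available regardless of what else has been initialized.
fn read_unsigned_env(name: &str) -> Option<u32> {
    parse_unsigned(env::var(name).ok().as_deref())
}

/// Hypothetical helper: fall back to a default when the knob is unset
/// or malformed.
fn interval_ms(name: &str, default_ms: u32) -> u32 {
    read_unsigned_env(name).unwrap_or(default_ms)
}
```

A caller would use something like `interval_ms("BUN_GC_TIMER_INTERVAL", 1000)`; the default passed here is the caller's choice, not a value taken from the controller's source.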

Verification:

# stash src/ → build → test (main behavior)
Expected: < 10123416
Received: 41412570
(fail)

# unstash → build → test (this PR)
(pass) GC controller fires a Full GC at idle so old-gen garbage is reclaimed [7114.77ms]

Heap drops from ~40 MB to ~1 MB once the idle Full GC reclaims the promoted arrays.

@robobun

robobun commented May 14, 2026

Collaborator Author
Updated 1:05 AM PT - Jun 10th, 2026

@robobun, your commit 2248344 has 1 failures in Build #61679 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 30725

That installs a local version of the PR into your bun-30725 executable, so you can run:

bun-30725 --bun

@coderabbitai

coderabbitai Bot commented May 14, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR adds GC environment variables, exposes a VM FFI for async full collection, rewrites GarbageCollectionController to detect idle state and perform capped async Full GCs for old-gen reclamation, and adds a test that validates heap shrinkage during idle.

Changes

Idle Garbage Collection via Async Full Collection

| Layer | File(s) | Summary |
| --- | --- | --- |
| GC environment variables | src/bun_core/env_var.rs | Three new public environment variables declare GC timer interval (BUN_GC_TIMER_INTERVAL), disabled flag (BUN_GC_TIMER_DISABLE), and Full-GC throttle count (BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS). |
| FFI binding for async-full collection | src/jsc/bindings/bindings.cpp, src/jsc/bindings/headers.h, src/jsc/VM.rs | C++ binding acquires VM lock and calls heap.collectAsync(Full); header declares the entry point; Rust FFI layer and public VM::collect_async_full() method expose the binding to controller code. |
| GC controller state and initialization | src/jsc/GarbageCollectionController.rs | Struct gains idle_full_gcs_fired: u8 field; imports switch to bun_core::env_var; init() reads GC timer interval, disabled flag, and Full-GC-run limit directly from environment getters and configures the repeating timer. |
| GC controller scheduling and reduction logic | src/jsc/GarbageCollectionController.rs | Heap growth clears idle_full_gcs_fired; growth/decrease paths are split in process_gc_timer_with_heap_size; Fast-mode transition resets the idle flag; on_gc_repeating_timer implements reduction mode using capped Full-GC calls after idle, stable-tick detection for explicit Full GCs after 30 unchanged Fast ticks, and consistent collect_async() on non-early-return paths. |
| Idle GC behavior test | test/js/bun/gc/gc-controller-idle-full.test.ts | Fixture script allocates ~40MB heap, forces two synchronous Full GCs to promote to old-gen, releases references, then sleeps to trigger idle behavior; test harness spawns subprocess with BUN_GC_TIMER_INTERVAL=20, polls heap until it drops below 1/4 initial, and asserts successful process exit within 30s. |

Suggested reviewers:

  • Jarred-Sumner
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding Full GC behavior to the GarbageCollectionController during the fast-to-slow transition. |
| Description check | ✅ Passed | The PR description comprehensively covers what the change does, the technical rationale, port-specific differences, and verification results. However, it does not follow the repository's required template structure with explicit 'What does this PR do?' and 'How did you verify your code works?' sections. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bun_core/env_var.rs`:
- Around line 92-95: The documentation for BUN_GC_TIMER_INTERVAL on the
GarbageCollectionController is outdated: it says the controller fires a single
Full GC then drops to slow mode, but the current logic can request up to two
Full GCs before switching to the slow 30s interval. Update the comment above
new!(pub BUN_GC_TIMER_INTERVAL: unsigned, "BUN_GC_TIMER_INTERVAL", {}) to
describe the two-step behavior (fast repeating ticks, up to two Full GC requests
on non-growing ticks, then switch to the fixed 30s slow interval) and ensure the
wording references the 30 non-growing tick threshold and the possibility of two
Full GCs before slow mode.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 62b74ca3-dfe4-42d5-8fc1-caa5f606fc97

📥 Commits

Reviewing files that changed from the base of the PR and between 63035b3 and e8ec476.

📒 Files selected for processing (6)
  • src/bun_core/env_var.rs
  • src/jsc/GarbageCollectionController.rs
  • src/jsc/VM.rs
  • src/jsc/bindings/bindings.cpp
  • src/jsc/bindings/headers.h
  • test/js/bun/gc/gc-controller-idle-full.test.ts

Comment thread src/bun_core/env_var.rs Outdated
@github-actions

Contributor

Found 3 issues this PR may fix:

  1. Bun appears to delay GC in low-pressure Fastify + mongodb workload, causing slow RSS/Timeout growth #30261 - GC delay in low-pressure Fastify + mongodb workload causes slow RSS/Timeout growth; reporter confirmed Bun.gc(true) every 30s stabilizes it — exactly the idle Full GC gap this PR closes
  2. Memory (RSS) in Bun Spawned Child Process Grows Slowly, Even When Idle #21560 - RSS in idle child process grows slowly despite flat JS heap — old-gen objects and jettison-eligible JIT code never reclaimed without Full GC at idle
  3. Add a way to make bun repeatedly garbage collect and run finalizers until the garbage collection finds no unused memory #24285 - Request for repeated GC + finalizer runs until no unused memory remains — idle Full GC would address the underlying need for more thorough collection

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #30261
Fixes #21560
Fixes #24285

🤖 Generated with Claude Code

@github-actions

Contributor

This PR may be a duplicate of:

  1. Reduce idle CPU wakeups from GC repeating timer #28696 - Also modifies GarbageCollectionController idle behavior (skips collectAsync when heap is stable, reduces slow-mode threshold from 30 to 5 ticks), targeting the Zig source which is now superseded by this Rust port
  2. fix(gc): reduce idle CPU by transitioning GC timer to slow mode faster #27366 - Also modifies GarbageCollectionController to reduce slow-mode transition threshold from 30 to 5 ticks, targeting the Zig source

🤖 Generated with Claude Code

@claude claude Bot left a comment

Contributor

I didn't find any bugs, but this rewrites the GC controller's idle-detection state machine and changes the growth/shrink semantics in process_gc_timer_with_heap_size — core runtime heuristics that affect every process — so it warrants a human look.

Extended reasoning...

Overview

This PR ports #29280 to the Rust source tree. It adds an idle_full_gcs_fired counter to GarbageCollectionController, substantially rewrites on_gc_repeating_timer to introduce a "reduction mode" that fires up to 2 explicit collectAsync(CollectionScope::Full) calls at the fast→slow transition, and changes process_gc_timer_with_heap_size so that only heap growth (not any change) signals activity. It also adds a trivial C++ binding (JSC__VM__collectAsyncFull), three env_var declarations, switches the controller's env reads from vm.env_loader_opt() to bun_core::env_var, and includes an integration test.

Security risks

None. No untrusted input handling, auth, or data exposure paths are touched. The new env vars are debug/tuning knobs read at init and follow the existing pattern.

Level of scrutiny

High. The GC repeating timer runs in every Bun process and the state machine here has several interacting pieces: the 30-tick stable counter, the new idle_full_gcs_fired cap, the 1 MB freed threshold, and the changed != prev → > prev activity test (which means heap shrinkage no longer kicks the controller back to fast mode or reschedules the one-shot timer). The repeating-timer callback also no longer updates gc_last_heap_size via perform_gc(), which the diff acknowledges and compensates for in the Pending arm. These are all reasonable and well-commented, but they're heuristic design choices with process-wide perf implications that a human familiar with the JSC GC model should sign off on.
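The difference between the old "any change" test and the new "growth only" test can be made concrete with a small comparison. This is a sketch under assumptions: `Decision`, `decide_old`, and `decide_new` are hypothetical names, and the real arms do more work (timer rescheduling, mode switches) than this classification shows.

```rust
use std::cmp::Ordering;

/// Hypothetical summary of what the allocation-driven path does next.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Treat as activity: reschedule / return to fast mode.
    Reschedule,
    /// Heap shrank: remember the lower size, but do not reschedule.
    LowerBaseline(usize),
    /// Heap unchanged: nothing to do.
    Nothing,
}

/// Old rule: any difference from the previous size counts as activity.
fn decide_old(prev: usize, now: usize) -> Decision {
    if now != prev {
        Decision::Reschedule
    } else {
        Decision::Nothing
    }
}

/// New rule: only growth counts as activity; shrinkage lowers the baseline.
fn decide_new(prev: usize, now: usize) -> Decision {
    match now.cmp(&prev) {
        Ordering::Greater => Decision::Reschedule,
        Ordering::Less => Decision::LowerBaseline(now),
        Ordering::Equal => Decision::Nothing,
    }
}
```

The two rules agree on growth and on an unchanged heap; they differ only when the heap shrinks, which is the case a completed GC produces at idle.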

Other factors

  • The design mirrors a previously-landed Zig PR (#29280) and V8's MemoryReducer convergence rule, which lends confidence.
  • A regression test is included and the PR description shows before/after verification.
  • The robobun CI failures are pre-existing lint warnings in unrelated files (hw_timer.rs, Blob.rs, server/mod.rs, etc.), not introduced here.
  • coderabbit's only finding (doc wording) was addressed in ecab4ad.
  • github-actions flagged two potentially overlapping open PRs (#28696, #27366) targeting the older Zig source — worth a human confirming this supersedes them.

@robobun

robobun commented May 14, 2026

Collaborator Author

CI status (latest: build #61679 at 2248344, maintainer's merge of current main — now settled):

Zero test failures. Every red job is agent infrastructure:

  • darwin-26-aarch64-test-bun: one shard passed, one agent-level timeout (exit 255, no test-failure annotation).
  • darwin-14-aarch64-test-bun (both shards) and one windows-2019-x64-test-bun shard: Expired — no agent ever picked them up (the windows one traces to an Azure "Failed to create agent" error in the robobun annotation).
  • solc.test.ts flake passed on retry.

Earlier unrelated breaks are resolved on main: bunx.test.ts skipped in #32042, the duckdb segfault did not recur.

test/js/bun/gc/gc-controller-idle-full.test.ts has passed on every lane that ran, in every build of this PR. The diff is green; needs a maintainer to re-run the expired jobs or merge.

@robobun robobun force-pushed the farm/813505e7/gc-idle-full-rust branch from ecab4ad to 9f2312b Compare May 23, 2026 14:14

@claude claude Bot left a comment

Contributor

I didn't find any bugs, but this reworks the GC controller's scheduling state machine (idle Full-GC reduction mode, != → > heap-comparison semantics, perform_gc() → collect_async() in the repeating timer, BUN_GC_TIMER_DISABLE presence → truthy check); runtime-wide GC behavior is worth a human look.

Extended reasoning...

Overview

This PR ports Zig PR #29280 to the Rust GarbageCollectionController, adding an idle "reduction mode" that fires up to 2 explicit collectAsync(CollectionScope::Full) calls before transitioning to the 30s slow interval. It touches src/jsc/GarbageCollectionController.rs (state machine, new idle_full_gcs_fired field, rewritten on_gc_repeating_timer), src/jsc/VM.rs + src/jsc/bindings/bindings.cpp + headers.h (new JSC__VM__collectAsyncFull binding), src/bun_core/env_var.rs (three new env-var declarations), and adds an integration test.

Security risks

None identified. The change is internal GC scheduling; no user input parsing, auth, crypto, or data exposure surface. The new env vars are debug/tuning knobs read via the existing typed env_var machinery.

Level of scrutiny

High. This is process-wide GC scheduling that runs in every Bun process. Beyond the headline feature, several subtle semantic shifts ride along:

  • process_gc_timer_with_heap_size: this_heap_size != prev → this_heap_size > prev in both RunOnNextTick and Pending arms, plus a new < prev branch that lowers gc_last_heap_size without rescheduling. This changes the existing allocation-driven path, not just the new idle path.
  • on_gc_repeating_timer: previously called this.perform_gc() (which updates gc_last_heap_size); now calls vm.collect_async() directly, so gc_last_heap_size is no longer updated from the repeating timer. The PR's new < prev branch in Pending compensates, but the interaction between the two timers and gc_last_heap_size vs gc_last_heap_size_on_repeating_timer is delicate.
  • BUN_GC_TIMER_DISABLE: was env.has(...) (presence check, so BUN_GC_TIMER_DISABLE=0 disabled the timer); now a truthy boolean check (so =0 no longer disables). Minor but a user-visible behavior change.
  • The reduction-mode state machine has multiple reset points for idle_full_gcs_fired and heap_size_didnt_change_for_repeating_timer_ticks_count across two callbacks; the comments are thorough but the cross-callback invariants are non-obvious.
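The BUN_GC_TIMER_DISABLE point in the list above is easiest to see side by side. The exact set of values that `bun_core::env_var` treats as truthy is not shown in this thread, so the definition below (anything other than empty, `0`, or `false`) is an assumption, and both function names are hypothetical.

```rust
/// Old semantics: the variable only has to be present, whatever its value.
fn disabled_by_presence(value: Option<&str>) -> bool {
    value.is_some()
}

/// New semantics, under an ASSUMED definition of "truthy":
/// set, and not one of "", "0", "false" (case-insensitive).
fn disabled_by_truthy(value: Option<&str>) -> bool {
    match value.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) => !matches!(v.as_str(), "" | "0" | "false"),
        None => false,
    }
}
```

With `BUN_GC_TIMER_DISABLE=0`, the presence check disables the timer while the truthy check leaves it enabled, which is the user-visible difference the review calls out.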

Other factors

The new test passed on all CI lanes including ASAN, and CodeRabbit's one doc nit was addressed. The PR is well-reasoned and references V8's MemoryReducer convergence rule. However, GC timing changes can have hard-to-spot regressions (latency spikes, memory not reclaimed under specific allocation patterns) that a single fixture test won't catch, and the collateral changes to the existing allocation-driven path warrant a maintainer's eye.

@claude claude Bot left a comment

Contributor

I didn't find any bugs, but this rewrites the process-wide GC scheduling state machine (reduction mode, the != → > activity heuristic, env-var sourcing) in ways that affect every Bun process, so it's worth a human look before merging.

Extended reasoning...

Overview

This PR ports #29280 to the Rust tree: it rewrites on_gc_repeating_timer to fire up to two explicit collectAsync(CollectionScope::Full) calls at the fast→slow transition, adds a new idle_full_gcs_fired reduction-mode state, changes process_gc_timer_with_heap_size to treat only heap growth (not any change) as activity, switches init() to read BUN_GC_TIMER_* / BUN_GC_RUNS_UNTIL_SKIP_RELEASE_ACCESS via bun_core::env_var instead of vm.env_loader_opt(), adds the JSC__VM__collectAsyncFull C++/Rust binding, declares the three env vars in env_var.rs, and adds an integration test.

Security risks

None. The change is internal GC timing logic; the only user-facing inputs are pre-existing tuning env vars, and the new binding just forwards to heap.collectAsync(Full) under the JSLock.

Level of scrutiny

High. This is runtime-wide GC scheduling that runs in every Bun process. The repeating-timer callback is essentially rewritten with a new state machine, and the != → > change in process_gc_timer_with_heap_size alters when the allocation-driven path reschedules / lowers the baseline. A subtle mistake here could regress either CPU (over-collecting) or RSS (under-collecting) across the board. While it mirrors an already-landed Zig PR and has a passing end-to-end test, the port-specific deviations (env-var sourcing, the one-shot-eval test workaround) and the cross-path interaction between idle_full_gcs_fired resets in process_gc_timer_with_heap_size vs the reduction-mode branch in on_gc_repeating_timer are the kind of heuristic tuning a human reviewer should sanity-check.

Other factors

The bug-hunting system found no issues, CodeRabbit's only finding (a doc-comment wording nit) was addressed in ecab4ad, the new test passed on all CI lanes including ASAN, and the one remaining CI failure is a known-flaky S3 leak unrelated to this diff. No CODEOWNERS entry covers these paths. Given the scope and the process-wide blast radius, deferring rather than approving.

@claude claude Bot left a comment

Contributor

No bugs found, but this rewrites the GC controller's idle state machine and changes growth-vs-change semantics in process_gc_timer_with_heap_size — core runtime behavior that affects every process, so worth a human look even though it's a port of #29280.

Extended reasoning...

Overview

This PR modifies Bun's GarbageCollectionController to fire up to 2 explicit async Full GCs at the fast→slow idle transition, addressing old-gen garbage that JSC's Eden-only collections leave behind. It's a Rust port of #29280 (Zig). Files touched: GarbageCollectionController.rs (substantial state-machine rewrite in on_gc_repeating_timer plus new idle_full_gcs_fired field and reduction-mode logic), VM.rs / bindings.cpp / headers.h (new collectAsyncFull FFI binding — straightforward), env_var.rs (registers three GC tuning env vars), and a new subprocess test.

Security risks

None. This is purely GC scheduling — no auth, crypto, permissions, untrusted input parsing, or network surface. The new env vars are debug/tuning knobs read from the process environment.

Level of scrutiny

Moderate-to-high. GC scheduling is performance-critical runtime code that runs in every Bun process. Beyond the headline change, the PR also:

  • Changes process_gc_timer_with_heap_size from !=-triggered to >-only (growth) for both RunOnNextTick and Pending states, with a new shrink branch that lowers the baseline without rescheduling. This is well-commented but is a semantic change to the existing allocation-driven path, not just an idle addition.
  • Replaces perform_gc() at the top of on_gc_repeating_timer with a direct block_bytes_allocated() read + trailing collect_async(), so gc_last_heap_size is no longer updated from the repeating timer (the comment in the new Pending shrink branch references this).
  • Switches BUN_GC_TIMER_DISABLE from presence-check (env.has) to boolean parse, so BUN_GC_TIMER_DISABLE=0 now means enabled where previously any value disabled. Likely intentional, but a behavior change.

These all look correct and are carefully justified in code comments, but the interaction between the two timer paths and the new reduction state warrants a maintainer's eye.

Other factors

CI is green on all lanes (the only red is an expired darwin agent, per robobun's status comment), the new test passed everywhere including ASAN, and this mirrors an already-landed Zig change. CodeRabbit's only finding (doc-comment wording) was addressed in ecab4ad. No outstanding human reviewer comments. I'm deferring solely on the complexity/criticality axis, not because anything looks wrong.

@alii

alii commented Jun 1, 2026

Member

@robobun Performance implications of this in a release build?

@robobun

robobun commented Jun 1, 2026

Collaborator Author

Measured on release builds (build:release) of main @ e75a55d with and without this diff applied — same tree, same toolchain, 3 interleaved runs each, medians. Container is core-capped so ops numbers carry noise; CPU/RSS are stable.

| Scenario (40s window unless noted) | baseline | this PR | delta |
| --- | --- | --- | --- |
| Burst→idle (40 MB garbage, then idle): JS heap after idle | 38.6 MB | 0.2 MB | reclaimed |
| Burst→idle: RSS after idle | 73.4 MB | 35.1 MB | −52% |
| Burst→idle: total CPU | ~180 ms | ~160 ms | parity |
| Pure idle (trivial heap): total CPU | ~115 ms | ~100 ms | parity |
| Pure idle: voluntary ctx switches | 5216 | 3816 | −27% |
| Duty-cycle (6× 1 s busy + 2 s pause): CPU | ~6.5 s | ~6.4 s | parity |
| Duty-cycle: voluntary ctx switches | 22.8 k | 17.7 k | −22% |
| Active churn (10 s JSON/object loop): ops | 19.0 k | 17.8–19.8 k | parity (noise) |
| Worst case (384 MB fully-live heap, then idle): CPU | ~533 ms | ~725 ms | +~190 ms, once |

Interpretation:

  • Active path is untouched. While the heap is growing the controller behaves exactly as before (fast mode, same collectAsync() cadence), so throughput is unchanged — user CPU in the churn/duty-cycle runs is identical within noise.
  • Idle cost is bounded and one-shot. The worst case is a large heap where everything is live: the idle Full GC marks the whole live set for nothing. That's the +190 ms (≈0.5 ms per MB live) — and because nothing was freed (<1 MB), the convergence rule stops after one Full, not two, then goes to the 30 s interval and stays silent until allocation resumes. It's paid once per active→idle transition, on the concurrent collector threads.
  • No user-visible pause at idle. The mutator is parked in the event loop when the Full fires, so the stop-the-world phases complete while no JS is executing. A request arriving mid-collection pays normal concurrent-GC costs, same as any allocation-triggered Full today.
  • Small idle-CPU win as a side effect. The growth-only (>) comparison means a shrinking heap no longer bounces the controller back to fast mode, which shows up as 20–27% fewer voluntary context switches at idle.
  • The payoff is the burst→idle row: RSS halves (73 → 35 MB) because the dead old-gen actually gets collected — the exact gap described in GarbageCollectionController: fire up to 2 Full GCs before going to slow interval #29280 (where fullGC() at steady state was reclaiming ~103 MB that this timer left behind).
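The derived figures quoted in this comment (the −52%, −27%, and −22% deltas and the roughly 0.5 ms per MB marking cost) follow from the raw measurements by simple arithmetic. A minimal check, with hypothetical helper names:

```rust
/// Relative change from baseline to PR, in percent (negative = reduction).
fn percent_change(baseline: f64, pr: f64) -> f64 {
    (pr - baseline) / baseline * 100.0
}

/// Extra CPU per MB of live heap for the one-time idle Full GC.
fn ms_per_mb(extra_ms: f64, live_mb: f64) -> f64 {
    extra_ms / live_mb
}
```

For example, `percent_change(73.4, 35.1)` is about −52.2 (burst→idle RSS), `percent_change(5216.0, 3816.0)` is about −26.8 (pure-idle context switches), and `ms_per_mb(725.0 - 533.0, 384.0)` is 0.5 for the worst-case row.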

@alii

alii commented Jun 1, 2026

Member

Can you share your benchmarking code? Maybe make a new repo like robobun/bun-30725-bench so that I can SSH into a powerful linux box, clone it, run a single script, and get more useful numbers. please!

@robobun

robobun commented Jun 1, 2026

Collaborator Author

Benchmark harness is up: https://github.com/robobun/bun-30725-bench

git clone https://github.com/robobun/bun-30725-bench
cd bun-30725-bench
./run.sh

run.sh clones oven-sh/bun at this PR's head, builds a release binary, reverse-applies the PR diff onto the same tree and builds the baseline — so the two binaries differ only by this change — then runs four scenarios against both (3 interleaved runs, medians) and prints a comparison table:

  • hello — long-lived Bun.serve hello-world: 15 s load → 75 s idle → 15 s load. RTT p50/p99 per phase, CPU consumed during the idle window, RSS at idle start/end.
  • express — Express server parsing ~8 KB JSON request bodies and writing JSON responses, same phase pattern.
  • cli-burst — CLI that allocates ~40 MB of garbage then idles 120 s (CPU + RSS trajectory).
  • cli-pure — CLI idling 120 s with a trivial heap (timer floor cost).

The driver samples the target's /proc/<pid>/stat (utime+stime) and VmRSS at 1 Hz from a separate process; load-gen is a fixed-concurrency in-process fetch loop (identical across variants — compare A vs B, not absolute numbers). RUNS, IDLE_SECS, LOAD_SECS, CONCURRENCY are env-tunable, and --skip-build + BUN_PR_BIN/BUN_BASE_BIN lets you reuse existing binaries. Keep IDLE_SECS ≥ 45 — the fast→slow transition needs ~30 s of stable heap at the default 1 s tick.
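The sampling described here can be sketched as two small parsers. This is not the harness's code: the function names are hypothetical, and the sketch relies on the Linux procfs layout, where `utime` and `stime` are fields 14 and 15 of `/proc/<pid>/stat` and `VmRSS` is reported in `/proc/<pid>/status`.

```rust
/// Parse utime + stime (in clock ticks) from one /proc/<pid>/stat line.
fn cpu_ticks_from_stat(stat: &str) -> Option<u64> {
    // Field 2 (comm) is wrapped in parentheses and may contain spaces,
    // so split on the last ')' and count fields from field 3 onward.
    let rest = &stat[stat.rfind(')')? + 1..];
    let mut fields = rest.split_whitespace();
    // Field 14 (utime) is index 11 when counting from field 3 as index 0.
    let utime: u64 = fields.nth(11)?.parse().ok()?;
    // Field 15 (stime) immediately follows.
    let stime: u64 = fields.next()?.parse().ok()?;
    Some(utime + stime)
}

/// Parse the VmRSS value (in kB) from the text of /proc/<pid>/status.
fn vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))?
        .split_whitespace()
        .next()?
        .parse()
        .ok()
}
```

A 1 Hz sampler would read both files with `std::fs::read_to_string` once per second and difference successive tick counts; converting ticks to seconds requires the system clock-tick rate (commonly 100 per second on Linux, but that is an assumption to verify on the target box).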

Smoke-tested end-to-end here (shortened windows, capped container); the numbers in my previous comment came from the same scripts.

@robobun

robobun commented Jun 1, 2026

Collaborator Author

Expanded https://github.com/robobun/bun-30725-bench to 53 scenarios (same one-script runner, ./run.sh):

  • 12 HTTP servers: Bun.serve ×7 (hello, JSON, POST echo, 256 KB responses, streaming, routes, header churn), node:http ×2 (hello, POST echo), Express ×3 (hello, JSON bodies, middleware stack). Each: 15 s load → 75 s idle → 15 s load, reporting RPS + RTT p50/p99 per phase, CPU consumed during the idle window, RSS at idle start/end.
  • 1 WebSocket echo with connections held open across the idle window (message RTT per burst phase).
  • 12 CLI idle variants — pure idle, 40 MB / 200 MB garbage bursts, 200 MB fully-live heap (worst case), repeated bursts, heartbeat timer, 1000 pending timers, pending promise, JIT-warmed code (CodeBlock-jettison case), string burst, Buffer burst, sleeping child process. 1 Hz CPU+RSS trajectories over 120 s.
  • 3 duty cycles — 1 s/2 s, 5 s/35 s (re-arms the idle transition every cycle), 100 ms microbursts (never idles long enough — change must be inert).
  • 25 oneshots (fixed work; wall/CPU/maxRSS parity expected) — object/array/string/rope/Buffer/TypedArray/closure/Map/promise/class churn, mixed lifetimes, retained growth, JSON round-trip, regex, Headers/Response/URL/TextEncoder/CryptoHasher/WeakRef churn, startup ×2, a 60-test bun:test suite, fs walk, spawn chain.

All 53 smoke-tested end-to-end. ./run.sh --quick ≈ 45 min after builds; full defaults ≈ 2.5 h; ONLY='server-*' filters by kind; --skip-build + BUN_PR_BIN/BUN_BASE_BIN reuses binaries.

One note: I couldn't find oven-sh/bun-perf-tester (404 for me — private?). If there's a house format you want this to match, point me at it and I'll restructure.

@alii

alii commented Jun 1, 2026

Member

What are the results of those benchmarks in your container?

@robobun

robobun commented Jun 2, 2026

Collaborator Author

Ran all 53 scenarios in this container (full tables: sample-results/container-2026-06-02). Binaries from the same tree (main @ 61bd997) ± the PR diff. Settings: RUNS=1, IDLE=60 s, LOAD=8 s, CONC=16. Caveat: core-capped noisy container — single-run RPS/RTT swings ±20-30% in both directions; CPU-during-idle and RSS columns are the reliable ones.

Memory (the point of the change):

| scenario | RSS base | RSS PR |
| --- | --- | --- |
| cli-buffers-burst-idle (end) | 97.8 MB | 34.7 MB |
| cli-repeated-bursts (end) | 77.5 MB | 37.5 MB |
| duty-1s-busy-2s-pause (end) | 64.3 MB | 40.7 MB |
| server-node-hello (idle end) | 76.9 MB | 57.1 MB |
| server-express-bodies (idle start→end) | 86.4→74.9 MB | 72.7→72.5 MB (reclaimed earlier) |
| cli-burst-idle-40mb JS heap (workload line) | 38.62 MB | 0.21 MB |

One nuance visible in the raw data: the plain-object 40 MB burst dropped the JS heap to 0.21 MB but OS page return lagged in that particular run (RSS ~flat at 60 s; my earlier 40 s-idle run of the same shape returned to 35 MB). Buffer-backed and repeated-burst variants returned pages fully. Page decommit timing varies; heapMB is the direct effect of the Full GC.

CPU:

  • Idle windows: 0.08–0.23 s consumed over 60 s for both variants across all 13 servers — flat, occasionally +0.1 s on PR (the Full GC itself).
  • Worst cases as designed: cli-live-idle-200mb +0.09 s (one-time mark of a fully-live set), cli-jit-warm-idle +0.16 s (marking warmed code). Both one-shot per idle transition.
  • duty-microbursts (never idles long enough): parity — change stays inert.
  • Busy-loop duty cycles: 21.09 vs 21.61 s and 5.29 vs 5.58 s — ≤5%, within this box's noise.

Throughput/e2e: 25 oneshots landed within noise. Four rows that looked slow in the single-run table (json-roundtrip +17%, url-churn +34%, alloc-promises, spawn-chain) were re-run 3× interleaved: medians flipped to parity or PR-faster (json-roundtrip: base 6418 ms vs PR 5649 ms) — the originals ran an hour apart under different container load. Server RTT p50/p99 before/after idle: parity within the same noise band, swings in both directions.

Summary: memory clearly down where there's garbage to reclaim, CPU flat modulo the bounded one-time mark, RTT/e2e unchanged — matching the smaller earlier run. For trustworthy RPS/RTT numbers this wants your quiet box: ./run.sh (or RUNS=5 ./run.sh --skip-build with these binaries).

robobun added 2 commits June 7, 2026 17:34
…ow interval

Rust port of #29280.

The repeating timer's collectAsync() lets JSC pick Eden vs Full. At idle
JSC keeps picking Eden because Heap::updateAllocationLimits ratchets
m_maxHeapSize on every Eden GC, so the 1/3 Full-promotion ratio decays
instead of crossing. After 30 stable ticks we go to the 30s interval
having never collected old-gen, and age-based CodeBlock jettison (TTL
5-60s, only evaluated during GC marking) never fires.

At the fast->slow transition, fire one explicit
collectAsync(CollectionScope::Full). If it freed >1 MB, fire one more on
the next tick. Then go to slow. Max 2 Full GCs per idle activation, then
silent - same convergence rule as V8's MemoryReducer. If allocation
resumes mid-reduction, drop back to fast.

No change to allocation-driven GC frequency; cost is at most 2 async
Full GCs per active->idle transition.

Also routes the BUN_GC_TIMER_* env reads through bun_core::env_var
(process-env backed) instead of vm.transpiler.env, which is still null
when init() runs from ensure_waker().

New binding: JSC__VM__collectAsyncFull -> heap.collectAsync(CollectionScope::Full).
@robobun robobun force-pushed the farm/813505e7/gc-idle-full-rust branch from 908b782 to b040223 Compare June 7, 2026 17:37