Skip to content

test(sql): bound spawned fixture lifetime so a test timeout cannot leave orphans#32207

Merged
alii merged 1 commit into
mainfrom
ali/sql-test-fixture-timeout
Jun 12, 2026
Merged

test(sql): bound spawned fixture lifetime so a test timeout cannot leave orphans#32207
alii merged 1 commit into
mainfrom
ali/sql-test-fixture-timeout

Conversation

@alii

@alii alii commented Jun 12, 2026

Copy link
Copy Markdown
Member

When a test that does await using proc = Bun.spawn(...) times out before the using-scope exits, the dispose never runs and the subprocess survives indefinitely. The sql fixture subprocesses connect to a (mock or real) database and await the pool draining; under a regression that hangs the pool (e.g. while iterating on #32145), the fixture never exits, the test times out, and the fixture is orphaned.

Locally observed 16 such orphans pinning the machine at load ~100 for hours after a bun bd test test/js/sql/ run that exercised a buggy intermediate commit.

Adding timeout to the Bun.spawn options gives the child its own hard deadline independent of the test runner, so an abandoned fixture self-kills within a minute even when the parent test has already moved on.

sql.test.ts already had timeout on all three of its spawns; this brings the other 8 files in line.

…ave orphans

When a test that does `await using proc = Bun.spawn(...)` times out
before the using-scope exits, the dispose never runs and the subprocess
survives indefinitely. The sql fixture subprocesses connect to a (mock
or real) database and await the pool draining; under a regression that
hangs the pool, the fixture never exits, the test times out, and the
fixture is orphaned. Locally observed 16 such orphans pinning the
machine at load ~100 for hours.

Adding `timeout` to the Bun.spawn options gives the child its own hard
deadline independent of the test runner, so an abandoned fixture
self-kills within a minute even when the parent test has already moved
on.
@robobun

robobun commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator
Updated 8:28 PM PT - Jun 12th, 2026

@alii, your commit 34cc345 has 3 failures in Build #62160 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 32207

That installs a local version of the PR into your bun-32207 executable, so you can run:

bun-32207 --bun

@alii alii merged commit 80d92d4 into main Jun 12, 2026
6 of 7 checks passed
@alii alii deleted the ali/sql-test-fixture-timeout branch June 12, 2026 19:56
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 264b9d53-cf9a-46f1-aa47-503067e00539

📥 Commits

Reviewing files that changed from the base of the PR and between ac312f0 and 34cc345.

📒 Files selected for processing (8)
  • test/js/sql/sql-mysql-bind-blob-borrow.test.ts
  • test/js/sql/sql-mysql-bind-oob.test.ts
  • test/js/sql/sql-mysql-clean-reentry.test.ts
  • test/js/sql/sql-mysql-column-name-digits.test.ts
  • test/js/sql/sql-mysql-columns-realloc-oom.test.ts
  • test/js/sql/sql-mysql-datetime-roundtrip.test.ts
  • test/js/sql/sql-mysql-query-string-leak.test.ts
  • test/js/sql/sql-onconnect-onclose-throw.test.ts

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.


Walkthrough

Eight MySQL/SQL regression and integration tests add explicit timeout: 60_000 milliseconds to Bun.spawn subprocess calls that execute test fixtures. One test adds timeouts at two spawn locations.

Changes

SQL Test Fixture Spawn Timeouts

Layer / File(s) Summary
Fixture spawn timeout configuration
test/js/sql/sql-mysql-bind-blob-borrow.test.ts, test/js/sql/sql-mysql-bind-oob.test.ts, test/js/sql/sql-mysql-column-name-digits.test.ts, test/js/sql/sql-mysql-clean-reentry.test.ts, test/js/sql/sql-mysql-columns-realloc-oom.test.ts, test/js/sql/sql-mysql-datetime-roundtrip.test.ts, test/js/sql/sql-mysql-query-string-leak.test.ts, test/js/sql/sql-onconnect-onclose-throw.test.ts
Each test file adds timeout: 60_000 to the Bun.spawn options for fixture subprocess execution. sql-mysql-column-name-digits.test.ts adds the timeout in two locations: the main fixture runner and the mysqld_safe server bootstrap path.

Possibly related PRs

  • oven-sh/bun#32145: Refactors SQL pooled-connection plumbing and introduces the onconnect/onclose forced-close test scenario, overlapping with the fixture timeout additions in this PR.

Comment @coderabbitai help to get the list of available commands and usage tips.

stdout: "ignore",
stderr: "ignore",
stdin: "ignore",
timeout: 60_000,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Unlike the other 8 spawns in this PR (short-lived bun fixture clients), mysqld_safe is the long-running MariaDB server daemon — it is supposed to stay alive for the duration of the test, and .unref() does not cancel the spawn timeout timer. With waitForSocket budgeted up to 30s for cold start, plus provisionTcpUser and a debug/ASAN fixture run, the 60s timer can fire mid-test and SIGTERM the database out from under the fixture, causing a spurious failure. The PR's rationale (fixture clients hanging on pool drain) doesn't apply to the database daemon itself; suggest dropping this hunk or using a much larger bound (e.g. 300_000+).

Extended reasoning...

What the bug is

This PR adds timeout: 60_000 to nine Bun.spawn calls. Eight of them spawn short-lived bun fixture subprocesses that connect to a database, run a query, and exit — exactly the case the PR description targets. But the ninth, at test/js/sql/sql-mysql-column-name-digits.test.ts:91, spawns mysqld_safe — the MariaDB server daemon itself. That process is intended to outlive the test body: it is .unref()'d, its stdio is ignored, and ensureServerStarted() early-returns if the socket already exists from a prior run. Putting a 60s hard kill on it means the database server can be terminated while the test is still using it.

Why .unref() doesn't help

I verified in src/runtime/api/bun/subprocess.zig that jsUnref() only calls process.disableKeepingEventLoopAlive() and unrefs stdin/stdout/stderr — it does not touch event_loop_timer. Meanwhile computeHasPendingActivity() returns true while !process.hasExited(), which keeps the JS wrapper Strong-referenced, so GC will not collect the Subprocess and cancel the timer via finalize() either. Therefore, 60 seconds after ensureServerStarted() runs, timeoutCallback() unconditionally calls tryKill(killSignal) on the still-running mysqld_safe process — regardless of .unref().

Step-by-step trigger

  1. Test runs in the non-Docker branch (sandboxed dev/CI-gate container with native MariaDB, isDockerEnabled() === false).
  2. beforeAllensureServerStarted() spawns mysqld_safe at T=0 with timeout: 60_000 and calls .unref().
  3. waitForSocket(30_000) polls every 250ms. On a cold container start, MariaDB initialization can take a substantial fraction of that 30s budget — say T≈25s when the socket appears.
  4. provisionTcpUser() opens a root socket connection and runs 6 sequential SQL statements — a few more seconds.
  5. The test body calls runFixture(url), which spawns a debug/ASAN bun subprocess (the file's other hunk), connects over TCP, and runs the query. Debug+ASAN bun startup alone is routinely 10–20s.
  6. At T=60s the timeout timer fires and SIGTERMs mysqld_safe. MariaDB shuts down.
  7. The fixture's connection drops mid-query; it never prints CONNECTED; the test throws could not connect to mysql://bun_sql_test@127.0.0.1:3306/... — a spurious failure unrelated to the column-name-digits regression under test.

Why the PR rationale doesn't apply here

The PR description says: "the sql fixture subprocesses connect to a database and await the pool draining; under a regression that hangs the pool, the fixture never exits". mysqld_safe is not a bun SQL fixture, has no JS pool to drain, and is not subject to a #32145-style hang. The timeout also doesn't achieve the orphan-prevention goal for this spawn: because the process is .unref()'d and never awaited, when the test-runner process exits the timer dies with it and mysqld_safe is orphaned regardless. So this hunk adds a flake risk without providing the intended safety.

Fix

Either drop this one hunk (the mysqld_safe spawn), or use a bound comfortably larger than the worst-case test duration — e.g. timeout: 300_000 or more — so the daemon is reaped only when the whole test/js/sql/ run is clearly stuck, not while a single slow cold-start test is still in flight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants