Skip to content

sql: don't let a throwing onconnect/onclose callback corrupt the connection pool#32041

Merged
alii merged 2 commits into
mainfrom
farm/4e74e963/sql-pool-throwing-hooks
Jun 11, 2026
Merged

sql: don't let a throwing onconnect/onclose callback corrupt the connection pool#32041
alii merged 2 commits into
mainfrom
farm/4e74e963/sql-pool-throwing-hooks

Conversation

@robobun

@robobun robobun commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Fixes #32037.

Problem

In the postgres and mysql pooled-connection handlers (#onConnected / #onClose in src/js/internal/sql/postgres.ts and mysql.ts), the user-provided onconnect/onclose callbacks ran before the pool updated its own bookkeeping. A throwing callback aborted the handler mid-way: state stayed pending, storedError was never recorded, pending queries were never notified, onFinish never ran, and release() never ran. Anything awaiting the pool hung forever:

const sql = new SQL({
  /* ... */, max: 1,
  onconnect() { throw new Error("boom"); },
});
await sql`SELECT 1`; // never settles

Same for onclose: pending queries were never failed on a connect failure, and sql.end() never resolved.

Fix

Run the callback in a try/finally so the bookkeeping always completes, at all four sites (postgres/mysql x onconnect/onclose). The exception still propagates out of the handler afterwards and is reported as an uncaughtException through the same channel as before; only the skipped bookkeeping changes. (sqlite.ts already guards its hooks.)

Secondary fix in the same re-entrancy family, also called out in the issue: postgres's createConnection catch invoked onClose synchronously, which could run the user's onclose while the adapter was still filling this.connections during pool startup (reachable via a password function that throws). Pool methods that scan that array without the hole guard hasConnectionsAvailable() has (flush(), isConnected(), the private close()) threw TypeError: undefined is not an object when called from inside the callback. The catch now defers via process.nextTick, exactly like the identical catch in mysql.ts already did, so the callback never observes a half-filled pool.

Verification

New test/js/sql/sql-onconnect-onclose-throw.test.ts. The established-connection scenarios (throwing onconnect, throwing onclose on sql.end()) run against the real docker-compose postgres_plain and mysql_plain services via describeWithContainer, like the other sql tests. The connection-refused scenarios use a real closed port and the synchronous-failure scenario never dials, so those run without docker. On the unfixed build every scenario fails (the docker ones verified manually against real postgres and mariadb: the fixture hangs without the fix and completes with it):

before the fix
(fail) postgres: pool calls from onclose are safe when connecting fails synchronously
  expected "reentry ok", got "reentry threw: TypeError"
(fail) postgres: a throwing onconnect callback does not leave the pool stuck [5000.46ms]
  ^ this test timed out after 5000ms.
(fail) mysql: a throwing onconnect callback does not leave the pool stuck [5000.10ms]
  ^ this test timed out after 5000ms.
(fail) postgres: a throwing onclose callback does not hang sql.end() [5000.06ms]
  ^ this test timed out after 5000ms.
(fail) mysql: a throwing onclose callback does not hang sql.end() [5000.06ms]
  ^ this test timed out after 5000ms.
(fail) postgres: a throwing onclose callback still rejects pending queries on connect failure [5000.06ms]
  ^ this test timed out after 5000ms.
(fail) mysql: a throwing onclose callback still rejects pending queries on connect failure [5000.06ms]
  ^ this test timed out after 5000ms.

With the fix all 7 pass, and the issue's original repro scripts against real postgres and mariadb servers now settle (query result: [{"x":1}], sql.end() resolves) while the callback error still surfaces as an uncaughtException. sql-connect-error-reporting.test.ts, sql-mysql-clean-reentry.test.ts and sql-mysql-cached-error.test.ts still pass.

Note: this predates #31994 (which dedupes these two files into a shared base class); if that lands first, the same change applies once to its shared.ts.

Rebase note (after #32028 landed)

#32028 restructured #onClose: connect failures now retry with backoff and the onclose callback moved into a new #finishClose, which the retry timer also calls. Resolved by putting the try/finally around the hook in #finishClose (covering both callers) and keeping #onClose's retry branch untouched (no user code runs there). #onConnected keeps main's connectStartedAt = 0 reset inside the finally. #32028 also split refused connections into ERR_*_CONNECTION_REFUSED (fail fast, not retried), so the two refused-port tests assert that code now.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e0345616-7148-44f6-a51e-10135b788e95

📥 Commits

Reviewing files that changed from the base of the PR and between 1b86bd4 and 7139c12.

📒 Files selected for processing (1)
  • test/js/sql/sql-onconnect-onclose-throw.test.ts

Walkthrough

MySQL and PostgreSQL pooled connection lifecycle handlers now run user onconnect/onclose inside try/finally (Postgres adds deferred close via process.nextTick) so internal bookkeeping, query notifications, and adapter release always execute; a new test suite validates thrown-hook behaviors across connect, refuse, and setup-failure scenarios.

Changes

Connection Lifecycle Callback Exception Safety

Layer / File(s) Summary
MySQL callback exception safety
src/js/internal/sql/mysql.ts
#onConnected and #finishClose wrap connectionInfo.onconnect/connectionInfo.onclose in try/finally, moving storedError, state updates, readyConnections removal, query notifications, onFinish handling, and adapter.release(...) into finally blocks so bookkeeping executes even if hooks throw.
PostgreSQL callback exception safety and deferred close
src/js/internal/sql/postgres.ts
Adds closeNT helper and defers onClose via process.nextTick on synchronous setup errors. #onConnected and #finishClose run user callbacks inside try with pool bookkeeping, state/flag/query counter resets, waiting-query notifications, and adapter.release in finally.
Test infrastructure and callback exception scenarios
test/js/sql/sql-onconnect-onclose-throw.test.ts
New subprocess-based test harness writes fixtures, runs Bun subprocesses, and asserts thrown onconnect/onclose behavior for container-backed Postgres/MySQL, connection-refused via closed ephemeral port, and a synchronous Postgres password() failure regression. Tests verify uncaughtException output and exitCode === 0.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: preventing throwing onconnect/onclose callbacks from corrupting the connection pool through wrapping them in try/finally blocks.
Description check ✅ Passed The description comprehensively covers the problem, fix, and verification with clear examples and test results, fully addressing the required template sections.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 5:02 PM PT - Jun 10th, 2026

@robobun, your commit 7139c12 has some failures in Build #61840 (All Failures)


🧪   To try this PR locally:

bunx bun-pr 32041

That installs a local version of the PR into your bun-32041 executable, so you can run:

bun-32041 --bun

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/sql/sql-onconnect-onclose-throw.test.ts`:
- Around line 167-246: Each test is missing an explicit assertion that the
subprocess stderr is empty; update each test that calls runFixture (including
tests using connectAndEnd("postgres"/"mysql"),
connectAndEnd("postgres","onclose"), failToConnect("postgres"/"mysql"), and the
inline fixture) to assert that the returned stderr is "" (e.g.
expect(stderr).toBe("")) and place that assertion before checking exitCode to
follow the guideline that output assertions come before exitCode checks; locate
uses of runFixture and add the stderr assertion accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 51db9e2b-9c5e-4238-99b9-a2a43cf5f3f5

📥 Commits

Reviewing files that changed from the base of the PR and between 88d48c2 and ed66877.

📒 Files selected for processing (3)
  • src/js/internal/sql/mysql.ts
  • src/js/internal/sql/postgres.ts
  • test/js/sql/sql-onconnect-onclose-throw.test.ts

Comment thread test/js/sql/sql-onconnect-onclose-throw.test.ts Outdated
@robobun

robobun commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

The remaining CI red is unrelated to this diff. Every failed lane in builds 61587 and 61603 fails on exactly one test, test/cli/install/bunx.test.ts "should handle package that requires node 24", which runs bun x --bun @angular/cli@latest --help against the live npm registry. Angular's latest release raised its minimum Node version, and its version check now rejects the Node version Bun reports. Reproducible with released Bun, no code from this PR involved:

$ bun x --bun @angular/cli@latest --help
Node.js version v24.3.0 detected.
The Angular CLI requires a minimum Node.js version of v22.22.3 or v24.15.0 or v26.0.0.
$ echo $?
3

The failure set is identical across both builds (different commits of this branch) and across all platforms, and main's last green build (61548, same base commit) predates the Angular release, so this will fail on any branch until the bunx test or the version handling is updated. The failing shards do not include this PR's test file; the new sql tests pass under bun bd test (debug+ASAN) and fail without the fix.

@robobun

robobun commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Update after the main merge (thanks for the bunx skip): build 61612 is green on all 189 completed linux/windows/musl jobs. The one red job is darwin 26 aarch64, which timed out for reasons unrelated to this diff:

  • test/cli/init/init.test.ts snapshot mismatches: bun init on that runner emits an extra CLAUDE.md (machine-dependent agent detection), so the directory-listing snapshots fail there.
  • the job then died building vendor/elysia: Cannot find package 'tsup', which aborted the runner into a timeout.

The sql tests from this PR ran in that same shard and passed 7 for 7. No further pushes from me to avoid spamming CI; happy for a maintainer to re-run the darwin lane or merge over it.

…ection pool

The user-provided onconnect/onclose callbacks ran before the pooled
connection handlers updated their own bookkeeping. If the callback threw,
the handler aborted early: state stayed pending, storedError was never
recorded, pending queries were never notified, onFinish never ran and the
connection was never released, so queries, connect() and end() hung
forever.

Run the callback in a try/finally so the bookkeeping always completes; the
exception still propagates and is reported as an uncaughtException, as
before.

Also defer the synchronous onClose call in postgres createConnection's
catch with process.nextTick, matching mysql. It could fire while the
adapter was still filling this.connections, so pool methods that scan that
array (flush, isConnected, close) threw a TypeError on the holes when
called from inside the onclose callback.

Fixes #32037
@robobun robobun force-pushed the farm/4e74e963/sql-pool-throwing-hooks branch from c845b16 to 1b86bd4 Compare June 10, 2026 22:13
alii
alii previously approved these changes Jun 10, 2026
@alii alii enabled auto-merge (squash) June 10, 2026 22:39
@alii

alii commented Jun 10, 2026

Copy link
Copy Markdown
Member

actually no mock server @robobun
you have a postgres instance in your container. use that. see what the other tests do. a mock server is certainly incorrect

…s/mysql

Replace the mock protocol servers with the docker-compose postgres_plain and
mysql_plain services via describeWithContainer, matching the other sql tests.
The connection-refused tests keep using a real closed port and the
synchronous-failure test never dials, so those still run without docker.
@robobun

robobun commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Done in 7139c12: the established-connection tests (throwing onconnect, throwing onclose on sql.end()) now run against the docker-compose postgres_plain and mysql_plain services via describeWithContainer, same as the other sql tests, and assert the query result comes back despite the throwing hook. Verified the fixtures against real postgres and mariadb locally: they hang without the fix and complete with it. The connection-refused tests keep using a real closed port (as in sql-connect-error-reporting.test.ts) and the synchronous password-failure test never dials, so no mock servers remain.

@robobun

robobun commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

CI note on build 61829: the docker-backed sql tests from this PR ran against the real postgres_plain and mysql_plain compose services and passed 7/7 (debian x64 shard). The only failed job so far is debian x64-baseline on test/js/web/streams/streams-leak.test.ts ("native ReadableStream reuses the pull buffer across small reads", expected >20 chunks, got 9), a wire-coalescing timing assertion unrelated to this diff. Not pushing a retrigger to avoid churning CI.

@alii alii disabled auto-merge June 11, 2026 00:01
@alii alii merged commit 8354af5 into main Jun 11, 2026
73 of 75 checks passed
@alii alii deleted the farm/4e74e963/sql-pool-throwing-hooks branch June 11, 2026 00:01
robobun added a commit that referenced this pull request Jun 12, 2026
The re-integration commit hosts the #32041 try/finally and the #32097
forced-close settlement in one BasePooledConnection close handler, so pin
their interaction per driver: onclose throws while the pool is force-closed
mid-handshake, and close() must still resolve and the pending query must
still reject.
alii added a commit that referenced this pull request Jun 12, 2026
…n in shared.ts (#32145)

## What this does

Final slice from #31994 (closed as too big), on top of #32128, #32135,
#32141. Consolidates the JS adapters' duplicated pool/connection/query
plumbing from `src/js/internal/sql/{postgres,mysql,sqlite}.ts` into
`src/js/internal/sql/shared.ts` as a `BasePooledConnection` class
hierarchy. Net −995 lines across 4 files.

Two commits:

- **Cherry-pick from #31994** (1aa3dbb) — checks out
`src/js/internal/sql/` from `origin/claude/split/sql` as-is. This
intentionally clobbers two pool-lifecycle fixes that landed on `main`
since #31994's merge-base: #32041 (throwing onconnect/onclose corrupting
the pool) and #32097 (forced `close()` resolving mid-handshake).
- **Re-integrate #32041 + #32097 into the new structure** (6e6fd4f) —
re-hosts both fixes in `BasePooledConnection`:
- #32041: try/finally around the user `onconnect`/`onclose` calls in
`handleConnected` and `#finishClose` so a throwing callback never skips
pool bookkeeping; the `createPooledConnectionHandle` catch always defers
via `process.nextTick` (drops the per-driver `deferSyncCloseError` flag
— both drivers now match).
- #32097: `startConnection()` is `abstract Promise<void>` and both
subclasses await + assign `this.connection` at creation;
`#beginConnecting` awaits it and closes the handle if `onFinish` was set
(pool force-closed) in the microtask window before it materialized.

## Behavioral equivalence

The original `BasePooledConnection` extraction in #31994 carried a
per-file behavioral-equivalence audit against its merge-base
(placeholders, escaping, helper commands, error message text,
pool/transaction semantics all preserved) and was green on full CI build
61383. This PR re-applies that diff and adds the two missing fixes; the
load-bearing verification is below.

## Verification

- `bun bd`: builds clean.
- The three lifecycle test files (`sql-onconnect-onclose-throw.test.ts`,
`sql-close-pending-connection.test.ts`,
`sql-connect-error-reporting.test.ts`): 26 pass / 0 fail.
- **New tests in this PR**: `sql-onconnect-onclose-throw.test.ts` gains
a forced-close variant per driver (a throwing `onclose` while the pool
is force-closed mid-handshake, against a fake `net` server). The two
re-hosted fixes meet in `BasePooledConnection`'s close handler, and that
interaction had no coverage. Note these pass against `origin/main`'s
`src/js/internal/sql` too (verified): both underlying fixes already live
on main per-driver, so versus main this PR is equivalence-preserving by
design and no test can fail on one side only. The HEAD~1 stash test
below is the proof that the re-integration commit is load-bearing.
- **Load-bearing stash test**: built and ran the same three files at
HEAD~1 (the cherry-pick before re-integration) — 6 fail (`mysql: forced
close() resolves when called before the native handle is stored`
timeout; `postgres: pool calls from onclose are safe when connecting
fails synchronously` → `reentry threw: TypeError`; both drivers'
`throwing onclose still rejects pending queries on connect refused`
timeouts). With the re-integration: 26 pass. The diff is required for
both fixes.
- Full `test/js/sql/` against live
`postgres_plain`/`mysql_plain`/`*_tls` containers: 1526 pass / 7 fail /
6 errors locally. Every failure reproduces on `origin/main` in the same
environment — see Notes below.

## Notes on local-only test failures (none introduced by this PR)

- `sql-mysql-query-string-leak.test.ts` fails locally on a debug+ASAN
build for both `origin/main` (314.6 MiB) and this branch (317.2 MiB),
within noise of each other; both exceed the 256 MiB ASAN threshold. A
`heapStats()` probe of the same workload shows `MySQLQuery 2→1` after GC
— wrappers are finalized correctly. The test passes on `main`'s CI
(release build); the threshold is tuned for release RSS overhead, not
local debug.
- `sql.test.ts > query string memory leak test` (postgres): same
category.
- `sqlite-sql.test.ts > Query Normalization Fuzzing Tests > handles
exotic but valid SQL patterns`: 5.5s timeout on debug builds —
pre-existing per #31994's original verification notes.
- `describeWithContainer` cold-start races: `docker compose up -d
--wait` returns non-zero on an already-healthy container
(`test/docker/index.ts:157`), causing beforeEach hook timeouts in
`sql-mysql.test.ts` (TLS), `sql-mysql-bind-oob`, `sql-mysql.helpers`,
`sql-mysql.auth`. Pre-existing harness flake; clears on warm rerun.

The Rust-side changes from #32097 (`src/sql_jsc/`) were already on
`main` before this PR — only the JS hooks needed re-integrating.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bun.SQL: a throwing onconnect/onclose callback leaves the connection pool in an inconsistent state

2 participants