Skip to content

sql: grow connection pool lazily instead of opening max on first use#30636

Open
robobun wants to merge 10 commits into
mainfrom
farm/b4c5677a/sql-lazy-pool
Open

sql: grow connection pool lazily instead of opening max on first use#30636
robobun wants to merge 10 commits into
mainfrom
farm/b4c5677a/sql-lazy-pool

Conversation

@robobun

@robobun robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Fixes #30632.

Repro

const sql = new Bun.SQL({
  adapter: "postgres",
  host: "localhost",
  port: 5432,
  username: "bun_sql_test",
  database: "bun_sql_test",
  max: 500,
});

await sql`SELECT 1`;
// `select count(*) from pg_stat_activity` now shows 500 backend connections

One query on a max: 500 pool opens all 500 connections. The user reports
this happens "immediately" — it's actually on the first use, but same net
effect: a pool never scales with load, it jumps straight to the ceiling.

Cause

PostgresAdapter.connect() (and the identical pattern in MySQLAdapter)
eagerly instantiates every pool slot on the first connect() call:

// src/js/internal/sql/postgres.ts — pre-fix
this.poolStarted = true;
const pollSize = this.connections.length; // = max
const firstConnection = new PooledPostgresConnection(this.connectionInfo, this);
this.connections[0] = firstConnection;
// ...
for (let i = 1; i < pollSize; i++) {
  this.connections[i] = new PooledPostgresConnection(this.connectionInfo, this);
}

Each new PooledPostgresConnection(...) kicks off a real TCP connect in
its constructor, so on the first use we burst max live sockets.

Fix

Replace the eager burst with lazy growth:

  • Pool starts empty; #tryGrowPool() opens one connection at a time and
    appends to this.connections, capped at maxPoolSize (= options.max).
  • First connect() opens exactly one connection, not max.
  • flushConcurrentQueries() opens another connection when queued queries
    can't be served by an existing ready/pending slot, so concurrent load
    still scales up to max.
  • maxDistribution divides by the pool ceiling (maxPoolSize) rather
    than the current pool size, so it doesn't collapse every queued query
    onto the one open socket while the others are being dialed.
  • connect()'s retry branch also grows the pool when every existing slot
    is permanently broken, so a one-off auth/network failure doesn't
    consume a slot forever.

Same change applied to src/js/internal/sql/mysql.ts — it carried the
same eager loop.

Verification

With the fix, max: 500 and a single SELECT 1 now opens 1
connection. 100 concurrent queries against a max: 20 pool peak at
20 connections and drain in ~5 rounds.

before: 1 query on max=50 → 50 sockets opened
after:  1 query on max=50 → 1 socket opened
after:  100 concurrent queries on max=20 → 20 sockets, ~762ms

Tests

test/js/sql/sql-pool-lazy-growth.test.ts — one case per adapter
(postgres, mysql) using a bare Bun.listen() TCP sink that counts
opened sockets, no Docker/real DB required:

  • a single query opens exactly 1 socket (not max)

Verified both tests fail under USE_SYSTEM_BUN=1 (receive 50 sockets
instead of 1) and pass with the fix.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

@github-actions

Copy link
Copy Markdown
Contributor

Found 2 issues this PR may fix:

  1. 🚀 Feature Request: Customizable Connection Pool Strategy (eager vs lazy) mysql / postgres #30631 - Feature request for customizable connection pool strategy (eager vs lazy), which this PR directly implements by switching to lazy pool growth
  2. [Bug] MySQL pool size > 1 causes Bun to hang during sql execution #26235 - MySQL pool size > 1 causes Bun to hang during SQL execution; lazy pool growth avoids eagerly opening all connections, which is the condition triggering the hang

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #30631
Fixes #26235

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Both MySQL and Postgres adapters now grow connection pools lazily on-demand up to a new maxPoolSize ceiling. Pool capacity tracking, growth helpers, distribution logic, and connect/flush paths were updated. A test asserts a single query opens exactly one TCP connection for each adapter.

Changes

Lazy Connection Pool Growth

Layer / File(s) Summary
Pool capacity tracking and lazy initialization
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
Both adapters add maxPoolSize and initialize connections as an empty array instead of pre-allocating to capacity.
Pool growth helper methods
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
Introduce private #tryGrowPool() to append new pooled connections when under maxPoolSize and #pendingConnectionsCount() to count in-flight handshakes.
Distribution and flush logic
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
maxDistribution() now targets maxPoolSize and flushConcurrentQueries() grows the pool when unserved waiters remain and pending handshakes are insufficient.
Connection requests when pool is active
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
connect() now attempts #tryGrowPool() when the pool has started but no ready connections exist, enqueuing callers if growth succeeds or failing fast on non-retryable errors.
Pool initialization for first query
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
First-query initialization opens a single connection via #tryGrowPool() (with preReserved when requested) instead of creating a full initial set.
Reserved connection path pool growth
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
Reserved-connection branch now enqueues callers and tries #tryGrowPool() when no ready connection can be assigned immediately.
Lazy pool behavior verification
test/js/sql/sql-pool-lazy-growth.test.ts
Add parameterized tests for postgres and mysql that use a TCP sink to count socket opens and assert a single SELECT 1 triggers exactly one TCP connection; add a Postgres handshake-failure test that ensures subsequent attempts fail fast.

Possibly related issues

  • #30632 — Implements lazy pool growth to prevent pre-allocation of all max connections on SQL adapter initialization; this PR tracks maxPoolSize and grows connections on demand.
  • #30631 — Matches the introduced lazy growth helpers and maxPoolSize changes for MySQL/Postgres adapters; functionally related.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: converting connection pool initialization from eager (opening max connections immediately) to lazy (opening connections on-demand).
Description check ✅ Passed The description includes both required template sections (What does this PR do and How did you verify): it explains the fix, root cause, verification steps, and test approach comprehensively.
Linked Issues check ✅ Passed The PR fully addresses issue #30632: it implements lazy pool growth starting from empty connections instead of pre-allocating max slots, preventing immediate connection burst and enabling dynamic scaling based on load.
Out of Scope Changes check ✅ Passed All changes directly support lazy pool growth: MySQL and Postgres adapters add maxPoolSize and tryGrowPool(), flushConcurrentQueries() updated to trigger growth, connect() refactored for lazy initialization, and tests verify the fix without unrelated alterations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/sql/sql-mysql.test.ts`:
- Around line 17-76: Extract the two tests beginning with describe("mysql
connection pool grows lazily (`#30632`)") from test/js/sql/sql-mysql.test.ts into
a new test file named something like test/js/sql/sql-mysql-nondocker.test.ts (or
sql-mysql-mock.test.ts); remove that describe block from the original file so
Docker-gated suite remains clean; keep the tests' exact bodies and using/server
setup and preserve imports (e.g., SQL, Bun.listen) and async semantics, and
ensure the new file is picked up by the test runner (naming convention) so the
non-Docker/mock MySQL regression suite runs separately from sql-mysql.test.ts.

In `@test/js/sql/sql.test.ts`:
- Around line 112-114: The test currently uses a fixed setTimeout to "give
connectors a chance" which is flaky; replace the hardcoded wait with a
deterministic wait that polls the condition until openedSockets === 0 (or times
out); update the snippet around openedSockets so the test awaits a
small-interval retry loop or a test helper (e.g., waitFor) that repeatedly
checks openedSockets and resolves when it becomes 0, failing the test if a
configurable timeout elapses—target the openedSockets variable and the test
block where the setTimeout is used.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 652c6181-90bd-4cf5-ad8e-ed0d16dd92da

📥 Commits

Reviewing files that changed from the base of the PR and between b9c757b and 4af886b.

📒 Files selected for processing (4)
  • src/js/internal/sql/mysql.ts
  • src/js/internal/sql/postgres.ts
  • test/js/sql/sql-mysql.test.ts
  • test/js/sql/sql.test.ts

Comment thread test/js/sql/sql-mysql.test.ts Outdated
Comment thread test/js/sql/sql.test.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/sql/sql-pool-lazy-growth.test.ts`:
- Around line 45-46: The test currently iterates adapters with a for-loop and
calls describe(`${adapter} connection pool grows lazily (`#30632`)`, ...) which
violates the guideline to use parameterized suites; replace the manual loop with
describe.each([...]) so the suite is declared via
describe.each(["postgres","mysql"]) and the suite title uses the adapter
parameter (e.g., describe.each(...)(`%s connection pool grows lazily (`#30632`)`,
(adapter) => { ... }); update the existing describe block and move its inner
tests unchanged into the new describe.each callback so test names and behavior
remain the same.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24709e75-36f3-45a6-8926-28306b0c3ee4

📥 Commits

Reviewing files that changed from the base of the PR and between 4af886b and 5169650.

📒 Files selected for processing (1)
  • test/js/sql/sql-pool-lazy-growth.test.ts

Comment thread test/js/sql/sql-pool-lazy-growth.test.ts Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 test/js/sql/sql.test.ts:87-115 — This test never issues a query, so on pre-fix Bun the eager burst (which lived inside connect(), not the constructor) is never triggered — openedSockets is 0 on system Bun too. That means the PR description's claim that "all 5 tests fail under USE_SYSTEM_BUN=1" is inaccurate for this case, and per CLAUDE.md a test that passes with USE_SYSTEM_BUN=1 is not a valid regression test. Consider dropping it (the other 4 tests already cover the fix) or correcting the PR description to say 4 of 5; also note CLAUDE.md says not to use setTimeout in tests.

    Extended reasoning...

    What the issue is

    The third Postgres test, "constructing Bun.SQL without running a query opens no connections", asserts that constructing a Bun.SQL instance with max: 10 and never issuing a query results in openedSockets === 0. The PR description states "Verified all 5 tests fail under USE_SYSTEM_BUN=1 (receive 50 sockets instead of 1)", but this particular test cannot fail on the unpatched runtime: it has max: 10 (not 50), it never runs a query, and the pre-fix code never opened sockets at construction time. Root CLAUDE.md line 130 says: "CRITICAL: Verify your test fails with USE_SYSTEM_BUN=1 bun test <file> and passes with bun bd test <file>. Your test is NOT VALID if it passes with USE_SYSTEM_BUN=1." This test does not meet that bar.

    Why pre-fix Bun also opens 0 sockets here

    Before this PR, the PostgresAdapter constructor only did:

    this.connections = new Array(connectionInfo.max);

    That allocates a sparse array of holes — no PooledPostgresConnection instances are constructed, so no TCP dials happen. The eager burst lived inside connect(), behind if (!this.poolStarted):

    this.poolStarted = true;
    const pollSize = this.connections.length;
    const firstConnection = new PooledPostgresConnection(this.connectionInfo, this);
    this.connections[0] = firstConnection;
    for (let i = 1; i < pollSize; i++) {
      this.connections[i] = new PooledPostgresConnection(this.connectionInfo, this);
    }

    connect() is only invoked when a query is run. The PR description itself acknowledges this: "The user reports this happens 'immediately' — it's actually on the first use". Since this test never issues a query, connect() is never called and poolStarted stays false.

    Step-by-step on system Bun

    1. new SQL({ adapter: "postgres", max: 10, ... })PostgresAdapter constructor runs, sets this.connections = new Array(10) (10 empty holes), poolStarted = false. No PooledPostgresConnection is created, so no socket is opened. openedSockets = 0.
    2. await new Promise(r => setTimeout(r, 50)) → nothing happens; there is no pending dial to race. openedSockets is still 0.
    3. expect(openedSockets).toBe(0)passes.
    4. await using disposal calls close()#close(). poolStarted is false, so the connection-iteration loop is skipped; readyConnections.clear() and waitingQueue.length = 0 are no-ops on empty containers.

    So under USE_SYSTEM_BUN=1 this test passes, contradicting the PR description's "all 5 tests fail" claim and violating the CLAUDE.md requirement.

    Why this is only a nit

    This is a test-methodology / PR-description-accuracy issue, not a runtime defect. The other four tests (single query → 1 socket, 20 concurrent on max: 5 → 5 sockets, ×2 for MySQL) genuinely fail on system Bun and pass with the fix, so the regression is properly covered. One could argue the spirit of the CLAUDE.md rule is "the PR must include at least one test that demonstrably catches the regression", which is satisfied — and this third test is a reasonable forward-looking guard against a future regression where the constructor itself becomes eager. But taken literally, the repo's own contributing guide says a test that passes under USE_SYSTEM_BUN=1 "is NOT VALID", and the PR body's verification claim is factually wrong for this case.

    Separately, the same test uses await new Promise(r => setTimeout(r, 50)), which CLAUDE.md line 129 also flags: "Do not use setTimeout in tests."

    Suggested fix

    Either:

    • Drop this test (tests 1, 2, 4, 5 already cover the fix and satisfy the USE_SYSTEM_BUN=1 rule), or
    • Keep it as a guard but update the PR description from "all 5 tests fail" to "tests 1, 2, 4, 5 fail under USE_SYSTEM_BUN=1; test 3 is a guard for already-correct constructor behavior", and replace the setTimeout(50) with something that doesn't rely on wall-clock time (e.g. await Bun.sleep(0) / a microtask drain, or just assert immediately since construction is synchronous).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/js/sql/sql-pool-lazy-growth.test.ts (1)

35-36: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace manual adapter loop with describe.each()

Use parameterized suite declaration instead of the for loop to match the repo’s test style rules.

♻️ Proposed change
-for (const adapter of ["postgres", "mysql"] as Adapter[]) {
-  describe(`${adapter} connection pool grows lazily (`#30632`)`, () => {
+describe.each(["postgres", "mysql"] as Adapter[])("%s connection pool grows lazily (`#30632`)", adapter => {
     test("a single query only opens one TCP connection, not `max`", async () => {
       using sink = makeSink();
       await using sql = new SQL({
@@
       await sql`SELECT 1`.catch(() => {});
       expect(sink.opened).toBe(1);
     });
-  });
-}
+  });
+});

As per coding guidelines, test/**/*.test.{ts,js,jsx,tsx,mjs,cjs} should "Use describe.each() for parameterized tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/js/sql/sql-pool-lazy-growth.test.ts` around lines 35 - 36, Replace the
manual for-loop creating per-adapter suites with a parameterized suite using
describe.each([...]) to match repo test style: convert the `for (const adapter
of ["postgres", "mysql"] as Adapter[]) { describe(\`${adapter} connection pool
grows lazily (`#30632`)\`, () => { ... }) }` pattern into `describe.each([
"postgres", "mysql" ])("... %s ...", (adapter) => { ... })`, ensuring you pass
the adapter into the inner tests (references: the outer describe block that
currently interpolates `${adapter}` and any uses of the `adapter` variable
inside tests) so behavior and test names remain identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@test/js/sql/sql-pool-lazy-growth.test.ts`:
- Around line 35-36: Replace the manual for-loop creating per-adapter suites
with a parameterized suite using describe.each([...]) to match repo test style:
convert the `for (const adapter of ["postgres", "mysql"] as Adapter[]) {
describe(\`${adapter} connection pool grows lazily (`#30632`)\`, () => { ... }) }`
pattern into `describe.each([ "postgres", "mysql" ])("... %s ...", (adapter) =>
{ ... })`, ensuring you pass the adapter into the inner tests (references: the
outer describe block that currently interpolates `${adapter}` and any uses of
the `adapter` variable inside tests) so behavior and test names remain
identical.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 46c2416a-0c8d-4721-8047-ad487409fc60

📥 Commits

Reviewing files that changed from the base of the PR and between 5169650 and 3308293.

📒 Files selected for processing (1)
  • test/js/sql/sql-pool-lazy-growth.test.ts

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

CI red on #54075, but none of the failures touch this diff — the SQL pool change only touches src/js/internal/sql/{postgres,mysql}.ts. What failed on x64-asan:

  • test/js/web/fetch/fetch-http2-client.test.ts — JSC panic ThreadLock is locked by thread 6422, not thread 6433 in an HTTP/2 POST+ReadableStream test (unrelated fetch code path)
  • Flaky-retry warnings for fetch.tls, update_interactive_install, and hot/hot.test.ts — all in fetch/CLI, not SQL

Already spent my one ci: retrigger on the earlier gate-infra flake (Permission denied during release rebuild), so this needs a maintainer retry.

Comment thread src/js/internal/sql/postgres.ts Outdated
@RiskyMH

RiskyMH commented May 13, 2026

Copy link
Copy Markdown
Collaborator

@robobun now i wonder is sqlite truly safe from this :)

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

@RiskyMH sqlite's safe — no pool. SQLiteAdapter has a single db handle (one new Database(filename) in the constructor, no max option, no pool slots to grow eagerly or otherwise).

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Skip pool growth when non-retryable auth error is set

@RiskyMH

RiskyMH commented May 13, 2026

Copy link
Copy Markdown
Collaborator

@robobun respond to @Lillious's review plz

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Skip pool growth when non-retryable auth error is set

@claude Tell @robobun to make these changes

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Done in d0ade91 — the non-retryable branch now skips #tryGrowPool() unless password is supplied as a function. Added a regression test using a minimal fake server that answers StartupMessage with AuthenticationRequest code 9 (SSPI) — Bun rejects as ERR_POSTGRES_UNSUPPORTED_AUTHENTICATION_METHOD. 5 sequential failed queries now open 1 socket (was 5).

@Lillious

Copy link
Copy Markdown
Contributor

Done in d0ade91 — the non-retryable branch now skips #tryGrowPool() unless password is supplied as a function. Added a regression test using a minimal fake server that answers StartupMessage with AuthenticationRequest code 9 (SSPI) — Bun rejects as ERR_POSTGRES_UNSUPPORTED_AUTHENTICATION_METHOD. 5 sequential failed queries now open 1 socket (was 5).

@robobun What if the password is supplied as a template literal?

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

A template literal is just a plain string by the time it hits the constructor — ${token} runs the interpolation before new SQL() ever sees the object. Bun can't re-evaluate it; it's identical to password: "foo".

The dynamic-creds escape hatch is specifically for the password: () => fetchFreshToken() shape — a function Bun re-invokes every time it opens a new socket (PooledPostgresConnection.createConnection does password = password() on each attempt). That's the only case where a retry can legitimately pick up a new secret.

If you want to refresh credentials per retry, wrap it:

new Bun.SQL({ password: async () => (await vault.fetch("db-pw")).value })

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Could there be an edge case scenario where database password is updated and the static password used to connect becomes stale and then the database password is reverted? How can we handle this the with fail fast implementation?

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Real concern — let me walk through the lifecycle.

Fail-fast only kicks in when every current pool slot has a cached non-retryable auth error. The pool doesn't know the server reverted the password, so cached errors keep getting returned until something clears them. Three things clear them today:

  1. .end() + new new SQL(...) — the app observes the auth errors and rebuilds the pool. Standard pattern.
  2. maxLifetime — if set, previously-connected slots close on expiry. Doesn't help for slots that never got past auth (canBeConnected was never set).
  3. password: () => latestPassword — the function shape is re-invoked per TCP attempt, so a fresh retry picks up whatever the current value is. This is the documented escape hatch for any rotating secret (IAM, Vault, or your rotate/revert scenario).

For the specific "static string, rotated, then reverted" flow: pre-fix Bun had the same end-state — it returned the cached auth error too — it just burned an extra TCP+auth RTT per query doing so. The reporter-visible difference is zero; the fix just stops the wasted handshakes.

So the recommendation for anyone whose password can change under them is the function shape. If there's demand for a sql.reset() / explicit "forget cached errors and retry" API, that's worth a separate issue — but it's additive to this fix, not a blocker.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Please give me credit in the release notes when this releases.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Release notes are written by the maintainers at release time — I don't control them. You're credited as the reporter in #30632 and the PR body links back, so it's there for whoever drafts the notes.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Second CI run: same story as before — no failures touch this diff. Failed tests are all in fetch/hot-reload/jsc-stress/http2/node-http lanes:

  • fetch.tls — TimeoutError/UnreachableError mismatch
  • jsonwebtoken/async_sign — 90s timeout
  • hot.test.ts — sourcemap reload (was flaky on the previous CI too)
  • jsc-stressmprotect failed: 487 on Windows x64-baseline (OS-level)
  • fetch-http2-client — HTTP/2 ReadableStream test (same failure as previous CI)
  • 2x parallel node:http timeout on Windows 2019

Already used my one ci: retrigger earlier — needs a maintainer to re-run CI. Diff is green.

Comment thread src/js/internal/sql/postgres.ts Outdated
@Lillious

Copy link
Copy Markdown
Contributor

@robobun Review the recent comment made by @claude

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

@Lillious reviewed — valid finding. Fixed in bad3076. The growth heuristic was counting all pending connections as capacity for waitingQueue, but pending connections opened for reserve() callers get claimed by reservedQueue first (via release()) and never feed the waiting queue. Now subtracting up to reservedQueue.length from pending before comparing. Mixed reserve() + plain-query workloads no longer under-provision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/js/internal/sql/mysql.ts`:
- Around line 948-967: The code treats auth failures as non-retryable even when
connectionInfo.password is a function, and currently only calls `#tryGrowPool`()
which returns null at maxPoolSize, causing permanent failure; change the
fallback so that when canDynamicallyAuth (connectionInfo.password is a function)
is true and `#tryGrowPool`() returns null you reuse one of the closed slots
instead of immediately failing — e.g. take a closed slot (the same resource pool
mechanism used when a slot is released), push onConnected into the appropriate
queue (reservedQueue or waitingQueue) and trigger the code path that
opens/initializes a connection for that closed slot (the same path used by
connect()/the slot-reuse logic) so a fresh auth attempt with the dynamic
password is made; keep using storedError/connectionClosedError only when no
dynamic auth is available and no slot can be reused.

In `@src/js/internal/sql/postgres.ts`:
- Around line 1176-1195: The dynamic-credential recovery fails because we only
call `#tryGrowPool`() and add a new slot after checking capacity, which returns
null when connections.length === maxPoolSize and causes callers to replay a
cached auth error; also createConnection() can synchronously call onClose() so
release() may run before we enqueue the waiter. Fix by enqueuing the waiter
(push into reservedQueue or waitingQueue) before attempting any retry/grow, and
when connectionInfo.password is a function attempt to reopen an existing closed
slot (call into the same code path used by `#tryGrowPool`() but targeting a closed
connection) rather than only creating a new slot; ensure the reopen path handles
synchronous onClose() by registering onClose/onConnected callbacks before
invoking PooledPostgresConnection.createConnection() so release()/onClose()
cannot race the enqueue.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c3fe454-2b26-422b-932a-51e06bdb9166

📥 Commits

Reviewing files that changed from the base of the PR and between 9b8adfa and bad3076.

📒 Files selected for processing (2)
  • src/js/internal/sql/mysql.ts
  • src/js/internal/sql/postgres.ts

Comment thread src/js/internal/sql/mysql.ts Outdated
Comment thread src/js/internal/sql/postgres.ts Outdated
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Third CI run, same pattern — no failures touch the SQL diff:

  • bun-upgradeETXTBSY when posix_spawn'ing a bun binary that's still being copied (fs race)
  • fetch-http2-client — WebKit AtomStringImpl::remove: wasRemoved assertion (WTF/Source WebKit bug, unrelated to anything JS-side; same HTTP/2 file that's been flaky on every run)
  • 3x test-http-should-emit-close-when-connection-is-aborted timeout — Windows 2019 / Windows aarch64 / x64

Already used my one ci: retrigger on the gate-infra flake earlier. Needs a maintainer to re-run CI. The SQL diff is green — sql-pool-lazy-growth.test.ts passes locally on debug (ASAN) and hasn't been flagged once.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Please review coderabbit reported issues

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed both findings in 7a274c3:

1. max: 1 + function password couldn'''t recover. Once the pool hit max, #tryGrowPool() returned null and subsequent queries replayed the cached auth error. Now: after a failed grow we walk existing closed slots and call a new forceRetry() helper that bypasses retry()'''s auth-code gate, forcing a fresh handshake — which re-invokes password() and can succeed if the token rotated.

2. Synchronous onClose() race. Postgres' createConnection calls onClose synchronously when password() throws, so the old code's connection.retry() could drain the queue BEFORE the waiter-push landed. The queue push now happens at the top of the poolStarted branch, before any call that can trigger sync onClose. If we end up in the fast-fail path (static creds, no retry possible), we splice the waiter back out before calling onConnected(storedError) directly.

Two new tests:

  • function password retries auth on each new query (rotatable credentials)max: 1 + password: () => ... now opens a fresh socket per query (3 attempts → 3 sockets); was stuck at 1.
  • synchronous password() throw does not hang subsequent queries — 2 sequential queries with password: () => { throw ... } both reject with the thrown error in ~30ms; was timing out at 5s.

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Fourth CI run, nothing new — still all unrelated to the SQL diff:

  • bun-install-registry — text lockfile hoisting snapshot mismatch on darwin aarch64 (install/lockfile logic)
  • jsc-stressmprotect failed: 487 on Windows 2019 x64 (OS-level)
  • fetch-http2-client — WebKit WTF AtomStringImpl::remove: wasRemoved ASAN assertion + HTTP/2 Content-Length rejection (WebKit/HTTP2, same bucket as every previous run)

Diff is src/js/internal/sql/{postgres,mysql}.ts + one test file — completely orthogonal to any of those lanes. Retrigger budget already spent. Needs a maintainer to re-run.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 test/js/sql/sql-pool-lazy-growth.test.ts:176-178 — This test passes 5000 as the third argument to test(), but test/CLAUDE.md says "CRITICAL: Do not set a timeout on tests. Bun already has timeouts." Drop the explicit 5000 and rely on Bun's default per-test timeout — the hang you're guarding against will still fail the test.

    Extended reasoning...

    What the guideline says

    test/CLAUDE.md (line ~120) is explicit:

    No timeouts

    CRITICAL: Do not set a timeout on tests. Bun already has timeouts.

    The repo convention is that the test runner already supplies a per-test timeout, so individual tests should never pass the third timeout argument to test().

    Where the violation is

    test/js/sql/sql-pool-lazy-growth.test.ts:145–178:

    test(
      "synchronous \`password()\` throw does not hang subsequent queries",
      async () => {
        // …
      },
      5000,   // ← explicit per-test timeout
    );

    The in-test comment even says "The bun:test per-test timeout fails the test if anything hangs, which is the failure mode we're guarding" — that rationale applies equally to Bun's default timeout. Hardcoding 5000 doesn't add coverage, it just diverges from the repo convention (and is shorter than the default, so it can only make the test more likely to flake under load, not less).

    Step-by-step

    1. bun:test parses the third positional arg to test() as a millisecond timeout override.
    2. Bun's runner already enforces a default timeout on every test (this is what the CLAUDE.md rule is referring to).
    3. If the regression this test guards against ever recurs (waiter enqueued after release() drains the queue → query promise never settles), the await sql\SELECT ${i}`` on the second iteration never resolves.
    4. With or without the explicit 5000, the runner's timeout fires and the test fails — the explicit value buys nothing functionally.

    Why nothing else justifies it

    None of the other tests in this file set a timeout, and there's nothing about this case (no Docker, no real DB, the failing path resolves synchronously via process.nextTick-equivalent paths) that would warrant an exception to the rule.

    Fix

    Remove the trailing 5000 argument:

    test("synchronous \`password()\` throw does not hang subsequent queries", async () => {
      // …
    });

    This is a style/convention nit, not a functional bug.

Comment on lines +916 to +921
// We already started the pool. Enqueue the waiter BEFORE any call
// that could invoke `onClose` synchronously (e.g. `retry()` →
// `#startConnection` → `createConnection` if `password()` throws),
// otherwise `release()` would drain the queue before we got into
// it and the waiter would hang.
const queue = reserved ? this.reservedQueue : this.waitingQueue;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 nit: This comment was copy-pasted from postgres.ts, but the parenthetical example doesn't apply here — PooledMySQLConnection.createConnection's catch block defers via process.nextTick(closeNT, onClose, e), so a thrown password() never invokes onClose synchronously (unlike postgres.ts which calls onClose(e) directly). The enqueue-before-retry ordering is still fine for symmetry, just consider dropping or rewording the "e.g." so it doesn't misstate this adapter's behavior.

Extended reasoning...

What the comment claims vs. what the code does

The comment at mysql.ts:916–921 reads:

Enqueue the waiter BEFORE any call that could invoke onClose synchronously (e.g. retry()#startConnectioncreateConnection if password() throws), otherwise release() would drain the queue before we got into it and the waiter would hang.

This is verbatim from postgres.ts, where it is accurate: PooledPostgresConnection.createConnection's catch block does onClose(e as Error) directly, and since createConnection is an async function, code before the first await runs synchronously — so a synchronous throw from password() really does invoke #onCloserelease() on the same tick as connect().

In mysql.ts, however, PooledMySQLConnection.createConnection's catch block does process.nextTick(closeNT, onClose, e). The onClose callback is therefore always deferred to the next tick and can never fire synchronously, regardless of whether password() throws.

Step-by-step proof

  1. MySQLAdapter.connect() is called with poolStarted = true, all slots closed, password is a function that throws synchronously.
  2. The waiter is pushed onto queue.
  3. The loop calls connection.retry()#doRetry()#startConnection()PooledMySQLConnection.createConnection(...).
  4. Inside createConnection, password() throws synchronously. The catch block executes process.nextTick(closeNT, onClose, e) and returns null.
  5. Control returns to connect(), which finishes the loop and the rest of the function body. The waiter is already enqueued at this point regardless of ordering.
  6. On the next tick, closeNT invokes onClose(e)#onCloserelease(), which drains the queue and rejects the waiter.

At no point does onClose run between the start of connect() and its return — the cited hazard ("release() would drain the queue before we got into it") cannot occur in the MySQL adapter for this code path.

Addressing the refutation

One verifier argued this is too minor to flag because (a) the code is correct, (b) the adapters are deliberately kept symmetric, and (c) the ordering is still the right defensive choice. All three points are conceded — this is purely a comment-accuracy nit, not a request to change the code ordering. The reason it's still worth a one-line mention is that the comment makes a specific factual claim about this file's behavior ("could invoke onClose synchronously … if password() throws") that a future maintainer debugging the MySQL path might rely on and be misled by — e.g., concluding that process.nextTick deferral is unsafe to remove, or hunting for a synchronous-reentrancy bug that can't exist here. Comments that misstate invariants tend to cost more time than they save.

Impact and fix

Zero runtime impact; the code is correct as written. The fix is trivial: either drop the parenthetical example in the MySQL copy, or replace it with something neutral like "keep the ordering symmetric with postgres.ts where onClose can fire synchronously". Not blocking.

Comment thread test/js/sql/sql-pool-lazy-growth.test.ts Outdated
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Fifth CI run, still the same cast of unrelated flakes — none touch the SQL diff:

  • fetch-abort-slow-connect — ASAN timing sensitivity (36ms observed vs <10ms expected)
  • bun-install-registry — text lockfile hoisting snapshot
  • fetch-http2-client — WebKit WTF AtomStringImpl::remove: wasRemoved ASAN assertion (every run)
  • 3x test-http-should-emit-close-when-connection-is-aborted — Windows 2019 timeouts

Diff is src/js/internal/sql/{postgres,mysql}.ts + one test file. Retrigger budget spent. Needs a maintainer to re-run CI.

robobun and others added 10 commits May 14, 2026 17:30
Previously `new Bun.SQL({ max: N })` opened all N backend connections on
the first query. A single `SELECT 1` on a `max: 500` pool would burst
500 real TCP connections to the server. Both PostgreSQL and MySQL
adapters had this pattern in `connect()`:

  for (let i = 1; i < max; i++) {
    this.connections[i] = new Pooled*Connection(...);
  }

Each `new Pooled*Connection` kicks off a real TCP connect in its
constructor.

Fix: replace the eager burst with lazy growth — the adapter only opens
one connection up-front and `flushConcurrentQueries()` opens more on
demand (capped at `max`) as concurrent waiters show up. `maxDistribution`
now divides by the pool ceiling rather than the current size, so it
doesn't collapse the queue onto the first socket while more are being
dialed.

`connect()`'s retry branch also grows the pool when every existing
slot is permanently broken, so a one-off auth/network failure doesn't
consume a slot forever.

Tests use a bare TCP listener to count the opened sockets, so they run
without Docker or a real database.
…Timeout

- Extract the non-Docker lazy-pool regression tests out of sql.test.ts
  and sql-mysql.test.ts into a dedicated file. This matches the
  convention of the existing `sql-mysql-*.test.ts` / `postgres-*.test.ts`
  files — Docker-gated suites stay container-scoped, stand-alone
  regression tests live in their own file.
- Both adapters now share the same test body via a `for (adapter of ...)`
  loop, so postgres + mysql get identical coverage.
- Replace the `setTimeout(50)` 'give connector a chance' with a
  deterministic `await Bun.sleep(0)` loop — we're testing that no
  socket opens, not waiting for wall-clock time.
@claude[bot] pointed out that 3 of the previous 5 tests passed on
system Bun as well as with the fix, violating CLAUDE.md's
`USE_SYSTEM_BUN=1`-must-fail rule. Only the per-adapter 'single query
opens 1 socket' case actually exercises the bug (the baseline runtime
opens `max` sockets for a single query, the fix opens 1).

The 20-concurrent-queries-caps-at-5 case passed both ways because
baseline Bun happens to open exactly `max` sockets on first use. The
construct-without-query case passed both ways because the original
eager burst lived inside `connect()`, not the constructor.
Followup to the lazy-pool fix from @claude-bot's review (and requested by
@Lillious). The `else if (!retry_in_progress)` branch in `connect()` is
only reachable when every existing slot is closed AND `retry()` refused
all of them — that's exclusively the non-retryable auth class of errors
(bad password, unknown/unsupported auth method, TLS refused, invalid
server signature/key). Opening another connection there with the same
`connectionInfo` hits the identical failure; previously each incoming
query would grow the pool by one and pay a fresh TCP+auth round-trip
before the user saw the cached `storedError`, up to `max` times.

Skip the grow in that branch unless `password` is a function — that's
the one case where a fresh attempt can genuinely pick up a new secret
(rotated IAM token, Vault lease, etc.).

Test `postgres pool fast-fails on non-retryable auth errors` uses a
minimal fake server that answers StartupMessage with
AuthenticationRequest code 9 (SSPI) — Bun rejects that as
`ERR_POSTGRES_UNSUPPORTED_AUTHENTICATION_METHOD` — and asserts 5
sequential failed queries open only 1 socket (was 5).
Followup to the lazy-pool review from @claude-bot. The
`flushConcurrentQueries()` growth heuristic compares
`waitingQueue.length` against the number of still-handshaking
connections, assuming every pending connection will feed the waiting
queue when it finishes. That's not true for pending connections opened
on behalf of `reserve()` callers: `release()` drains
`reservedQueue` first and returns early, so those sockets never reach
`waitingQueue`.

In a mixed workload (e.g. a busy preReserved connection + 5 pending
`reserve()` slots + 3 plain queries), the old math was
`unservedWaiters = 3 - 5 = -2` and we skipped growth even though none
of those 5 pending sockets would ever serve the 3 plain queries. The
queries then idled behind the one busy connection until somebody
drained back into `readyConnections`.

Fix: subtract up to `reservedQueue.length` from the pending count
before comparing against `waitingQueue.length` — those pending
sockets are already spoken for and shouldn't offset the plain-query
backlog. Same change in both the Postgres and MySQL adapters.
Addresses @claude-bot / @Lillious review of the fast-fail branch:

1. Dynamic-credential recovery was gated on `#tryGrowPool()` returning a
   new slot. Once `connections.length === maxPoolSize` (including the
   common `max: 1` case after one auth failure), grow returned null and
   callers kept replaying the cached auth error even when
   `password: () => newToken()` would now hand back something that
   works. Fix: after a failed grow, walk the existing closed slots and
   call a new `forceRetry()` helper that bypasses `retry()`'s
   auth-code gate and forces a fresh handshake — which re-invokes
   `password()` and can legitimately succeed.

2. Postgres' `createConnection` invokes `onClose` SYNCHRONOUSLY when
   `password()` throws (mysql defers via `process.nextTick`, but
   postgres doesn't). The retry loop used to call `connection.retry()`
   first and push the waiter AFTER — so `release()` could drain the
   queue on the same tick before our push landed, leaving the waiter
   stuck. Fix: enqueue unconditionally at the top of the poolStarted
   branch. If the fast-fail path ends up being correct (static creds,
   all slots non-retryable), splice the waiter back out before
   synchronously calling `onConnected(storedError)`.

Tests:
- `function password retries auth on each new query (rotatable
  credentials)`: `max: 1` + `password: () => ...` with an auth-failing
  fake server now opens a fresh socket per query (3 attempts → 3
  sockets); pre-fix it stuck at 1.
- `synchronous password() throw does not hang subsequent queries`:
  2 sequential queries with `password: () => { throw ... }` both
  reject with the thrown error in under 5s (per-test timeout);
  pre-fix the second query hung forever.
@robobun robobun force-pushed the farm/b4c5677a/sql-lazy-pool branch from f32efc8 to 8fd13e5 Compare May 14, 2026 17:30
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Sixth CI run — still all unrelated to the SQL diff:

Diff is still just src/js/internal/sql/{postgres,mysql}.ts + the one test file. Retrigger budget spent. Needs a maintainer to re-run.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun This branch is now out of sync with main. Bun was ported to Rust. Merge main into this branch and resolve merge conflicts and re-write in rust.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Please rebase this for the new rust version of bun

1 similar comment
@RiskyMH

RiskyMH commented May 16, 2026

Copy link
Copy Markdown
Collaborator

@robobun Please rebase this for the new rust version of bun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: SQL Implementation creating un-used sql connections

4 participants