neondatabase · chase06harmon · Apr 13, 2026 · Apr 14, 2026 · Apr 26, 2026 · Apr 26, 2026
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -12,6 +12,7 @@ members = [
     "pageserver/pagebench",
     "pageserver/page_api",
     "proxy",
+    "proxy-cplane-api",
     "safekeeper",
     "safekeeper/client",
     "storage_broker",

diff --git a/docs/review/00-overview.md b/docs/review/00-overview.md
@@ -0,0 +1,87 @@
+# Review pack — `mel/pool-fixes-and-benchmarks`
+
+You're reviewing five commits sitting on top of Charles's session-pool
+WIP commit (`6e871ba05`). Three pieces of work, in dependency order:
+
+1. **Fix the session-pool half-close bug** so reused connections actually
+   work (`6a361a996`).
+2. **Multi-endpoint routing** so the mock control plane can return
+   different compute addresses for different endpoint IDs
+   (`fa654b133`).
+3. **Benchmark sweep** comparing direct / proxy+pool / proxy without pool,
+   plus FINDINGS.md (`9ce39d144`).
+4. **Transaction-mode pool multiplexing** — release on every
+   `ReadyForQuery 'I'`, re-acquire from the pool for the next
+   transaction (`29f0bc6d2`).
+5. **Benchmark update** adding a `proxy_txn` configuration and section 5
+   in FINDINGS.md (`ef3db262b`).
+
+## Reading order
+
+1. **`01-session-pool-fix.md`** — start here. Smallest commit, cleanest
+   bug story. The fix is ~30 lines but the chain of reasoning that got
+   there is half the value.
+2. **`02-multi-endpoint.md`** — quick read; mostly CLI plumbing and a
+   lookup-priority decision in `mock.rs`.
+3. **`03-benchmarks.md`** — methodology, findings, and the things the
+   numbers actually say (vs. what they superficially look like). Read
+   `benchmarks/FINDINGS.md` alongside.
+4. **`04-transaction-mode.md`** — the bigger architectural change. Read
+   the `ReadyForQueryWatcher` and `proxy_pass_transaction_mode`
+   sections; skim the rest. Compare against `pgcat/src/client.rs`'s
+   two-loop structure if you want the canonical reference.
+5. **`05-open-questions.md`** — the weak points and decisions I'd
+   challenge if I were reviewing me. Read this last — it'll change
+   how you read everything else.
+
+## High-level commit map
+
+```
+ef3db262b benchmarks: add proxy_txn config and update FINDINGS
+29f0bc6d2 transaction-mode pool multiplexing
+9ce39d144 benchmarks: pool/no-pool/direct sweep with FINDINGS
+fa654b133 multi-endpoint routing via --compute-endpoint-map
+6a361a996 session pooling works: don't poll_shutdown(compute) on client Terminate
+6e871ba05 basic bb8 pooling   <- Charles's last commit
+```
+
+`git log --stat --reverse 6e871ba05..HEAD` gives the per-commit diff
+sizes. Five commits, ~1200 lines added, ~50 removed.
+
+## What's in scope vs out of scope
+
+| In scope                                       | Out of scope                       |
+| ---------------------------------------------- | ---------------------------------- |
+| Session-pool correctness (the half-close bug)  | Pool eviction / health checks      |
+| Per-endpoint routing in the mock control plane | Real cplane integration            |
+| Transaction-mode multiplexing                  | `SET` / `LISTEN` detection         |
+| pgbench sweep + FINDINGS                       | Cancellation across multiplexed conns |
+| Memory measurement at peak                     | Auth caching                       |
+
+The five "out of scope" items are all real follow-ups; some are noted in
+`05-open-questions.md`.
+
+## How to run things locally
+
+Already documented in `benchmarks/README.md`. The shortest reproducer:
+
+```bash
+# Build
+cargo build --features testing --bin proxy
+
+# Smoke session pool
+RUST_LOG="proxy::tcp_pool=info" PGPASSWORD=testpw target/debug/proxy \
+    --auth-backend=postgres \
+    --auth-endpoint='postgresql://proxytest@localhost:5432/proxytest_db' \
+    --compute-endpoint='postgresql://localhost:5433/proxytest_db?sslmode=disable' \
+    --proxy=127.0.0.1:4432 --mgmt=127.0.0.1:7000 --http=127.0.0.1:7001 --wss=127.0.0.1:7002 \
+    --tcp-pool-enabled=true --tcp-pool-max-conns-per-key=5 --tcp-pool-fallback-direct-connect=true
+
+# Run 5 sessions and check the pid is stable (session pooling works)
+for i in 1 2 3 4 5; do
+    PGPASSWORD=testpw psql "postgresql://proxytest@127.0.0.1:4432/proxytest_db?sslmode=disable&options=endpoint%3Dep-test-123" \
+        -c "SELECT $i, pg_backend_pid()"
+done
+
+# Add --tcp-pool-mode=transaction to see multiplex behaviour
+```
diff --git a/docs/review/01-session-pool-fix.md b/docs/review/01-session-pool-fix.md
@@ -0,0 +1,124 @@
+# 01 — Session-pool half-close bug
+
+Commit `6a361a996` "session pooling works: don't poll_shutdown(compute)
+on client Terminate". Three files, ~190 lines net.
+
+## TL;DR
+
+When a client cleanly disconnected (Postgres `'X'` Terminate), the
+existing `TerminateFilter` correctly **didn't forward** the Terminate
+to compute, but the surrounding bidirectional-copy state machine
+**still ran `poll_shutdown` on the compute writer** as part of its
+"reader hit EOF, shut down the writer" cleanup path. That's
+`shutdown(SHUT_WR)` on the live compute socket. Postgres saw the FIN,
+closed its side, and the next pool reuse hit `ENOTCONN`.
+
+Fix: a parallel `transfer_one_direction_no_shutdown` that goes
+`Running → Done` directly, used for the client→compute direction in
+the pooled bidir copy. Compute's writer is never half-closed.
+
+## The chain of reasoning that got here
+
+This took several iterations to localize. The diagnostic timeline
+(captured by adding probes that I later removed) was:
+
+```
+20.247745  tcp pool: opened new connection
+20.249371  TerminateFilter: intercepted Postgres Terminate
+20.249403  transfer_one_direction: entering ShuttingDown    ← BUG
+20.249433  poll_shutdown(writer) returned Ready(Ok)         ← FIN sent
+20.249507  poll_write(empty) error: Broken pipe             ← write half dead
+20.249559  tcp pool: returning connection                    ← broken conn pooled
+[iter 2, 243ms later]
+20.493396  peer_addr error: Invalid argument                ← peer gone
+20.493689  per-client task IO error: Socket is not connected
+```
+
+The bug is that `TerminateFilter::poll_read` returns `Ready(Ok(()))`
+with **zero bytes filled** when it intercepts `'X'`. `CopyBuffer`
+treats zero-fill as EOF (`me.read_done = me.cap == filled_len`), and
+`poll_copy` returns `Ready(Ok(amt))`. `transfer_one_direction` then
+transitions:
+
+```
+Running → ShuttingDown → poll_shutdown(writer) → Done
+```
+
+That `poll_shutdown(writer)` is on the compute side. It's the bug.
+
+The early `saw_terminate()` check the existing pooled bidir copy
+**did** have was checked *after* `transfer_one_direction` returned —
+too late, the shutdown had already happened inside the call.
+
+## The fix
+
+`transfer_one_direction_no_shutdown` is the same state machine minus
+the `ShuttingDown` step:
+
+```rust
+TransferState::Running(buf) => {
+    let count = ready!(buf.poll_copy(cx, r.as_mut(), w.as_mut()))?;
+    *state = TransferState::Done(count);   // skip ShuttingDown
+}
+TransferState::ShuttingDown(count) => {
+    *state = TransferState::Done(*count);  // dead branch, but keep
+}
+TransferState::Done(count) => return Poll::Ready(Ok(*count)),
+```
+
+In `copy_bidirectional_client_compute_pooled`, swap the
+client→compute call to the no-shutdown variant:
+
+```rust
+let client_to_compute_result = transfer_one_direction_no_shutdown(
+    cx, &mut client_to_compute, &mut filtered_client, compute,
+).map_err(ErrorSource::from_client)?;
+```
+
+The compute→client direction keeps using the regular variant — if
+compute really closes, shutting down the client side is correct.
+
+## Files
+
+- `proxy/src/pglb/copy_bidirectional.rs` — the fix
+- `proxy/src/pglb/passthrough.rs` — unchanged (probes removed)
+- `proxy/src/proxy/mod.rs`, `proxy/src/tcp_pool.rs` — small unrelated
+  prep work in this commit (was-reused signal, pool key on checkout
+  via `Option<...>` tuple)
+
+## Read in this order
+
+1. `proxy/src/pglb/copy_bidirectional.rs:43-95` — read both
+   `transfer_one_direction` and `transfer_one_direction_no_shutdown`
+   side by side. The diff is exactly the deletion of the
+   `poll_shutdown` call.
+2. `proxy/src/pglb/copy_bidirectional.rs:120-235` — read
+   `copy_bidirectional_client_compute_pooled` and find the call site
+   that uses the no-shutdown variant. The CRITICAL comment is a
+   load-bearing invariant.
+3. `proxy/src/pglb/copy_bidirectional.rs::TerminateFilter` (around
+   line 460) — to understand what the filter actually emits when it
+   intercepts `'X'`. That's the input the bidir copy is responding
+   to.
+
+## Things I'd challenge in review
+
+- **The `ShuttingDown` arm in `_no_shutdown` is dead code.** It exists
+  to keep the state machine total. Pragmatic but a bit ugly. An
+  alternative is a separate state enum.
+- **Why not just have `TerminateFilter` return `Pending` instead of
+  EOF on intercept?** I considered this in passing — it would avoid
+  the EOF-cascade entirely. The downside is the filter would need to
+  self-wake or we'd hang. Strictly cleaner but adds wake-up plumbing.
+  Worth challenging.
+- **`compute→client` still uses the shutdown-variant.** That's fine
+  *today* because compute closing means the conn is dead anyway. But
+  it does mean a slow-loris compute (sends a few bytes, then EOFs)
+  half-closes the client. In our case the client is being torn down
+  anyway. Edge-case unlikely to matter in practice.
+- **The probes I used to find this bug were temporary** (zero-byte
+  `poll_write` and `peer_addr` checks at release/reuse, plus a
+  `entering ShuttingDown` log line). I stripped them before the
+  commit. If you want to see them, look at the diff history for the
+  fix on `mel/pool-fixes-and-benchmarks` — it's in the conversational
+  history but was never committed.
diff --git a/docs/review/02-multi-endpoint.md b/docs/review/02-multi-endpoint.md
@@ -0,0 +1,101 @@
+# 02 — Multi-endpoint routing
+
+Commit `fa654b133` "multi-endpoint routing via --compute-endpoint-map".
+Two files, ~57 lines added.
+
+## TL;DR
+
+Charles's earlier `--compute-endpoint` flag was a global override:
+every `wake_compute` call used the same compute address regardless of
+the endpoint ID in the connection string. This commit adds
+`--compute-endpoint-map` to the mock control plane, threads the
+`EndpointId` from `ComputeUserInfo` into `do_wake_compute`, and uses
+a three-tier lookup so the existing single-tenant test keeps working.
+
+## Lookup priority (in `do_wake_compute`)
+
+```
+self.compute_endpoint_map.get(endpoint)   // per-endpoint map (new)
+    .or(self.compute_endpoint.as_ref())   // global override (existing)
+    .unwrap_or(&self.auth_endpoint)       // legacy fallback (existing)
+```
+
+This means:
+- Endpoint IDs in the map → routed to their mapped address
+- Endpoint IDs not in the map → fall through to `--compute-endpoint`
+  if set
+- Otherwise → `--auth-endpoint` (the original behaviour)
+
+So adding a map doesn't change behaviour for IDs that aren't in it.
+That kept Charles's single-tenant flow untouched.
+
+## Flag format
+
+```
+--compute-endpoint-map='ep-A=postgresql://localhost:5433/db?sslmode=disable,ep-B=postgresql://localhost:5434/db?sslmode=disable'
+```
+
+Comma-separated entries; first `=` separates endpoint id from URL.
+URLs reuse the same `parse_compute_endpoint` validator (must include
+hostname, sslmode if present must be `disable` or `require`).
+
+## Files
+
+- `proxy/src/control_plane/client/mock.rs` — `compute_endpoint_map:
+  HashMap<EndpointId, ApiUrl>` field on `MockControlPlane`,
+  `do_wake_compute(&EndpointId)` lookup, `wake_compute` plumb-through.
+- `proxy/src/binary/proxy.rs` — `--compute-endpoint-map` clap arg,
+  `parse_compute_endpoint_map` parser.
+
+## Read in this order
+
+1. `proxy/src/control_plane/client/mock.rs::do_wake_compute` —
+   the lookup logic (the chained `or` / `unwrap_or` is the whole
+   feature).
+2. `proxy/src/control_plane/client/mock.rs::wake_compute` (in the
+   `ControlPlaneApi` impl) — confirms `user_info.endpoint` is what
+   gets passed in.
+3. `proxy/src/binary/proxy.rs::parse_compute_endpoint_map` — the
+   parser. ~20 lines, validates duplicates and missing separators.
+
+## How it was verified
+
+Two docker Postgres instances (`compute-A` on 5433, `compute-B` on
+5434), each with a `marker` table containing `'compute-A'` /
+`'compute-B'`. Same SCRAM verifier copied from auth Postgres so
+SCRAM-passthrough works on both. Then:
+
+```bash
+# Through the proxy with the map:
+psql ".../?options=endpoint%3Dep-A" -c "SELECT val FROM marker"
+# returns: compute-A
+
+psql ".../?options=endpoint%3Dep-B" -c "SELECT val FROM marker"
+# returns: compute-B
+```
+
+Plus 5x `pg_backend_pid()` per endpoint to confirm pool isolation:
+each endpoint's pool is keyed independently by `(endpoint, db, role)`,
+so within `ep-A` the pid is stable (pool reuse) and across A/B the pids
+differ (different computes).
+
+## Things I'd challenge in review
+
+- **Why not a database-backed mapping?** The brief noted that real
+  Neon uses `neon_control_plane.endpoints`. I picked the CLI-flag
+  approach because it's faster to iterate on and doesn't require
+  schema changes; the brief explicitly suggested this trade-off.
+  A real cplane would query the db.
+- **`HashMap` lookup is sync inside `do_wake_compute`'s async fn.**
+  Fine — the map is read-only after startup. If it ever became
+  mutable, would need a `RwLock`.
+- **No support for SSL-mode beyond `disable`/`require`.** Inherited
+  from Charles's `parse_compute_endpoint`. Anything unrecognised
+  errors at parse time.
+- **Endpoint id in the auth context is actually `EndpointId` (a
+  smol_str wrapper),** so the HashMap key works without allocations.
+  Look at `proxy/src/types.rs::smol_str_wrapper!` to confirm.
+- **Cancellation correctness**: I haven't audited whether
+  cross-endpoint cancellation routes correctly. Probably fine because
+  `BackendKeyData` is per-session and the cancel path doesn't go
+  through `wake_compute`. But worth confirming if you're paranoid.