Commit 2d3b2e8
feat(run-engine): flag to route getSnapshotsSince through read replica (#3423)
## Summary
Adds `RUN_ENGINE_READ_REPLICA_SNAPSHOTS_SINCE_ENABLED` (default `"0"`).
When enabled, the Prisma reads inside `RunEngine.getSnapshotsSince` run
against the read-only replica client instead of the primary. This moves
the snapshot-polling queries fired by every running task runner off the
writer.
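
A minimal sketch of how such a flag could select the client, for illustration only. Only the variable name and its `"0"` default come from this commit. The `SnapshotReader` type, the helper names, and the assumption that `"1"` means enabled are hypothetical, and the real wiring in the run engine may differ.

```typescript
// Hypothetical stand-in for a Prisma client handle; not the real type.
type SnapshotReader = { name: string };

const FLAG = "RUN_ENGINE_READ_REPLICA_SNAPSHOTS_SINCE_ENABLED";

// Assumption: "1" enables the flag; an unset variable falls back to "0".
function isReplicaEnabled(env: Record<string, string | undefined>): boolean {
  return (env[FLAG] ?? "0") === "1";
}

// Returns the replica only when the flag is on, otherwise the primary.
function pickSnapshotReader(
  env: Record<string, string | undefined>,
  primary: SnapshotReader,
  replica: SnapshotReader
): SnapshotReader {
  return isReplicaEnabled(env) ? replica : primary;
}
```

Passing the environment in as a parameter keeps the helper easy to test; with the variable unset, the primary is still used.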
## Why
`getSnapshotsSince` is called from the managed runner's
fetch-and-process loop (once per poll interval, plus on every
snapshot-change notification). It runs four sequential reads per call:
one `findFirst` by snapshot id, one `findMany` on snapshots with
`createdAt > X`, one raw SQL query against `_completedWaitpoints`, and a
chunked `findMany` on `waitpoint`. That happens for every concurrent run,
every few seconds. The function is read-only and tolerates a small amount
of staleness, which makes it an obvious candidate for the replica.
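
To put a rough number on that load, the sketch below estimates a lower bound on reads per second hitting the database from polling alone. The function name and the example figures (1000 runs, 5 second interval) are illustrative assumptions, not measurements from this commit. It ignores notification-triggered calls and extra chunks in the `waitpoint` lookup, both of which only add reads.

```typescript
// Lower-bound estimate: each poll issues `readsPerCall` sequential reads.
function estimateReadsPerSecond(
  concurrentRuns: number,
  pollIntervalSeconds: number,
  readsPerCall: number = 4
): number {
  if (pollIntervalSeconds <= 0) {
    throw new RangeError("pollIntervalSeconds must be positive");
  }
  return (concurrentRuns * readsPerCall) / pollIntervalSeconds;
}

// Illustrative figures only: 1000 concurrent runs polling every 5 seconds.
const exampleLoad = estimateReadsPerSecond(1000, 5); // 800 reads per second
```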
## Replica-lag considerations
- **Step 1 "since snapshot not found"**: if the runner just received a
snapshot id from the primary and asks the replica before that snapshot
has replicated, the function throws and the caller treats the response
as an error (runner falls back to a metadata refresh). Self-correcting,
not silent.
- **Step 2 missing newly-created snapshots**: the next poll's `createdAt
> sinceSnapshot.createdAt` filter still picks them up once the replica
catches up.
- **Waitpoint junction race**: the riskiest path. If the latest snapshot
is replicated but its `_completedWaitpoints` join rows are not yet, the
runner could advance past that snapshot with `completedWaitpoints: []`.
WAL/storage-level replication replays commits in order, so in practice
both should appear atomically on the reader, but the race window is why
the flag ships disabled.

An Aurora reader shrinks all three windows to single-digit ms in typical
conditions, and its storage-level replication gives atomic visibility of
committed transactions on the reader.
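
The first two cases can be illustrated with a toy in-memory model of a lagging replica. This is a sketch under stated assumptions, not the run engine's code. The `LaggingReplica` class, its methods, and the numeric `createdAt` are all hypothetical, and waitpoints are left out entirely.

```typescript
type Snapshot = { id: string; createdAt: number };

// Toy model: writes land in `pending` and only become readable after catchUp().
class LaggingReplica {
  private visible: Snapshot[] = [];
  private pending: Snapshot[] = [];

  // Simulates a commit on the primary that has not yet reached the replica.
  write(snapshot: Snapshot): void {
    this.pending.push(snapshot);
  }

  // Simulates the replica applying everything it has received so far.
  catchUp(): void {
    this.visible.push(...this.pending);
    this.pending = [];
  }

  // Mirrors the first two reads: look up the "since" snapshot, then filter by createdAt.
  snapshotsSince(sinceId: string): Snapshot[] {
    const since = this.visible.find((s) => s.id === sinceId);
    if (!since) {
      // Step 1 case: loud failure, so the caller can fall back and retry.
      throw new Error(`since snapshot not found: ${sinceId}`);
    }
    // Step 2 case: unreplicated rows are simply absent until a later poll.
    return this.visible
      .filter((s) => s.createdAt > since.createdAt)
      .sort((a, b) => a.createdAt - b.createdAt);
  }
}
```

In this model, polling with an id the replica has not seen throws, and polling with an older id returns an empty list until `catchUp()` runs, after which the same call returns the missing snapshot.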
## Test plan
- [ ] Flip the flag on in a non-prod environment, confirm snapshot
polling behaves normally and `getSnapshotsSince` errors in Sentry stay
flat.
- [ ] Verify writer query volume drops and reader query volume rises on
the snapshot-polling queries.
- [ ] Keep an eye on `AuroraReplicaLag` (or equivalent) during rollout.

1 parent 7c95ee4 commit 2d3b2e8
5 files changed
Lines changed: 15 additions & 1 deletion
File tree
- .server-changes
- apps/webapp/app
- v3
- internal-packages/run-engine/src/engine