
tapdb: parallelize universe proof ingest #2194

Open
jtobin wants to merge 8 commits into lightninglabs:main from jtobin:coalesce-root


@jtobin commented Jul 4, 2026

Member

(TLDR, disentangle the universe and multiverse writers. Postgres go brrrrrrrr. Fable's summary follows.)

Previously, every proof insert updated two trees inside one transaction: its own universe's tree, and the shared multiverse tree that summarizes all universe roots. Because the shared tree's root was rewritten by every insert, any two concurrent inserts, even into completely unrelated universes, collided on the same rows. Under Postgres's serializable isolation, a collision means abort-and-retry with backoff, so concurrent ingest was effectively single-writer, and in practice slower than serial: measured with 8 workers, 21.8 leaves/s concurrent vs 57.0 serial.

The fix is to decouple the two updates. An insert's transaction now touches only its own universe's state; the shared tree is maintained by a single coalescing writer that collects updates from all inserts and applies them in batches, one transaction per batch. Since only the latest root per universe matters, concurrent updates merge rather than queue, and callers still receive the post-flush multiverse root and inclusion proof, so RPC semantics are unchanged.
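The "merge rather than queue" behavior can be pictured as a pending map keyed by universe, where a later submission overwrites an earlier one before the flush drains it. This is a minimal sketch, not the PR's code. `UniverseID`, `pendingRoots`, `Submit` and `Drain` are hypothetical names, and roots are modeled as plain strings.

```go
package main

import "sync"

// UniverseID is a hypothetical stand-in for a universe identifier.
type UniverseID string

// pendingRoots holds the latest root submitted per universe since the
// last flush. A later submission overwrites an earlier one, so
// concurrent updates to the same universe merge instead of queueing.
type pendingRoots struct {
	mu    sync.Mutex
	roots map[UniverseID]string
}

func newPendingRoots() *pendingRoots {
	return &pendingRoots{roots: make(map[UniverseID]string)}
}

// Submit records root as the newest known root for id.
func (p *pendingRoots) Submit(id UniverseID, root string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.roots[id] = root
}

// Drain hands the current batch to the flusher and starts a fresh
// pending set. Updates that arrive afterwards go into the next batch.
func (p *pendingRoots) Drain() map[UniverseID]string {
	p.mu.Lock()
	defer p.mu.Unlock()
	batch := p.roots
	p.roots = make(map[UniverseID]string)
	return batch
}
```

A flush therefore writes at most one multiverse leaf per universe, however many inserts into that universe happened while the previous flush was in flight.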

The multiverse tree is purely derived data, so a crash between universe commit and flush is healed by startup reconciliation — and its single-writer discipline lets the flush run at read committed on Postgres, taking it out of serialization-conflict detection entirely.

Result: 65.7 leaves/s concurrent vs 61.8 serial on the same benchmark. That is a 3x improvement over the previous 21.8 leaves/s concurrent figure, and concurrent ingest is no longer slower than serial. Hooking #2188's batched-descent InsertMany into the flush should collapse each batch into a single descent per tree, which we expect to close much of the gap between the current ~65 leaves/s and the ~700 leaves/s ceiling measured for the universe transactions alone.

jtobin added 2 commits July 4, 2026 20:26
Inserting a sequence of leaves that reuses keys must produce the same
tree as inserting only the final leaf observed per key. This is the
invariant that permits coalescing consecutive updates to the same key
into a single insert of the latest value, as the multiverse tree does
for universe roots.
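The shape of that property test can be sketched as follows. The actual test runs against the real MS-SMT in the mssmt package. Here a hypothetical `toyTree` (a map whose `Root` hashes the sorted key/value set) stands in for the tree, which makes the property trivially true for the toy. The sketch only shows how the two insertion strategies are generated and compared. `leaf`, `toyTree`, `lastPerKey` and `checkCoalescing` are illustrative names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math/rand"
	"sort"
)

type leaf struct{ key, value string }

// toyTree is a hypothetical stand-in for the MS-SMT: each key holds one
// leaf, and Insert overwrites any previous leaf at that key.
type toyTree map[string]string

func (t toyTree) Insert(l leaf) { t[l.key] = l.value }

// Root commits to the whole key/value set in key order.
func (t toyTree) Root() string {
	keys := make([]string, 0, len(t))
	for k := range t {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s;", k, t[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// lastPerKey keeps only the final leaf observed for each key.
func lastPerKey(seq []leaf) []leaf {
	last := make(map[string]leaf)
	for _, l := range seq {
		last[l.key] = l
	}
	out := make([]leaf, 0, len(last))
	for _, l := range last {
		out = append(out, l)
	}
	return out
}

// checkCoalescing draws random sequences over a small key space, so keys
// repeat often, and checks that inserting every leaf produces the same
// root as inserting only the last leaf per key.
func checkCoalescing(seed int64, trials int) bool {
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < trials; i++ {
		seq := make([]leaf, rng.Intn(20))
		for j := range seq {
			seq[j] = leaf{
				key:   fmt.Sprintf("k%d", rng.Intn(4)),
				value: fmt.Sprintf("v%d", rng.Intn(100)),
			}
		}
		full, coalesced := toyTree{}, toyTree{}
		for _, l := range seq {
			full.Insert(l)
		}
		for _, l := range lastPerKey(seq) {
			coalesced.Insert(l)
		}
		if full.Root() != coalesced.Root() {
			return false
		}
	}
	return true
}
```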

Split upsertMultiverseLeafEntry into its write half and a separate
multiverseRootAndProof read helper. The batch insert path previously
called the combined function once per item, computing a multiverse
root and inclusion proof each time only to discard them (the batch
callers never read those fields), and rewriting the same universe's
multiverse leaf once per item.

UpsertProofLeafBatch now tracks the final universe root per universe
and upserts each universe's multiverse leaf exactly once. This is
sound because SMT insertion is last-write-wins per key (see the mssmt
property test). DeleteProofLeaf similarly stops computing a discarded
root and proof.

Single-leaf paths keep their semantics: they call the write half and
then fetch the root and inclusion proof explicitly.

@gemini-code-assist


Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes the proof ingestion process by disentangling the writing of universe-specific trees from the shared multiverse tree. By moving the shared tree updates to a batched, coalesced writer, the system avoids row-level contention that previously forced serial execution under Postgres's serializable isolation. This change yields a substantial performance gain in concurrent environments and includes a robust reconciliation process to ensure data integrity after potential crashes.

Highlights

  • Performance Optimization: Decoupled universe and multiverse tree updates to eliminate contention on shared rows, enabling parallel proof ingestion and significantly improving throughput on Postgres.
  • Multiverse Root Coalescer: Introduced a root coalescer that batches multiverse updates, ensuring that concurrent inserts into different universes do not trigger serializable isolation conflicts.
  • Startup Reconciliation: Added a reconciliation mechanism that runs at startup to repair any diverged multiverse entries caused by crashes between universe commits and multiverse flushes.
  • Transaction Isolation: Implemented a transaction isolation overrider to allow multiverse flushes to run at read-committed isolation, further reducing serialization conflicts.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a multiverseRootCoalescer to batch and serialize writes to the shared multiverse trees, significantly improving concurrent universe proof ingest performance on Postgres by avoiding serialization conflicts. It also adds startup reconciliation to repair any diverged multiverse entries, along with comprehensive unit and property-based tests. Feedback on the changes highlights a potential deadlock issue in the coalescer's flush loop: if a panic occurs during a batch flush, the flushing state is never reset, permanently blocking future flushes. Utilizing a defer block to reset the flushing state is recommended to ensure robustness.
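One way to apply the suggested fix is sketched below. The coalescer is reduced to its "flush in progress" flag, and the reset is deferred so that it runs even if the batch flush panics. `flusher`, `tryBecomeLeader` and `runFlush` are hypothetical names, not the PR's API. A complete fix would also need to release any waiters of the failed batch.

```go
package main

import "sync"

// flusher is a hypothetical reduction of the coalescer to its
// "is a flush in progress" flag.
type flusher struct {
	mu       sync.Mutex
	flushing bool
}

// tryBecomeLeader claims the flushing role if no flush is in progress.
func (f *flusher) tryBecomeLeader() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.flushing {
		return false
	}
	f.flushing = true
	return true
}

// runFlush runs flushBatch as the leader. The deferred reset also runs
// while a panic unwinds, so a panicking flush cannot leave flushing
// stuck at true and block every later flush.
func (f *flusher) runFlush(flushBatch func()) {
	defer func() {
		f.mu.Lock()
		f.flushing = false
		f.mu.Unlock()
	}()
	flushBatch()
}
```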


Comment thread tapdb/multiverse_coalescer.go
jtobin added 6 commits July 5, 2026 00:57
Every proof leaf insert must reflect its universe's new root in the
shared multiverse tree for its proof type. Doing that write inside
each insert's own transaction makes any two concurrent inserts collide
on the multiverse root rows, regardless of which universes they touch:
under Postgres serializable isolation one of them aborts and retries
with backoff, effectively serializing ingest across universes (and
making it slower than actually serial, since the retry backoff starts
at 20-60ms).

The coalescer funnels all multiverse writes through a single flusher
using leader-based group commit: the first caller to find it idle
flushes pending updates in rounds until none remain, while other
callers just await their result. Updates accumulate while a flush is
in flight and are applied together in one transaction, at most one
leaf write per universe (last-write-wins per SMT key). Waiters receive
the post-flush multiverse root and their universe's inclusion proof,
preserving the response semantics of the insert paths.

Not yet wired into the insert paths; this commit adds the component,
its concurrency unit test, and a property test comparing flushed
state against an in-memory oracle tree.

MultiverseStore.UpsertProofLeaf now commits only the per-universe
rows in its own transaction, then reflects the universe's new root in
the shared multiverse tree via the root coalescer. Insert transactions
for different universes no longer touch any shared rows, so they can
commit in parallel on Postgres instead of aborting each other through
serialization failures on the multiverse root.

Response semantics are preserved: the caller still receives the
multiverse root and inclusion proof, now from the flush that carried
its update. Under concurrent inserts into the same universe, those
fields may reflect a slightly newer universe root than the one in the
same response, which is the accurate post-flush state.

A failed flush leaves the universe leaf committed and the multiverse
entry stale; the entry is healed by the universe's next successful
update. BaseUniverseTree.UpsertProofLeaf keeps its inline multiverse
write, as it has no production callers (tests and bench fixtures
only).

UpsertProofLeafBatch now commits only the per-universe rows in its own
transaction, then submits each universe's final root to the root
coalescer, matching the single-leaf path. With this, every insert-path
write to the shared multiverse trees flows through the coalescer, so
insert transactions never contend on rows shared across universes.

Batch callers never consume multiverse roots or inclusion proofs, so
the new batch entry point registers waiters that skip generating them;
a flush only computes the root and proof for universes with a
single-leaf waiter attached.
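A small sketch of the rule "only compute proofs for universes with a single-leaf waiter attached". It assumes each waiter carries a flag that batch callers leave unset. `waiter`, `wantProof` and `universesNeedingProofs` are hypothetical names.

```go
package main

import "sort"

// waiter is a hypothetical registration attached to a pending update.
// Batch callers register with wantProof set to false.
type waiter struct {
	universe  string
	wantProof bool
}

// universesNeedingProofs returns, sorted, the universes for which a
// flush must compute a multiverse root and inclusion proof: those with
// at least one single-leaf waiter attached.
func universesNeedingProofs(ws []waiter) []string {
	need := make(map[string]bool)
	for _, w := range ws {
		if w.wantProof {
			need[w.universe] = true
		}
	}
	out := make([]string, 0, len(need))
	for u := range need {
		out = append(out, u)
	}
	sort.Strings(out)
	return out
}
```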

Deletion paths (DeleteProofLeaf, DeleteUniverse) keep their inline
multiverse writes: they are rare administrative operations, and any
collision with a flush is absorbed by the existing transaction retry.

Since multiverse updates are written outside the proof insert
transaction, a daemon stopping between a universe commit and its
multiverse flush leaves the shared tree committing to a stale root, or
missing the universe's leaf entirely. The multiverse trees are fully
derived data (each leaf commits to a universe root), so this is always
repairable.

ReconcileMultiverse compares every universe root against its
multiverse leaf and rewrites diverged entries through the root
coalescer. It runs during server construction, before the store serves
concurrent traffic. The leaf construction rule is extracted into
multiverseLeafNode so the insert path and the reconciliation check
share it.
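The compare-and-repair step can be sketched over two maps. One maps each universe to its committed root. The other maps each universe to the root its multiverse leaf commits to, with missing leaves absent. This is a hypothetical reduction: the real check builds the expected leaf with multiverseLeafNode rather than comparing roots directly. `divergedUniverses`, `reconcile` and the `submit` callback (standing in for the root coalescer) are illustrative names.

```go
package main

import "sort"

// divergedUniverses returns the universes whose multiverse leaf is
// missing or commits to a root other than the universe's current one,
// sorted so that repairs happen in a deterministic order.
func divergedUniverses(universeRoots, multiverseLeaves map[string]string) []string {
	var out []string
	for u, root := range universeRoots {
		if leaf, ok := multiverseLeaves[u]; !ok || leaf != root {
			out = append(out, u)
		}
	}
	sort.Strings(out)
	return out
}

// reconcile resubmits each diverged universe's current root through
// submit and returns how many entries were repaired.
func reconcile(universeRoots, multiverseLeaves map[string]string,
	submit func(universe, root string) error) (int, error) {

	diverged := divergedUniverses(universeRoots, multiverseLeaves)
	for i, u := range diverged {
		if err := submit(u, universeRoots[u]); err != nil {
			return i, err
		}
	}
	return len(diverged), nil
}
```

Because the multiverse leaves are fully derived from universe roots, rerunning this after a partial repair is harmless: already-healed entries no longer show up as diverged.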

Covered by a deterministic crash-window test and a property test
mixing healthy, orphaned and tampered universes.

With the multiverse root row moved out of the insert transactions, the
remaining concurrency bottleneck is the flush transaction itself:
under serializable isolation its SMT walks take page-level predicate
locks (the recursive CTE plans as a bitmap index scan) that
false-share index pages with every in-flight universe transaction, so
flushes abort and retry with backoff, stalling all of their waiters.

The flush is the sole writer of the multiverse namespaces, so
serializable isolation buys it nothing: run it at read committed on
Postgres. A non-serializable writer takes no predicate locks and does
not flag conflicts on serializable readers, so flushes can neither
abort nor be aborted by concurrent universe transactions. The
single-writer invariant that makes this safe is enforced by the
coalescer's single-flusher role plus a process-wide multiverse write
mutex now shared with the deletion paths, the only other multiverse
writers.

Benchmarked on Postgres 15 (docker, 1k pre-populated universes, 8
workers inserting into distinct universes): concurrent ingest goes
from 21.8 leaves/s at the merge base (worse than its 57.0 leaves/s
serial, due to serialization-failure backoff) to 65.7 leaves/s, a 3x
improvement, with no remaining inversion. SQLite is unaffected: the
isolation override is Postgres-only.