mssmt: batched descent InsertMany by jtobin · Pull Request #2188 · lightninglabs/taproot-assets

jtobin · 2026-07-02T12:43:54Z

Partially resolves #451.

InsertMany previously performed N single-leaf Inserts wrapped in one transaction, so internal nodes shared by the batch were recomputed once per leaf rather than once per batch. Both Insert and InsertMany also interleaved reads and writes in the same recursive descent, which forced them onto asymmetric overflow checks (single Insert rejected valid replacements the batched path accepted) and prevented atomic rejection when a batch would have overflowed the root sum.

This PR reshapes InsertMany into a single recursive descent that partitions items by the next key bit at each touched internal node, so shared work is done once per batch. The descent is split into a read phase (walk the tree, queue mutations without applying them) and a write phase (flush the mutation queue). Both APIs get the same replacement-aware check and overflow rejections become atomic. Per-call heap allocations from BranchNode/LeafNode hashing are also cut, similar to what was done in #2183.

Net wins: on 10k-leaf in-memory batches, InsertMany is roughly 2× faster with ~8× fewer allocations end-to-end; against a live store InsertMany runs ~75× faster than the equivalent Insert-loop on SQLite, and ~533× on Postgres.

(N.b., the new InsertMany can be profitably used in more places than it currently is, but I'll make those changes in a separate PR.)

gemini-code-assist · 2026-07-02T12:44:06Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes the MS-SMT tree insertion process by introducing a batched approach for InsertMany. By decoupling the tree traversal from the storage write operations, the system can now perform atomic overflow checks and significantly reduce the number of database round-trips and memory allocations. This architectural change provides substantial performance improvements for both in-memory operations and persistent database stores.

Highlights

Batched MS-SMT Insertions: Refactored InsertMany to use a single recursive descent, allowing shared internal nodes to be computed once per batch rather than once per leaf.
Read/Write Phase Separation: Split tree operations into a read-only descent phase (building the new tree shape in memory) and a write phase (flushing mutations), enabling atomic overflow rejections.
Performance Gains: Achieved significant speedups (approx. 2x faster for in-memory batches, up to 533x faster on Postgres) and reduced heap allocations by optimizing hashing and storage interaction.
Overflow Check Improvements: Implemented replacement-aware overflow checks that calculate the effective delta, preventing unnecessary rejections when replacing large-sum leaves with smaller ones.

New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request significantly improves the performance of batched MS-SMT insertions (InsertMany) for both FullTree and CompactedTree by splitting the insertion into a read-only descent phase and a deferred write-out phase. This change allows for atomic overflow checks on the effective delta before writing to storage. Additionally, memory allocations are reduced by caching node hashes and sums by value and using stack-resident buffers during hashing. Property tests and benchmarks have been added to verify these changes. The review feedback suggests adding a missing function comment to comply with the style guide, optimizing partitionByBit to run in-place to eliminate heap allocations, and reusing this helper in FullTree.batchInsert to reduce code duplication.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

InsertMany on both FullTree and CompactedTree was a transaction- batched loop over single-leaf insert: each leaf paid a fresh walkDown from the root, a fresh walkUp materialising one BranchNode per level, and an UpdateRoot — the same cost as N calls to Insert wrapped in one tx.Update. The form bore the name but not the act. Reform InsertMany on both trees to apply the whole batch in a single recursive descent: at each touched internal node, fetch children once, partition items by the next key bit, recurse into each non-empty side, build one new BranchNode for the level, write one delete/insert pair. For CompactedTree, descending into an empty subtree with items short-circuits via buildSubtree, which compacts to a single CompactedLeafNode when the batch reduces to one non-empty leaf — preserving the existing compaction semantics. Single-leaf Insert is untouched; the change is contained to InsertMany and one new pair of helpers per tree (batchInsert, buildSubtree on CompactedTree; batchInsert on FullTree). Benchmark (BenchmarkMssmtInsertMany on a fresh CompactedTree, -benchtime=3x): - leaves=100: 6.31ms / 219k allocs -> 3.24ms / 127k allocs - leaves=1000: 64.8ms / 2.17M allocs -> 32.0ms / 1.24M allocs - leaves=10000: 597ms / 21.6M allocs -> 326ms / 12.2M allocs The companion BenchmarkMssmtBulkInsert (Insert-in-loop, untouched) moves only within noise, confirming the single-insert path was not regressed. For random 256-bit keys, paths only begin to converge in the top ~log2(N) levels — the bottom ~240 levels see disjoint paths and pay the per-leaf cost regardless. The measured ~2x time and ~1.7x alloc reduction comes from amortised walkDown reads, CompactedTree empty-subtree shortcuts, and the single UpdateRoot call. Property suite in mssmt/insert_many_property_test.go (rapid): TestInsertManyEquivalence pins the new path to Insert-loop semantics across both Tree implementations; TestInsertManyRoundTrip verifies every inserted leaf is Get-able and its MerkleProof verifies against the resulting root; TestInsertManyOverPopulated exercises CompactedLeafNode-bearing trees by chaining two InsertMany calls; TestInsertManySumOverflow asserts cumulative- sum overflow surfaces as ErrIntegerOverflow rather than wrapping.

BranchNode.NodeHash allocated four times per call: the sha256 hasher state, the digest slice from h.Sum(nil), the cached *NodeHash, and the cached *uint64 sum. The hasher allocation is unnecessary — the input is always exactly 72 bytes (left hash || right hash || sum) — and the pointer cache fields escape to the heap on every branch construction purely as "is computed yet" sentinels. Replace the cache with value-typed fields plus ok flags, and compute the hash via sha256.Sum256 over a stack-resident 72-byte buffer. The same pointer-escape fix applies to LeafNode.nodeHash; its hasher allocation stays because the leaf value is variable-length, but h.Sum is rewritten to write into a stack buffer and the sum encoding moves from binary.Write to PutUint64 + Write. Benchmark deltas vs the prior InsertMany reform on the same fresh CompactedTree workload, -benchtime=3x: - geomean wall-clock: -22% - geomean B/op: -63% - geomean allocs/op: -80% InsertMany/leaves=10000: 326ms / 539MB / 12.2M allocs -> 251ms / 205MB / 2.5M allocs BulkInsert/leaves=10000: 592ms / 929MB / 21.6M allocs -> 452ms / 336MB / 4.4M allocs The win hits BulkInsert (Insert-in-loop) and InsertMany identically since both paths share the BranchNode hashing surface.

Insert and InsertMany historically checked overflow differently. InsertMany used a final-state-correct check (currentRoot + batchSum - existingBatchSum must fit uint64). Single Insert used the more conservative currentRoot + leafSum, which rejects valid replacements where the prior leaf's sum drops out of the root. The asymmetry showed up on the seeded case: with existing k=MaxUint64-5, Insert(k, 10) was rejected while InsertMany({k:10}) was accepted, even though both produce the same final state (root sum = 10). The reason Insert couldn't just adopt the correct check was structural: both APIs interleaved reads (walkDown, GetChildren) and writes (Insert/DeleteBranch, Insert/DeleteLeaf, Insert/DeleteCompactedLeaf) in one recursive descent. To learn the prior leaf's sum before touching storage — as the correct check requires — the descent would have had to walk the tree twice. Split the descent into a read phase and a write phase: 1. The recursion takes a *[]mutation slice. Every tx.InsertX / tx.DeleteX call is replaced with an append to that slice. Reads still go through tx. The recursion returns the new node and an existingSum — the sum of any leaves at batch keys (or the single Insert key) being replaced within this subtree — threaded up through the call stack. 2. The public Insert/InsertMany wrapper runs the descent, uses the returned existingSum to compute the effective batch delta, checks for uint64 overflow, and only then flushes the mutation queue via applyAll(tx, muts). Overflow rejection happens atomically: nothing has touched storage yet. Net effects: - FullTree.Insert and CompactedTree.Insert now share the replacement-aware overflow check with InsertMany. The prior leaf's sum rides on the existing walkDown's return value (FullTree) or is captured at the CompactedLeafNode+matching-key case (CompactedTree); no second walk needed. - InsertMany no longer pre-walks the tree O(N) times to compute existingBatchSum; the batchInsert descent accumulates it. - sumExistingLeavesFull and the inline equivalent in CompactedTree.InsertMany are removed. - Eliminates the interleaved-read-and-write anti-pattern: the recursion is now a pure (modulo reads) plan computation, and applyAll is the explicit effectful flush. Bench (BenchmarkMssmtBulkInsert/leaves=10000 on a fresh CompactedTree, -benchtime=3x): - before: 529ms / 7.00M allocs - after: 458ms / 4.42M allocs (-13% time, -37% allocs) InsertMany on fresh trees is unchanged within noise (was already empty-tree fast-pathed). InsertMany on populated trees pays strictly fewer reads now (the per-key pre-walk is gone), though the existing bench doesn't exercise that shape. Bundled in the same commit: - New file mssmt/mutation.go for the mutation type and helpers (insertBranch/deleteBranch/... wrappers + applyAll). - FullTree.Insert switched from t.Root(ctx) (which opened a separate View tx inside the Update closure) to tx.RootNode(). - CompactedTree.batchInsert fast-path for the no-op compacted-leaf case: when a batch contains only deletes of absent keys and the current node is a CompactedLeafNode none of them target, return the leaf unchanged instead of deleting and re-inserting it through buildSubtree. - buildSubtree's items[:0] in-place compaction pinned with a comment documenting the caller-owns-this-slice invariant. - Property suite extensions: - drawBatchLeaf draws EmptyLeafNode ~25% of the time, exercising the deletion paths in batchInsert and buildSubtree. - TestExistingSumGroundTruth: for any random batch on a random tree, the existingSum derived from the observed root-sum delta must equal sum of tree.Get(k).NodeSum() over k in the batch — the descent's accounting must be honest, since the overflow gate depends on it. - TestInsertOverflowReplacementParity: pins Insert and InsertMany to the same acceptance set on the seeded replacement case.

The in-tree InsertMany bench so far only ran against a fresh CompactedTree over an in-memory NewDefaultStore, so it didn't exercise the two cases the branch's perf claims care about most: populated-tree InsertMany, and InsertMany over a persistent SQL-backed store where each fanned-out tx.InsertX / DeleteX pays real I/O. Add two benchmarks that cover both. mssmt/bench_test.go: BenchmarkInsertManyPopulated seeds a CompactedTree with N leaves and then measures a single InsertMany call with M new leaves. On an Apple M4, -benchtime=1x: seed=1000 / batch=100 -> 4.0 ms / 42k allocs seed=1000 / batch=1000 -> 35.7 ms / 357k allocs seed=10000 / batch=100 -> 4.4 ms / 44k allocs seed=10000 / batch=1000 -> 44.3 ms / 426k allocs Per-key cost rises only ~10% going from a 1k- to a 10k-populated tree, consistent with the design: the bottom ~240 levels of a 256-bit-keyspace tree are per-leaf regardless of batch shape, so amortization gains come only from the top ~log2(seed+batch) levels. tapdb/mssmt_bench_test.go: BenchmarkTreeInsertMany contrasts N single tree.Insert calls against one tree.InsertMany call, both inside the same outer transaction and wrapped with the treeStoreWrapperTx pattern that real callers (burn_tree, ignore_tree, supply_tree) use — so the delta isolates the batching win from any fresh-tx overhead. SQLite (local file): batch=100 -> InsertLoop 86 ms vs InsertMany 13 ms (~6.7x) batch=1000 -> InsertLoop 4.5 s vs InsertMany 61 ms (~75x) Postgres 15 (dockertest container, tcp/localhost): batch=100 -> InsertLoop 609 ms vs InsertMany 44 ms (~13.7x) batch=1000 -> InsertLoop 187 s vs InsertMany 351 ms (~533x) Amplification grows with batch size because each per-key walkDown + walkUp + UpdateRoot triple fans out into per-node DB calls, and the socket latency of Postgres over dockertest amplifies the wall-clock gap further at scale. The Postgres bench runs only under -tags=test_db_postgres; the default (no-tag) run uses SQLite via NewTestDB's existing build-tag switch.

jtobin requested review from GeorgeTsagk and darioAnongba July 2, 2026 12:43

jtobin self-assigned this Jul 2, 2026

jtobin added this to Taproot-Assets Project Board Jul 2, 2026

jtobin added ms-smt performance labels Jul 2, 2026

github-project-automation Bot moved this to 🆕 New in Taproot-Assets Project Board Jul 2, 2026

jtobin force-pushed the mssmt-batched-insertmany branch from 250b789 to c9b377c Compare July 2, 2026 12:45

gemini-code-assist Bot reviewed Jul 2, 2026

View reviewed changes

Comment thread tapdb/mssmt_bench_test.go

Comment thread mssmt/compacted_tree.go

Comment thread mssmt/tree.go Outdated

jtobin added 5 commits July 2, 2026 10:28

docs: add release note

76f659c

jtobin force-pushed the mssmt-batched-insertmany branch from c9b377c to 76f659c Compare July 2, 2026 13:01

jtobin mentioned this pull request Jul 2, 2026

mssmt: add new batched ms-smt leaf insertion with recursive merge update #1347

Closed

jtobin added this to the v0.8.2 milestone Jul 2, 2026

jtobin mentioned this pull request Jul 4, 2026

tapdb: parallelize universe proof ingest #2194

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

mssmt: batched descent InsertMany#2188

mssmt: batched descent InsertMany#2188
jtobin wants to merge 5 commits into
lightninglabs:mainfrom
jtobin:mssmt-batched-insertmany

jtobin commented Jul 2, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot commented Jul 2, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

jtobin commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot commented Jul 2, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jtobin commented Jul 2, 2026 •

edited

Loading