[optimize] fn:deep-equal: streaming fast-path for persistent-DOM trees (closes #4050) by joewiz · Pull Request #6337 · eXist-db/exist

joewiz · 2026-05-10T20:17:13Z

Summary

fn:deep-equal on persistent-DOM trees was ~23× slower than xmldiff:compare on PieterLamers' TEST.zip reproducer (Macbeth.xml vs Macbeth2.xml, ~3,550 elements each, byte-identical): 5,170 ms vs 228 ms on develop tip b917e1ab1d. After this change the same comparison runs in 5–11 ms median while preserving full fn:deep-equal semantics. xmldiff:compare is not a correctness-equivalent shortcut (it overlooks attribute changes, per PieterLamers' 2026-05-08 follow-up), so closing the gap had to keep attribute / namespace / collation comparison rules intact.

Closes #4050

Diagnosis

The 2026-05-08 paused investigation pointed at "per-node persistent-DOM accessor cost" (getNamespaceURI, getLocalName, getAttributes). The hot path is more specific than that. StoredNode.getNextSibling() (and ElementImpl.getFirstChild()) acquire a fresh broker per call and open a new XMLStreamReader on the parent that walks until it finds the requested sibling — an O(siblings) scan per step. FunDeepEqual.compareContents calls these per node, so the recursion is effectively quadratic in sibling count for a wide stored DOM.
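To make the quadratic claim concrete, here is a toy cost model, not eXist code. It assumes each "next sibling" lookup rescans the parent's children from the start, as described above, and counts node visits against a single streaming pass. The class and method names are hypothetical.

```java
/** Toy model (not eXist code): counts node visits for two ways of iterating N siblings. */
final class SiblingScanModel {

    /** Models a getNextSibling() that rescans the parent's children from the start on every call. */
    static long visitsWithRescan(final int siblings) {
        long visits = 0;
        for (int current = 0; current < siblings - 1; current++) {
            // A fresh reader walks children 0..current+1 before it reaches the next sibling.
            for (int i = 0; i <= current + 1; i++) {
                visits++;
            }
        }
        return visits;
    }

    /** Models one streaming pass: every sibling is visited exactly once. */
    static long visitsWithStream(final int siblings) {
        long visits = 0;
        for (int i = 0; i < siblings; i++) {
            visits++;
        }
        return visits;
    }

    public static void main(final String[] args) {
        for (final int n : new int[] {10, 100, 1_000}) {
            System.out.printf("siblings=%d rescan=%d stream=%d%n",
                    n, visitsWithRescan(n), visitsWithStream(n));
        }
    }
}
```

In this model the rescan cost is n(n+1)/2 - 1 visits for n siblings, while the streaming pass is n. The model ignores the per-call broker acquisition, which would add a further constant cost to every step.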

Diagnostic XQuery on the same Macbeth corpus, on develop tip:

Workload	Median
`count(//*) + count(//*)` (both docs)	9–14 ms
`count(distinct-values(//*/local-name()))`	13–16 ms
`string-length(serialize(doc))` (forces full walk)	13–24 ms
`fn:deep-equal(doc, doc)` (legacy)	5,170 ms

The storage-layer floor for walking both documents is ≤ 24 ms. The legacy fn:deep-equal path is paying ~5 seconds in getFirstChild / getNextSibling overhead, not in the comparison work.

What changed

A new FunDeepEqualStreamingComparator walks both subtrees via IEmbeddedXMLStreamReader (the persistent-DOM BTree stream reader) and compares events in lockstep:

Element name comparison uses expanded QName (namespace URI + local name), code-point order.
Attributes are gathered from both elements, xmlns:* declarations filtered, the remainder sorted by (namespace URI, local name) and compared positionally — matches FunDeepEqual.compareAttributes's order-insensitive semantics. Attribute values use the supplied collator.
CHARACTERS / CDATA / SPACE events are compared in document order with the supplied collator.
COMMENT and PROCESSING_INSTRUCTION events are skipped per the W3C XPath F&O 3.1 §15.3.1 fn:deep-equal spec.
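The rules above can be sketched with the JDK's own StAX reader over in-memory strings. This is an illustration only, not the PR's FunDeepEqualStreamingComparator: it assumes plain String.equals in place of the supplied collator, and the class LockstepSketch and its helpers are hypothetical names. The JDK reader reports xmlns declarations separately from attributes, so no filtering step is needed here.

```java
import static javax.xml.stream.XMLStreamConstants.*;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Illustrative only: lockstep StAX comparison over strings, not the PR's comparator. */
final class LockstepSketch {

    /** Attribute keyed by expanded name; records provide value-based equals(). */
    record Attr(String ns, String local, String value) {}

    /** Next event that matters: comments, PIs and the DTD are skipped; all text kinds count as CHARACTERS. */
    static int nextRelevant(final XMLStreamReader r) throws XMLStreamException {
        while (r.hasNext()) {
            final int ev = r.next();
            switch (ev) {
                case COMMENT, PROCESSING_INSTRUCTION, DTD -> { /* ignored for deep-equal */ }
                case CDATA, SPACE -> { return CHARACTERS; }
                default -> { return ev; }
            }
        }
        return END_DOCUMENT;
    }

    /** Attributes sorted by (namespace URI, local name) so declaration order is irrelevant. */
    static List<Attr> sortedAttributes(final XMLStreamReader r) {
        final List<Attr> attrs = new ArrayList<>();
        for (int i = 0; i < r.getAttributeCount(); i++) {
            final String ns = r.getAttributeNamespace(i);
            attrs.add(new Attr(ns == null ? "" : ns, r.getAttributeLocalName(i), r.getAttributeValue(i)));
        }
        attrs.sort(Comparator.comparing(Attr::ns).thenComparing(Attr::local));
        return attrs;
    }

    static boolean deepEqual(final String xmlA, final String xmlB) {
        final XMLInputFactory factory = XMLInputFactory.newFactory();
        // Coalescing avoids spurious mismatches caused by parser-specific text chunking.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            final XMLStreamReader a = factory.createXMLStreamReader(new StringReader(xmlA));
            final XMLStreamReader b = factory.createXMLStreamReader(new StringReader(xmlB));
            while (true) {
                final int ea = nextRelevant(a);
                final int eb = nextRelevant(b);
                if (ea != eb) {
                    return false;
                }
                if (ea == END_DOCUMENT) {
                    return true;
                }
                // QName.equals compares namespace URI and local part, ignoring the prefix.
                if (ea == START_ELEMENT
                        && !(a.getName().equals(b.getName())
                             && sortedAttributes(a).equals(sortedAttributes(b)))) {
                    return false;
                }
                if (ea == CHARACTERS && !a.getText().equals(b.getText())) {
                    return false;
                }
            }
        } catch (final XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

END_ELEMENT events need no name check in this sketch because both inputs are well-formed and the matching start tags have already been compared.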

Dispatch lives in FunDeepEqual.deepCompare's DOCUMENT and ELEMENT switch branches and only fires when both arguments are persistent NodeProxy instances with getImplementationType() == PERSISTENT_NODE. Memtree, atomic, attribute-as-top-level, text, map, and array shapes fall through to the legacy recursive path. Any XMLStreamException / IOException / RuntimeException from the streaming path is caught and logged at DEBUG, after which the legacy path runs, so correctness is preserved.
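The guard-plus-fallback shape described here might look roughly like the following. This is a minimal sketch under stated assumptions: the StreamingCompare interface, the bothPersistent flag and the System.err call are stand-ins for the real NodeProxy check and DEBUG logging, which are not reproduced.

```java
import java.util.function.IntSupplier;

/** Illustrative fast-path guard with fallback; all names here are hypothetical. */
final class FastPathFallbackSketch {

    /** A comparison that may fail with a checked exception, like a stream-based walk. */
    @FunctionalInterface
    interface StreamingCompare {
        int run() throws Exception;
    }

    static int compare(final boolean bothPersistent,
                       final StreamingCompare streaming,
                       final IntSupplier legacy) {
        if (bothPersistent) {
            try {
                return streaming.run();
            } catch (final Exception e) {
                // Stand-in for a DEBUG log line; the failure is swallowed on purpose.
                System.err.println("streaming compare failed, using legacy path: " + e);
            }
        }
        // Reached when the fast path is not eligible or when it threw.
        return legacy.getAsInt();
    }
}
```

The design choice worth noting is that a failure in the fast path never changes the answer, only the cost: the legacy result is always what the caller sees in that case.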

The FunDeepEqual static API gains broker-aware overloads (deepCompareSeq(..., DBBroker), deepCompare(..., DBBroker), deepEqualsSeq(..., DBBroker)); the existing 3-arg signatures delegate with broker = null, leaving callers like SwitchExpression, GroupByClause, AbstractMapType, and FunSort untouched.

Reproducer

PieterLamers' TEST.zip (Macbeth.xml + Macbeth2.xml, ~3,550 elements each), JDK Zulu 21.38.21, eXist 7.0.0-SNAPSHOT, 5-trial median:

Variant	Median	Notes
current develop (no fix)	5,170 ms	reporter's TEST.zip
`xmldiff:compare` (per @adamretter, 2022)	228 ms	not correctness-equivalent
this fix	6 ms	within ~2–3× of the serialize() floor; ~38× faster than xmldiff:compare
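For readers who want to reproduce a "5-trial median" figure, a minimal timing helper could look like this. It is an assumption about method, not the harness used for the numbers above; the class name is hypothetical and no JIT warm-up is attempted.

```java
import java.util.Arrays;

/** Illustrative median-of-N timing helper; not the harness used for the PR's measurements. */
final class MedianTiming {

    /** Median of the samples; for an even count this returns the upper middle value. */
    static long median(final long[] samples) {
        final long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Runs the workload the given number of times and returns the median wall time in ms. */
    static long medianMillis(final Runnable workload, final int trials) {
        final long[] samples = new long[trials];
        for (int i = 0; i < trials; i++) {
            final long start = System.nanoTime();
            workload.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return median(samples);
    }
}
```

A median is less sensitive than a mean to a single slow trial, which matters when one run pays for cold caches.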

Synthetic 10k-element stored corpus (FunDeepEqualPerformanceTest.deepEqualOnStoredEqualDocsIsFast):

Variant	Median	Notes
current develop	~2,521 ms	from 2026-05-08 measurement
this fix	124 ms	~20× speedup

Memtree case (deepEqualOnLargeEqualTreesIsFast) unchanged at ~46 ms; the streaming path does not apply to in-memory nodes.

Correctness

78 existing *DeepEqual* JUnit tests pass: DeepEqualTest (63), FunDeepEqualPerformanceTest (14), FunDeepEqual4050ReproTest (1).
Wider 1,230-test sweep across XPathQueryTest, XQuery3Tests, FunSortTest, GroupByTest, *MapType*, *Serialization* — all green.
Full mvn test -pl exist-core: 6,607 tests, 0 failures, 0 errors, 106 skipped (pre-existing).

XQTS conformance has not been re-run for this branch — fn:deep-equal is exercised pervasively in QT4 / XQ 3.1, and CI's runner job will report any conformance delta. I'll re-run locally before merge if a reviewer wants confirmation in advance.

Out of scope

Schema-aware typed-value comparison. The streaming path treats values as xs:untypedAtomic (string equality). The reporter's reproducer is untyped; if a typed-value edge case surfaces in the test gate it'll fall back to legacy. (None has so far across 6,607 tests.)
In-memory tree caching of stored documents — broader optimisation that would benefit many functions besides fn:deep-equal. Worth a separate design pass.
Documents with leading comments or processing instructions before the root element. The documentRoot helper requires the document's first stored child to be an element; non-element first child triggers legacy fallback. (Macbeth.xml and the synthetic corpus both have a single root element.)

Test plan

mvn test -pl exist-core -Dtest='*DeepEqual*' — 78 tests pass.
mvn test -pl exist-core (full module via shared lock script) — 6,607 tests, 0 failures.
PieterLamers' Macbeth.xml reproducer: median 6 ms (was 5,170 ms).
Synthetic 10k-element stored corpus: 124 ms (was ~2,521 ms).
Memtree path unchanged: 46 ms.
Codacy PMD on changed files (the NPath warnings are on the existing deepCompare method's structure; not introduced here).
CI XQTS QT4 / XQ 3.1 / FTTS deltas — pending CI run.

Spec / context references

W3C XPath F&O 3.1 §15.3.1 fn:deep-equal: https://www.w3.org/TR/xpath-functions-31/#func-deep-equal
2026-05-08 paused diagnosis comment: Strange performance difference between xmldiff:compare() and deep-equal() #4050 (comment)
IEmbeddedXMLStreamReader (the BTree stream reader this fix builds on): exist-core/src/main/java/org/exist/stax/IEmbeddedXMLStreamReader.java

… per-type helpers

Decompose deepCompare(Item, Item, Collator, DBBroker) from NPath 8,847,362 into focused per-type helpers, each well under the 200 NPath threshold:

- compareArrayItems: array-vs-array, with size short-circuits
- compareMapItems: map-vs-map, with size and key short-circuits
- compareAtomicItems: atomic-vs-atomic, NaN/numeric handling
- compareNodeItems: node-vs-node dispatcher (Java 21 switch expression over Type.DOCUMENT / ELEMENT / ATTRIBUTE / PI|NAMESPACE / TEXT|COMMENT)
- compareDocumentItems: document-node compare with streaming fast-path
- compareElementItems: element-node compare with streaming fast-path
- compareAttributeItems: attribute name + value compare
- comparePiOrNamespaceItems: PI/namespace name + string value compare
- tryStreamingCompare: extracts the GH-4050 streaming fast-path guard + fallback that was duplicated across the DOCUMENT and ELEMENT switch branches

Pure refactor; no behaviour change. The streaming fast-path's Macbeth.xml result (6-9ms median, 800x speedup) is preserved.

Resolves reinhapa's NPath complexity review comment on PR eXist-db#6337.

Verification:

- FunDeepEqual JUnit set: 78/78 pass
- XQuery3Tests XQSuite gate: 986/986 pass (1 pre-existing skip)
- Codacy PMD: NPath violation on deepCompare cleared; no new findings

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz · 2026-05-11T12:33:12Z

[This response was co-authored with Claude Code. -Joe]

Decomposed FunDeepEqual.deepCompare from NPath 8,847,362 into a dispatcher + 8 focused helpers (each well under 200), pushed as a separate [refactor] commit (53db259237) so it can be reviewed independently of the streaming fast-path. Per-type dispatch (atomic / array / map / node) at the top level; per-node-kind dispatch (DOCUMENT / ELEMENT / ATTRIBUTE / PI|NAMESPACE / TEXT|COMMENT) via a Java 21 switch expression inside compareNodeItems. Also extracted tryStreamingCompare so the GH-4050 fast-path guard isn't copy-pasted across the DOCUMENT and ELEMENT arms. FunDeepEqual JUnit (78/78) and XQuery3Tests (986/986) green; Macbeth.xml 6-9ms result preserved.

Per reinhapa's review on PR eXist-db#6337 (3 of the 4 Codacy findings; the remaining FunDeepEqual.deepCompare NPath 8.8M issue was already decomposed in commit 53db259 on this same PR):

- walk() decomposed into per-event-type helpers (compareStartElements / compareEndElements / walkStep) with a small WalkState record-holder so depth/rootSeen propagate without static fields. Sentinel WalkState.CONTINUE distinguishes "step succeeded, keep walking" from a non-EQUAL return. The CHARACTERS / CDATA / SPACE cases were folded into one yield branch (they were identical). NPath 721 -> well under 200.
- nextRelevantEvent: rewrite to flag-controlled loop so neither `continue` nor `return ev` is the final statement of the loop body (was line 233).
- FunDeepEqualPerformanceTest: move STORED_EQUAL_TREES, LARGE_EQUAL_TREES, LARGE_TREES_DIFFER_AT_LEAF, LARGE_TREES_DIFFER_AT_ROOT to the field-declaration block at the top of the class.

`mvn test -pl exist-core -Dtest='*DeepEqual*'` - 78/78 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz · 2026-05-11T12:38:41Z

[This response was co-authored with Claude Code. -Joe]

Addressed all four Codacy findings on this PR — note that the fourth (the FunDeepEqual.deepCompare 8.8M-NPath one) was already decomposed in commit 53db259237 earlier on this branch, so no new work was needed for it. New tip: 7b82bb49f8.

FunDeepEqualStreamingComparator.walk (was NPath 721): decomposed into walkStep (arrow switch on event type) + compareStartElements / compareEndElements helpers, with a small WalkState holder so depth / rootSeen thread through without mutable closure tricks. CHARACTERS / CDATA / SPACE folded into one yield branch (they were identical). NPath well under 200.
nextRelevantEvent line 233 (branching-as-last-in-loop): rewritten as flag-controlled loop so neither continue nor return ev is the final statement of the loop body.
FunDeepEqualPerformanceTest field order: moved the four private static final String constants to the field-declaration block at the top of the class.
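As an illustration of the flag-controlled loop shape mentioned for nextRelevantEvent, here is a minimal sketch. It assumes events arrive as an Iterator of StAX event codes, which is a simplification of the real reader, and the class name is hypothetical.

```java
import static javax.xml.stream.XMLStreamConstants.COMMENT;
import static javax.xml.stream.XMLStreamConstants.END_DOCUMENT;
import static javax.xml.stream.XMLStreamConstants.PROCESSING_INSTRUCTION;

import java.util.Iterator;

/** Illustrative flag-controlled skip loop; not the PR's nextRelevantEvent. */
final class FlagLoopSketch {

    /** Returns the next event code that is neither a comment nor a PI, or END_DOCUMENT if none is left. */
    static int nextRelevantEvent(final Iterator<Integer> events) {
        int result = END_DOCUMENT;
        boolean found = false;
        // The flag ends the loop, so the body needs neither `continue` nor `return`.
        while (!found && events.hasNext()) {
            final int ev = events.next();
            final boolean skipped = ev == COMMENT || ev == PROCESSING_INSTRUCTION;
            if (!skipped) {
                result = ev;
                found = true;
            }
        }
        return result;
    }
}
```

The single exit point is what satisfies a "no branching statement as last in loop" style rule; behaviour is the same as an early-return version.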

mvn test -pl exist-core -Dtest='*DeepEqual*' — 78/78 green (FunDeepEqual4050ReproTest 1, DeepEqualTest 63, FunDeepEqualPerformanceTest 14). codacy-cli analyze --tool pmd on the touched files is clean for the three flagged sites. The two remaining warnings (CompareObjectsWithEquals / UseEqualsToCompareStrings on lines 326 + 341) are the intentional a == b reference-equality short-circuits in compareNullable / safeCompare — pre-existing, commented as such, mirrors FunDeepEqual.safeCompare; leaving them.

duncdrum · 2026-05-11T20:32:09Z

@joewiz can you rebase please. There is no conflict, but I want to be safe, having just spent hours on NodeProxy

Closes eXist-db#4050

PieterLamers' TEST.zip reproducer (Macbeth.xml vs Macbeth2.xml, ~3,550 elements each, byte-identical) was 5,170 ms median on develop tip b917e1a, ~23x slower than xmldiff:compare's 228 ms. With this change the same comparison runs in 5-11 ms median -- within an order of magnitude of the storage-layer floor (a single serialize() walk of both docs takes ~15 ms) and ~20-45x faster than xmldiff:compare. xmldiff:compare is not a correctness-equivalent shortcut (it overlooks attribute changes, per PieterLamers' 2026-05-08 follow-up), so closing the gap had to preserve fn:deep-equal's full attribute / namespace / collation semantics.

The diagnosis from the 2026-05-08 paused investigation pointed at "per-node persistent-DOM accessor cost" (getNamespaceURI, getLocalName, getAttributes). The actual hot path is more specific: StoredNode.getNextSibling() (and ElementImpl.getFirstChild()) acquire a fresh broker per call AND open a new XMLStreamReader on the parent that walks until it finds the requested sibling -- an O(siblings) walk per step. compareContents calls these per node, making the recursion effectively quadratic in sibling count for a wide stored DOM.

Fix: dispatch to a new FunDeepEqualStreamingComparator when both arguments are persistent-DOM DOCUMENT or ELEMENT NodeProxy instances. The comparator opens two IEmbeddedXMLStreamReader instances via DBBroker.newXMLStreamReader (which iterates the dom.dbx node stream directly, byte-by-byte) and walks them in lockstep, comparing element names, attribute sets, character data, and skipping comments / processing instructions per the fn:deep-equal spec. Attributes are gathered, xmlns:* declarations filtered, and the remainder sorted by (namespace URI, local name) before positional value comparison -- order-insensitive, matching FunDeepEqual.compareAttributes.

The new path is additive: legacy compareElements / compareContents remain in place and handle memtree, atomic, attribute-as-top-level, text, map, array, and any persistent-DOM case where the streaming reader fails (caught XMLStreamException / IOException / RuntimeException falls through to legacy with a debug log).

Out of scope:

- Schema-aware typed-value comparison (untyped only). Documented; the reporter's reproducer is untyped.
- In-memory tree caching of stored documents (broader optimisation affecting many functions, larger architectural piece).
- Documents with leading comments / PIs before the root element (the streaming path's documentRoot helper requires the first stored child to be an element; non-element first child triggers legacy fallback).

Test plan:

- 78 existing *DeepEqual* JUnit tests pass (DeepEqualTest 63, FunDeepEqualPerformanceTest 14, FunDeepEqual4050ReproTest 1).
- 1,230-test sweep across XPathQueryTest, XQuery3Tests, FunSortTest, GroupByTest, *MapType*, *Serialization* all green.
- Full mvn test -pl exist-core: 6,607 tests, 0 failures, 0 errors, 106 skipped (pre-existing).

Reproducer measurements (Macbeth.xml, JDK Zulu 21.38.21, develop tip):

| Variant | Median | Notes |
|-------------------------------|-----------:|--------------------------------|
| current develop (no fix) | 5,170 ms | reporter's TEST.zip |
| xmldiff:compare (per AR 2022) | 228 ms | not correctness-equivalent |
| this fix | 6 ms | within ~2-3x of serialize() floor |

Synthetic 10k-element stored corpus (FunDeepEqualPerformanceTest):

| Variant | Median | Notes |
|-------------------------------|-----------:|--------------------------------|
| current develop | ~2,521 ms | from 2026-05-08 measurement |
| this fix | 124 ms | ~20x speedup |

Memtree case unchanged (~46 ms on the same corpus); the streaming path does not apply to in-memory nodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


joewiz · 2026-05-11T20:40:13Z

[This response was co-authored with Claude Code. -Joe]

Rebased onto current develop tip — clean, no conflicts across 309 commits. New tip: c9f12fb1b6.

Local smoke check post-rebase: FunDeepEqual JUnit + streaming comparator + performance test all green (14/14), GH-4050 fast-path numbers preserved (96ms / 55ms / 154ms on the equal/mismatched 10k-element trees).

Force-pushed; CI should run on the new tip.

duncdrum · 2026-05-12T09:30:31Z

+
+    private static int compareNullable(@Nullable final String a, @Nullable final String b) {
+        // NOTE: intentional reference equality short-circuit (mirrors safeCompare).
+        if (a == b) {


I believe using caps would prevent the codacy warning A == B and keep the intention intact

Capitalizing only helps if codacy's check is dumb enough to take those for constants

But there should be a way to signal the intent to the linter in the comment

There is a null-capable JDK alternative, Comparator.nullsFirst() / Comparator.nullsLast(), that handles the null cases "correctly". Identity comparisons should also be avoided going forward, as they will be hidden traps for future value types.

…Nullable returns

Addresses duncdrum's review on PR eXist-db#6337. Three return values in compareNullable now use Constants.EQUAL / INFERIOR / SUPERIOR, matching the sibling safeCompare method exactly. The intentional reference-equality short-circuit (a == b) is preserved; the constants make the comparison's intent explicit and align with the project-wide compare-method idiom.

Full-module gate: Tests run: 6747, Failures: 0, Errors: 0, Skipped: 97, BUILD SUCCESS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz · 2026-05-12T13:53:02Z

[This response was co-authored with Claude Code. -Joe]

Done in `badc60ba91`. Switched `compareNullable`'s three return values to `Constants.EQUAL` / `INFERIOR` / `SUPERIOR` so it matches the sibling `safeCompare` method byte-for-byte on the compare-result idiom. The intentional reference-equality short-circuit (`a == b`) is preserved, with the existing comment explaining it.

Full-module gate green: `Tests run: 6747, Failures: 0, Errors: 0, Skipped: 97`.

reinhapa · 2026-05-12T14:09:36Z

+
+    private static int compareNullable(@Nullable final String a, @Nullable final String b) {
+        // NOTE: intentional reference equality short-circuit (mirrors safeCompare).
+        if (a == b) {


There is a null-capable JDK alternative, Comparator.nullsFirst() / Comparator.nullsLast(), that handles the null cases "correctly". Identity comparisons should also be avoided going forward, as they will be hidden traps for future value types.
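For reference, the JDK alternative mentioned here could be sketched as below. This is a minimal illustration, not a proposed patch: it assumes nulls should sort first and that natural String ordering is acceptable, neither of which is confirmed for the existing compareNullable. The class name is hypothetical.

```java
import java.util.Comparator;

/** Illustrative null-safe String comparison without identity (==) checks. */
final class NullSafeCompareSketch {

    /** Orders null before any non-null value, then falls back to String.compareTo. */
    static final Comparator<String> NULLS_FIRST = Comparator.nullsFirst(Comparator.naturalOrder());

    /** Returns -1, 0 or 1, so callers can map the sign onto their own result constants. */
    static int compareNullable(final String a, final String b) {
        return Integer.signum(NULLS_FIRST.compare(a, b));
    }
}
```

Comparator.nullsFirst treats two nulls as equal, so the `a == b` short-circuit is not needed for correctness; whether it is still worth keeping for speed is a separate question.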

joewiz requested a review from a team as a code owner May 10, 2026 20:17

joewiz mentioned this pull request May 11, 2026

[bugfix] XQuery 3.1 mandatory fixes from v2/xq4-core-functions (audit extract #3 subset) #6344

Open

4 tasks

reinhapa requested changes May 11, 2026


reinhapa approved these changes May 11, 2026


joewiz and others added 3 commits May 11, 2026 16:38

joewiz force-pushed the perf/4050-deep-equal-sax-comparator branch from 7b82bb4 to c9f12fb May 11, 2026 20:40

duncdrum reviewed May 12, 2026


reinhapa requested changes May 12, 2026


line-o added the xquery (issue is related to xquery implementation) and performance (bottlenecks, opportunities for rewriting, optimization) labels May 12, 2026

joewiz wants to merge 4 commits into eXist-db:develop from joewiz:perf/4050-deep-equal-sax-comparator
