
[optimize] fn:deep-equal: streaming fast-path for persistent-DOM trees (closes #4050)#6337

Open
joewiz wants to merge 4 commits into eXist-db:develop from joewiz:perf/4050-deep-equal-sax-comparator

joewiz commented May 10, 2026

Summary

fn:deep-equal on persistent-DOM trees was ~23× slower than xmldiff:compare on PieterLamers' TEST.zip reproducer (Macbeth.xml vs Macbeth2.xml, ~3,550 elements each, byte-identical): 5,170 ms vs 228 ms on develop tip b917e1ab1d. After this change the same comparison runs in 5–11 ms median while preserving full fn:deep-equal semantics. xmldiff:compare is not a correctness-equivalent shortcut (it overlooks attribute changes, per PieterLamers' 2026-05-08 follow-up), so closing the gap had to keep attribute / namespace / collation comparison rules intact.

Closes #4050

Diagnosis

The 2026-05-08 paused investigation pointed at "per-node persistent-DOM accessor cost" (getNamespaceURI, getLocalName, getAttributes). The hot path is more specific than that. StoredNode.getNextSibling() (and ElementImpl.getFirstChild()) acquire a fresh broker per call and open a new XMLStreamReader on the parent that walks until it finds the requested sibling — an O(siblings) scan per step. FunDeepEqual.compareContents calls these per node, so the recursion is effectively quadratic in sibling count for a wide stored DOM.
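The quadratic behaviour described above can be illustrated with a toy cost model. This is a sketch only, not eXist-db code: the class and method names are hypothetical, and it simply counts child visits under the assumption that each "next sibling" call rescans the parent from its first child.

```java
// Toy cost model (hypothetical names): counts child visits for two traversal styles.
final class SiblingScanCost {

    // Models an accessor that opens a fresh reader on the parent and scans from
    // the first child until it reaches the sibling that follows position `index`.
    static long rescanNextSibling(final int index) {
        long visited = 0;
        for (int i = 0; i <= index + 1; i++) {
            visited++;
        }
        return visited;
    }

    // Walks all children of one parent by calling the rescanning accessor per step.
    static long walkWithRescans(final int childCount) {
        long visited = 1; // the initial "first child" lookup
        for (int i = 0; i < childCount - 1; i++) {
            visited += rescanNextSibling(i);
        }
        return visited;
    }

    // Walks all children once with a single forward-only reader.
    static long walkStreaming(final int childCount) {
        return childCount;
    }
}
```

Under this model a parent with n children costs n(n+1)/2 visits with per-step rescans and n visits with one streaming pass, which is the gap the diagnosis attributes to getFirstChild / getNextSibling.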

Diagnostic XQuery on the same Macbeth corpus, on develop tip:

| Workload                                          | Median   |
|---------------------------------------------------|---------:|
| count(//*) + count(//*) (both docs)               | 9–14 ms  |
| count(distinct-values(//*/local-name()))          | 13–16 ms |
| string-length(serialize(doc)) (forces full walk)  | 13–24 ms |
| fn:deep-equal(doc, doc) (legacy)                  | 5,170 ms |

The storage-layer floor for walking both documents is ≤ 24 ms. The legacy fn:deep-equal path is paying ~5 seconds in getFirstChild / getNextSibling overhead, not in the comparison work.

What changed

A new FunDeepEqualStreamingComparator walks both subtrees via IEmbeddedXMLStreamReader (the persistent-DOM BTree stream reader) and compares events in lockstep:

  • Element name comparison uses expanded QName (namespace URI + local name), code-point order.
  • Attributes are gathered from both elements, xmlns:* declarations filtered, the remainder sorted by (namespace URI, local name) and compared positionally — matches FunDeepEqual.compareAttributes's order-insensitive semantics. Attribute values use the supplied collator.
  • CHARACTERS / CDATA / SPACE events are compared in document order with the supplied collator.
  • COMMENT and PROCESSING_INSTRUCTION events are skipped per the W3C XPath F&O 3.1 §15.3.1 fn:deep-equal spec.
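The comparison rules listed above can be sketched with the JDK's standard StAX API. This is an illustration under stated assumptions, not the FunDeepEqualStreamingComparator itself: it parses in-memory strings instead of reading the persistent DOM through IEmbeddedXMLStreamReader, it uses plain String.equals where the real path uses the supplied collator, and the class and method names are hypothetical.

```java
import static javax.xml.stream.XMLStreamConstants.*;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: plain StAX over strings, hypothetical names throughout.
final class LockstepCompareSketch {

    record Attr(String nsUri, String localName, String value) {}

    // Advances to the next event that matters for the comparison: comments and
    // processing instructions are skipped, text-like events become CHARACTERS.
    static int nextRelevant(final XMLStreamReader r) throws XMLStreamException {
        while (r.hasNext()) {
            final int ev = r.next();
            if (ev == CDATA || ev == SPACE) {
                return CHARACTERS;
            }
            if (ev != COMMENT && ev != PROCESSING_INSTRUCTION) {
                return ev;
            }
        }
        return END_DOCUMENT;
    }

    // Gathers attributes sorted by (namespace URI, local name). A namespace-aware
    // StAX reader reports xmlns declarations separately, so no filter is needed here.
    static List<Attr> sortedAttributes(final XMLStreamReader r) {
        final List<Attr> attrs = new ArrayList<>();
        for (int i = 0; i < r.getAttributeCount(); i++) {
            attrs.add(new Attr(Objects.toString(r.getAttributeNamespace(i), ""),
                    r.getAttributeLocalName(i), r.getAttributeValue(i)));
        }
        attrs.sort(Comparator.comparing(Attr::nsUri).thenComparing(Attr::localName));
        return attrs;
    }

    // Expanded-QName comparison: namespace URI plus local name.
    static boolean sameName(final XMLStreamReader a, final XMLStreamReader b) {
        return Objects.toString(a.getNamespaceURI(), "")
                .equals(Objects.toString(b.getNamespaceURI(), ""))
                && a.getLocalName().equals(b.getLocalName());
    }

    static boolean deepEqual(final String xmlA, final String xmlB) {
        final XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            final XMLStreamReader a = factory.createXMLStreamReader(new StringReader(xmlA));
            final XMLStreamReader b = factory.createXMLStreamReader(new StringReader(xmlB));
            while (true) {
                final int ea = nextRelevant(a);
                final int eb = nextRelevant(b);
                if (ea != eb) {
                    return false;
                }
                switch (ea) {
                    case START_ELEMENT -> {
                        if (!sameName(a, b) || !sortedAttributes(a).equals(sortedAttributes(b))) {
                            return false;
                        }
                    }
                    case CHARACTERS -> {
                        if (!a.getText().equals(b.getText())) {
                            return false;
                        }
                    }
                    case END_DOCUMENT -> {
                        return true;
                    }
                    default -> { /* END_ELEMENT etc.: nothing further to compare */ }
                }
            }
        } catch (final XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

For example, `deepEqual("<r b='2' a='1'><!--x--><c>t</c></r>", "<r a='1' b='2'><c>t</c></r>")` treats the two inputs as equal, since attribute order and the comment are both ignored.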

Dispatch lives in FunDeepEqual.deepCompare's DOCUMENT and ELEMENT switch branches and only fires when both arguments are persistent NodeProxy instances with getImplementationType() == PERSISTENT_NODE. Memtree, atomic, attribute-as-top-level, text, map, and array shapes fall through to the legacy recursive path. Any XMLStreamException / IOException / RuntimeException from the streaming path is caught and logged at DEBUG, and the legacy path then runs instead, so correctness is preserved.
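The guard-and-fallback shape described above could look roughly like the following. It is a minimal sketch, not the code in this PR: FastPath, the eligibility flag, and the class name are hypothetical stand-ins, and java.util.logging's FINE level stands in for a DEBUG log.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.stream.XMLStreamException;

// Hypothetical sketch of "try the fast path, fall back to legacy on any failure".
final class FastPathWithFallback {

    private static final Logger LOG = Logger.getLogger(FastPathWithFallback.class.getName());

    // Stand-in for the streaming comparison; may fail with the listed exceptions.
    @FunctionalInterface
    interface FastPath {
        boolean compare() throws XMLStreamException, IOException;
    }

    // `eligible` models the "both arguments are persistent nodes" guard.
    static boolean compare(final boolean eligible, final FastPath fast, final BooleanSupplier legacy) {
        if (eligible) {
            try {
                return fast.compare();
            } catch (final XMLStreamException | IOException | RuntimeException e) {
                // Log quietly and fall through: the legacy path decides the result.
                LOG.log(Level.FINE, "streaming comparison failed, using legacy path", e);
            }
        }
        return legacy.getAsBoolean();
    }
}
```

The point of the design is that the fast path can only change how quickly an answer is produced: whenever it cannot run to completion, the legacy comparison decides the result.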

The FunDeepEqual static API gains broker-aware overloads (deepCompareSeq(..., DBBroker), deepCompare(..., DBBroker), deepEqualsSeq(..., DBBroker)); the existing 3-arg signatures delegate with broker = null, leaving callers like SwitchExpression, GroupByClause, AbstractMapType, and FunSort untouched.

Reproducer

PieterLamers' TEST.zip (Macbeth.xml + Macbeth2.xml, ~3,550 elements each), JDK Zulu 21.38.21, eXist 7.0.0-SNAPSHOT, 5-trial median:

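| Variant                                 | Median   | Notes                                                                    |
|-----------------------------------------|---------:|--------------------------------------------------------------------------|
| current develop (no fix)                | 5,170 ms | reporter's TEST.zip                                                      |
| xmldiff:compare (per @adamretter, 2022) |   228 ms | not correctness-equivalent                                               |
| this fix                                |     6 ms | within ~2–3× of the serialize() floor; ~38× faster than xmldiff:compare  |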

Synthetic 10k-element stored corpus (FunDeepEqualPerformanceTest.deepEqualOnStoredEqualDocsIsFast):

| Variant         | Median    | Notes                       |
|-----------------|----------:|-----------------------------|
| current develop | ~2,521 ms | from 2026-05-08 measurement |
| this fix        |    124 ms | ~20× speedup                |

Memtree case (deepEqualOnLargeEqualTreesIsFast) unchanged at ~46 ms; the streaming path does not apply to in-memory nodes.
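For readers who want to reproduce a "5-trial median" figure, a timing helper along these lines would do. This is a hedged sketch with hypothetical names, not the harness used for the numbers above; it ignores JIT warm-up and assumes at least one trial.

```java
import java.util.Arrays;

// Hypothetical helper: median wall-clock time of a workload over several trials.
final class MedianTimingSketch {

    // Median of the samples; for an even count, the mean of the two middle values.
    static long median(final long[] samples) {
        final long[] sorted = samples.clone();
        Arrays.sort(sorted);
        final int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Runs the workload `trials` times (trials >= 1) and returns the median in milliseconds.
    static long medianMillis(final int trials, final Runnable workload) {
        final long[] samples = new long[trials];
        for (int i = 0; i < trials; i++) {
            final long start = System.nanoTime();
            workload.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return median(samples);
    }
}
```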

Correctness

  • 78 existing *DeepEqual* JUnit tests pass: DeepEqualTest (63), FunDeepEqualPerformanceTest (14), FunDeepEqual4050ReproTest (1).
  • Wider 1,230-test sweep across XPathQueryTest, XQuery3Tests, FunSortTest, GroupByTest, *MapType*, *Serialization* — all green.
  • Full mvn test -pl exist-core: 6,607 tests, 0 failures, 0 errors, 106 skipped (pre-existing).

XQTS conformance has not been re-run for this branch — fn:deep-equal is exercised pervasively in QT4 / XQ 3.1, and CI's runner job will report any conformance delta. I'll re-run locally before merge if a reviewer wants confirmation in advance.

Out of scope

  • Schema-aware typed-value comparison. The streaming path treats values as xs:untypedAtomic (string equality). The reporter's reproducer is untyped; if a typed-value edge case surfaces in the test gate it'll fall back to legacy. (None has so far across 6,607 tests.)
  • In-memory tree caching of stored documents — broader optimisation that would benefit many functions besides fn:deep-equal. Worth a separate design pass.
  • Documents with leading comments or processing instructions before the root element. The documentRoot helper requires the document's first stored child to be an element; non-element first child triggers legacy fallback. (Macbeth.xml and the synthetic corpus both have a single root element.)

Test plan

  • mvn test -pl exist-core -Dtest='*DeepEqual*' — 78 tests pass.
  • mvn test -pl exist-core (full module via shared lock script) — 6,607 tests, 0 failures.
  • PieterLamers' Macbeth.xml reproducer: median 6 ms (was 5,170 ms).
  • Synthetic 10k-element stored corpus: 124 ms (was ~2,521 ms).
  • Memtree path unchanged: 46 ms.
  • Codacy PMD on changed files (the NPath warnings are on the existing deepCompare method's structure; not introduced here).
  • CI XQTS QT4 / XQ 3.1 / FTTS deltas — pending CI run.

Spec / context references

joewiz added a commit to joewiz/exist that referenced this pull request May 11, 2026
… per-type helpers

Decompose deepCompare(Item, Item, Collator, DBBroker) from NPath 8,847,362
into focused per-type helpers, each well under the 200 NPath threshold:

  - compareArrayItems       — array-vs-array, with size short-circuits
  - compareMapItems         — map-vs-map, with size and key short-circuits
  - compareAtomicItems      — atomic-vs-atomic, NaN/numeric handling
  - compareNodeItems        — node-vs-node dispatcher (Java 21 switch
                              expression over Type.DOCUMENT / ELEMENT /
                              ATTRIBUTE / PI|NAMESPACE / TEXT|COMMENT)
  - compareDocumentItems    — document-node compare with streaming fast-path
  - compareElementItems     — element-node compare with streaming fast-path
  - compareAttributeItems   — attribute name + value compare
  - comparePiOrNamespaceItems — PI/namespace name + string value compare
  - tryStreamingCompare     — extracts the eXist-dbGH-4050 streaming fast-path
                              guard + fallback that was duplicated across
                              the DOCUMENT and ELEMENT switch branches

Pure refactor; no behaviour change. The streaming fast-path's Macbeth.xml
result (6-9ms median, 800x speedup) is preserved. Resolves reinhapa's
NPath complexity review comment on PR eXist-db#6337.

Verification:
  - FunDeepEqual JUnit set: 78/78 pass
  - XQuery3Tests XQSuite gate: 986/986 pass (1 pre-existing skip)
  - Codacy PMD: NPath violation on deepCompare cleared; no new findings

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Decomposed FunDeepEqual.deepCompare from NPath 8,847,362 into a dispatcher + 8 focused helpers (each well under 200), pushed as a separate [refactor] commit (53db259237) so it can be reviewed independently of the streaming fast-path. Per-type dispatch (atomic / array / map / node) at the top level; per-node-kind dispatch (DOCUMENT / ELEMENT / ATTRIBUTE / PI|NAMESPACE / TEXT|COMMENT) via a Java 21 switch expression inside compareNodeItems. Also extracted tryStreamingCompare so the GH-4050 fast-path guard isn't copy-pasted across the DOCUMENT and ELEMENT arms. FunDeepEqual JUnit (78/78) and XQuery3Tests (986/986) green; Macbeth.xml 6-9ms result preserved.
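The per-node-kind dispatch described above can be pictured as follows. This is a simplified sketch and not the refactored FunDeepEqual: the Kind enum and Node record are hypothetical stand-ins for eXist-db's Type constants and node items, and the DOCUMENT / ELEMENT arm compares names only (descendants are left out).

```java
// Hypothetical sketch of a switch-expression dispatcher over node kinds.
final class NodeDispatchSketch {

    enum Kind { DOCUMENT, ELEMENT, ATTRIBUTE, PROCESSING_INSTRUCTION, NAMESPACE, TEXT, COMMENT }

    record Node(Kind kind, String name, String value) {}

    static boolean compareNodeItems(final Node a, final Node b) {
        if (a.kind() != b.kind()) {
            return false;
        }
        // One arm per group of kinds; each arm delegates to a small focused helper.
        return switch (a.kind()) {
            case DOCUMENT, ELEMENT -> compareNames(a, b); // descendants omitted in this sketch
            case ATTRIBUTE -> compareNameAndValue(a, b);
            case PROCESSING_INSTRUCTION, NAMESPACE -> compareNameAndValue(a, b);
            case TEXT, COMMENT -> a.value().equals(b.value());
        };
    }

    private static boolean compareNames(final Node a, final Node b) {
        return a.name().equals(b.name());
    }

    private static boolean compareNameAndValue(final Node a, final Node b) {
        return a.name().equals(b.name()) && a.value().equals(b.value());
    }
}
```

Because the switch is an expression over an enum, the compiler checks that every kind is handled, and each arm stays a single call, which is what keeps the path count of the dispatcher low.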

joewiz added a commit to joewiz/exist that referenced this pull request May 11, 2026
Per reinhapa's review on PR eXist-db#6337 (3 of the 4 Codacy findings; the
remaining FunDeepEqual.deepCompare NPath 8.8M issue was already
decomposed in commit 53db259 on this same PR):

- walk() decomposed into per-event-type helpers
  (compareStartElements / compareEndElements / walkStep) with a small
  WalkState record-holder so depth/rootSeen propagate without static
  fields. Sentinel WalkState.CONTINUE distinguishes "step succeeded,
  keep walking" from a non-EQUAL return. The CHARACTERS / CDATA /
  SPACE cases were folded into one yield branch (they were identical).
  NPath 721 -> well under 200.
- nextRelevantEvent: rewrite to flag-controlled loop so neither
  `continue` nor `return ev` is the final statement of the loop body
  (was line 233).
- FunDeepEqualPerformanceTest: move STORED_EQUAL_TREES,
  LARGE_EQUAL_TREES, LARGE_TREES_DIFFER_AT_LEAF,
  LARGE_TREES_DIFFER_AT_ROOT to the field-declaration block at the
  top of the class.

`mvn test -pl exist-core -Dtest='*DeepEqual*'` - 78/78 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Addressed all four Codacy findings on this PR — note that the fourth (the FunDeepEqual.deepCompare 8.8M-NPath one) was already decomposed in commit 53db259237 earlier on this branch, so no new work was needed for it. New tip: 7b82bb49f8.

  • FunDeepEqualStreamingComparator.walk (was NPath 721): decomposed into walkStep (arrow switch on event type) + compareStartElements / compareEndElements helpers, with a small WalkState holder so depth / rootSeen thread through without mutable closure tricks. CHARACTERS / CDATA / SPACE folded into one yield branch (they were identical). NPath well under 200.
  • nextRelevantEvent line 233 (branching-as-last-in-loop): rewritten as flag-controlled loop so neither continue nor return ev is the final statement of the loop body.
  • FunDeepEqualPerformanceTest field order: moved the four private static final String constants to the field-declaration block at the top of the class.
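A flag-controlled loop of the kind mentioned for nextRelevantEvent might look like this. It is a sketch under assumptions: the events come from a plain Iterator rather than a stream reader, the class name is hypothetical, and whether this exact shape satisfies the PMD rule is not verified here.

```java
import java.util.Iterator;
import javax.xml.stream.XMLStreamConstants;

// Hypothetical sketch: skip comments and PIs without `continue` or an in-loop `return`.
final class FlagLoopSketch {

    static int nextRelevantEvent(final Iterator<Integer> events) {
        int result = XMLStreamConstants.END_DOCUMENT; // returned when the stream is exhausted
        boolean found = false;
        while (!found && events.hasNext()) {
            final int ev = events.next();
            if (ev != XMLStreamConstants.COMMENT && ev != XMLStreamConstants.PROCESSING_INSTRUCTION) {
                result = ev;
                found = true; // ends the loop through its condition
            }
        }
        return result;
    }
}
```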

mvn test -pl exist-core -Dtest='*DeepEqual*' — 78/78 green (FunDeepEqual4050ReproTest 1, DeepEqualTest 63, FunDeepEqualPerformanceTest 14). codacy-cli analyze --tool pmd on the touched files is clean for the three flagged sites. The two remaining warnings (CompareObjectsWithEquals / UseEqualsToCompareStrings on lines 326 + 341) are the intentional a == b reference-equality short-circuits in compareNullable / safeCompare — pre-existing, commented as such, mirrors FunDeepEqual.safeCompare; leaving them.

@duncdrum

@joewiz can you rebase please. There is no conflict, but I want to be safe, having just spent hours on NodeProxy

joewiz and others added 3 commits May 11, 2026 16:38
Closes eXist-db#4050

PieterLamers' TEST.zip reproducer (Macbeth.xml vs Macbeth2.xml,
~3,550 elements each, byte-identical) was 5,170 ms median on
develop tip b917e1a, ~24x slower than xmldiff:compare's 228 ms.
With this change the same comparison runs in 5-11 ms median --
within an order of magnitude of the storage-layer floor (a single
serialize() walk of both docs takes ~15 ms) and ~20-45x faster
than xmldiff:compare. xmldiff:compare is not a correctness-
equivalent shortcut (it overlooks attribute changes, per
PieterLamers' 2026-05-08 follow-up), so closing the gap had to
preserve fn:deep-equal's full attribute / namespace / collation
semantics.

The diagnosis from the 2026-05-08 paused investigation pointed
at "per-node persistent-DOM accessor cost" (getNamespaceURI,
getLocalName, getAttributes). The actual hot path is more
specific: StoredNode.getNextSibling() (and ElementImpl.getFirstChild())
acquire a fresh broker per call AND open a new XMLStreamReader
on the parent that walks until it finds the requested sibling --
an O(siblings) walk per step. compareContents calls these per
node, making the recursion effectively quadratic in sibling
count for a wide stored DOM.

Fix: dispatch to a new FunDeepEqualStreamingComparator when both
arguments are persistent-DOM DOCUMENT or ELEMENT NodeProxy
instances. The comparator opens two IEmbeddedXMLStreamReader
instances via DBBroker.newXMLStreamReader (which iterates the
dom.dbx node stream directly, byte-by-byte) and walks them in
lockstep, comparing element names, attribute sets, character
data, and skipping comments / processing instructions per the
fn:deep-equal spec. Attributes are gathered, xmlns:* declarations
filtered, and the remainder sorted by (namespace URI, local name)
before positional value comparison -- order-insensitive, matching
FunDeepEqual.compareAttributes.

The new path is additive: legacy compareElements / compareContents
remain in place and handle memtree, atomic, attribute-as-top-level,
text, map, array, and any persistent-DOM case where the streaming
reader fails (caught XMLStreamException / IOException / RuntimeException
falls through to legacy with a debug log).

Out of scope:
- Schema-aware typed-value comparison (untyped only). Documented;
  the reporter's reproducer is untyped.
- In-memory tree caching of stored documents (broader optimisation
  affecting many functions, larger architectural piece).
- Documents with leading comments / PIs before the root element
  (the streaming path's documentRoot helper requires the first
  stored child to be an element; non-element first child triggers
  legacy fallback).

Test plan:
- 78 existing *DeepEqual* JUnit tests pass (DeepEqualTest 63,
  FunDeepEqualPerformanceTest 14, FunDeepEqual4050ReproTest 1).
- 1,230-test sweep across XPathQueryTest, XQuery3Tests,
  FunSortTest, GroupByTest, *MapType*, *Serialization* all green.
- Full mvn test -pl exist-core: 6,607 tests, 0 failures, 0 errors,
  106 skipped (pre-existing).

Reproducer measurements (Macbeth.xml, JDK Zulu 21.38.21, develop tip):

| Variant                       | Median     | Notes                          |
|-------------------------------|-----------:|--------------------------------|
| current develop (no fix)      | 5,170 ms   | reporter's TEST.zip            |
| xmldiff:compare (per AR 2022) |   228 ms   | not correctness-equivalent     |
| this fix                      |     6 ms   | within ~2-3x of serialize() floor |

Synthetic 10k-element stored corpus (FunDeepEqualPerformanceTest):

| Variant                       | Median     | Notes                          |
|-------------------------------|-----------:|--------------------------------|
| current develop               | ~2,521 ms  | from 2026-05-08 measurement    |
| this fix                      |   124 ms   | ~20x speedup                   |

Memtree case unchanged (~46 ms on the same corpus); the streaming
path does not apply to in-memory nodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joewiz force-pushed the perf/4050-deep-equal-sax-comparator branch from 7b82bb4 to c9f12fb May 11, 2026 20:40

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Rebased onto current develop tip — clean, no conflicts across 309 commits. New tip: c9f12fb1b6.

Local smoke check post-rebase: FunDeepEqual JUnit + streaming comparator + performance test all green (14/14), GH-4050 fast-path numbers preserved (96ms / 55ms / 154ms on the equal/mismatched 10k-element trees).

Force-pushed; CI should run on the new tip.


private static int compareNullable(@Nullable final String a, @Nullable final String b) {
    // NOTE: intentional reference equality short-circuit (mirrors safeCompare).
    if (a == b) {
Contributor

I believe using caps would prevent the codacy warning A == B and keep the intention intact

Member

Capitalizing only helps if codacy's check is dumb enough to take those for constants

Member

But there should be a way to signal the intent to the linter in the comment

Member

There is a null-capable JDK alternative, Comparator.nullsFirst() / Comparator.nullsLast(), that handles null cases "correctly". Identity comparisons should also be avoided going forward, as they will become hidden traps for future value types.
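As an illustration of that suggestion, a null-safe string comparison built on Comparator.nullsFirst could look like the sketch below. It is not the code in this PR: the class name is hypothetical, and it returns the raw Comparator result instead of eXist-db's Constants.EQUAL / INFERIOR / SUPERIOR values.

```java
import java.util.Comparator;

// Hypothetical sketch: null-safe comparison without a reference-equality check.
final class NullSafeCompareSketch {

    // null sorts before any non-null string; non-null strings use natural order.
    private static final Comparator<String> NULLS_FIRST =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    static int compareNullable(final String a, final String b) {
        return NULLS_FIRST.compare(a, b);
    }
}
```

With this shape, two nulls compare as equal and a null sorts before any non-null value, so no explicit `a == b` test is needed.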

…Nullable returns

Addresses duncdrum's review on PR eXist-db#6337. Three return values in
compareNullable now use Constants.EQUAL / INFERIOR / SUPERIOR, matching
the sibling safeCompare method exactly. The intentional reference-equality
short-circuit (a == b) is preserved; the constants make the comparison's
intent explicit and align with the project-wide compare-method idiom.

Full-module gate: Tests run: 6747, Failures: 0, Errors: 0, Skipped: 97, BUILD SUCCESS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz commented May 12, 2026

[This response was co-authored with Claude Code. -Joe]

Done in `badc60ba91`. Switched `compareNullable`'s three return values to `Constants.EQUAL` / `INFERIOR` / `SUPERIOR` so it matches the sibling `safeCompare` method byte-for-byte on the compare-result idiom. The intentional reference-equality short-circuit (`a == b`) is preserved, with the existing comment explaining it.

Full-module gate green: `Tests run: 6747, Failures: 0, Errors: 0, Skipped: 97`.



line-o added the xquery (issue is related to xquery implementation) and performance (bottlenecks, opportunities for rewriting, optimization) labels May 12, 2026
Development

Successfully merging this pull request may close these issues.

Strange performance difference between xmldiff:compare() and deep-equal()
