[optimize] PrecedingFilter: K-bounded sliding window for preceding::*[K] (#2129 follow-up to PR #6325)#6330
Conversation
…eview Address duncdrum's review on PR eXist-db#6330 by splitting the mixed-purpose PrecedingAxisPositionRegressionTest.java into two artifacts: - exist-core-jmh/.../PrecedingAxisBenchmark.java: JMH benchmark for the performance comparison (wildcard preceding::* vs preceding-sibling::, at early/mid/late positions on a 50,000-element flat doc). JMH handles statistical aggregation natively; the bespoke nanoTime + median-of-N infrastructure is dropped. - exist-core/src/test/xquery/preceding-axis.xql: XQSuite tests for the correctness assertions (early/mid/late reproducer output, ancestor exclusion on the preceding axis, axis-order positional predicate semantics). The original JUnit class is removed, which also resolves the line-66 unused-variable Codacy complaint (SMALL_DOC) by deletion. Full-module gate (per strengthened test-before-push SOP): Tests run: 6597, Failures: 0, Errors: 0, Skipped: 106, BUILD SUCCESS. JMH module builds clean.
|
[This response was co-authored with Claude Code. -Joe] Done in dec9172. Split per your review:
The original mixed-purpose JUnit class is removed; that also resolves the line-66 Full-module gate: |
duncdrum
left a comment
There was a problem hiding this comment.
Thank you, needs a rebase
Wildcard `preceding::*[K]` previously accumulated every preceding match into the result NodeSet between document start and the reference node, then let the predicate machinery pick the K-th. On a 50,000-element flat document at @xml:id=45000, that meant ~45,000 NodeProxy allocations, ~45,000 result.add() calls, and an O(N log N) sort downstream, even though only 5 elements could ever be selected by `[5]`. The fix: when LocationStep.computeLimit() yields a positive K (the existing positional-predicate detection used by FollowingFilter), PrecedingFilter switches to a K-bounded sliding window. Matches are buffered in an ArrayDeque sized to K, with the oldest evicted as new ones arrive. The window flushes into the result NodeSet on filter termination (reference node reached or root END_ELEMENT). Why a sliding window instead of "stop after K" (the FollowingFilter shape from PR eXist-db#6325): preceding-axis positional `[K]` selects the K-th match in axis order = (K-th-from-end) in document order. We have to keep walking past every match to know which K are the most recent. Performance impact (50,000-element flat doc, 5 trials median): - ratio lateMs/earlyMs: ~2.55x -> ~2.02x (position-dependence damped) - wildcard-vs-sibling gap: ~1.75x -> ~1.52x (closer to craigberry's reported ~1.33x sibling baseline, not yet at parity) The StAX walk itself remains O(refPosition) since matches must be emitted before the reference and the reader is forward-only, so absolute time still grows with position. Eliminating that would require a different approach (e.g., backward navigation through the persistent NodeId structure for wildcard tests). The K-bounded buffer is a clean, conservative win on the allocation/sort axis and a prerequisite for any later walk-avoidance work. Tests: - PrecedingAxisPositionRegressionTest mirrors PR eXist-db#6325's FollowingAxisPositionRegressionTest: correctness at 3 reference positions (early, mid, late), ancestor exclusion, K=1..4 axis-order semantics, position-independence threshold, and a wildcard-vs-preceding-sibling comparison. - positional.xqm:180 `optimize-simple-preceding` documented the prior gap (no POSITIONAL_PREDICATE optimization on preceding axis); the assertion is flipped to expect the optimization, mirroring the existing `optimize-simple-following-nested` case at line 170. - exist-core suite: 6599 tests, 0 failures, 0 errors (106 pre-existing skips). Partially addresses eXist-db#2129 (the preceding-axis half; following-axis half is closed by PR eXist-db#6325). Full closure of the sibling-vs-wildcard gap requires walk-avoidance, left as follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eview Address duncdrum's review on PR eXist-db#6330 by splitting the mixed-purpose PrecedingAxisPositionRegressionTest.java into two artifacts: - exist-core-jmh/.../PrecedingAxisBenchmark.java: JMH benchmark for the performance comparison (wildcard preceding::* vs preceding-sibling::, at early/mid/late positions on a 50,000-element flat doc). JMH handles statistical aggregation natively; the bespoke nanoTime + median-of-N infrastructure is dropped. - exist-core/src/test/xquery/preceding-axis.xql: XQSuite tests for the correctness assertions (early/mid/late reproducer output, ancestor exclusion on the preceding axis, axis-order positional predicate semantics). The original JUnit class is removed, which also resolves the line-66 unused-variable Codacy complaint (SMALL_DOC) by deletion. Full-module gate (per strengthened test-before-push SOP): Tests run: 6597, Failures: 0, Errors: 0, Skipped: 106, BUILD SUCCESS. JMH module builds clean.
dec9172 to
a1fe463
Compare
|
[This response was co-authored with Claude Code. -Joe] Rebased on current develop tip — clean except for one conflict in |
|
1 Java doc error, should be easy to spot locally. A bit hard to see in the mvnd reactor log on ci. |
…n CI Addresses duncdrum's CI Javadoc error on PR eXist-db#6330: - Replace {@link LocationStep.PrecedingFilter} (unresolved reference) with {@code LocationStep.PrecedingFilter}. - Add Javadoc for the default constructor. - Add Javadoc with @throws to setUp and tearDown. - Add @return and @throws to wildcardPrecedingWithPositionalPredicate and precedingSiblingBaseline. The JMH-generated annotation classes (under target/generated-sources/) also trip the doclint, but they're regenerated on every build — this commit doesn't touch them; if their lint complaints block CI, the proper fix is a javadoc-plugin excludePackageNames for org.exist.xquery.jmh_generated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
[This response was co-authored with Claude Code. -Joe] Found it —
The CI log also flagged the JMH-generated annotation classes under |
Summary
Damps the position-dependence of
$node/preceding::*[K]by switchingLocationStep.PrecedingFilterto a K-bounded sliding window when a positional predicate is present. Companion to PR #6325 (which closed thefollowing::*half of #2129); this PR closes the more constrainedpreceding::*half.Root cause
The wildcard preceding-axis path in
LocationStep.getPrecedingOrFollowingwalks anIEmbeddedXMLStreamReaderfrom the document root and appliesPrecedingFilterto collect every match into the resultNodeSet, then defers[K]to the predicate machinery. On a 50,000-element flat document at@xml:id=45000, that meant ~45,000NodeProxyallocations, ~45,000result.add()calls, and anO(N log N)sort downstream — even though[5]could only ever select 5 of them. craigberry reported this as a 12 s page-turn impact in an 1100-page book (#2129 comment).Unlike
following::*(PR #6325, which simply repositions the StAX reader to start at the reference node), thepreceding::*walk cannot be skipped: matches must be emitted before the reference, and the reader is forward-only.What changed
exist-core/src/main/java/org/exist/xquery/LocationStep.javaPrecedingFilternow accepts alimitparameter (mirroringFollowingFilter's constructor), passed fromgetPrecedingOrFollowingvia the existingcomputeLimit()extraction (no separate optimizer pass needed —checkPositionalFiltersalready gates the optimization to integer-literal positional predicates with noCONTEXT_POSITIONdependency).limit > 0, the filter maintains anArrayDeque<NodeProxy>of sizeK. New matches enter at the tail; the oldest is evicted at the head when capacity is reached. The window flushes into the resultNodeSeton filter termination (reference node reached or rootEND_ELEMENT).[K]selects the K-th match in axis order = (K-th-from-end) in document order. We have to keep walking past every match to know which K are the most recent.exist-core/src/test/xquery/optimizer/positional.xqmot:optimize-simple-precedingpreviously asserted the absence of thePOSITIONAL_PREDICATEoptimization onpreceding::*[1], documenting the prior gap. The assertion is flipped to expect the optimization, mirroring the existingot:optimize-simple-following-nestedcase at line 170.exist-core/src/test/java/org/exist/xquery/PrecedingAxisPositionRegressionTest.java(new)Mirrors the structure of
FollowingAxisPositionRegressionTest(PR #6325):xml:id=10, mid25, late45)K=1..4axis-order semantics (k-th preceding*fromw[5]isw[5-k])preceding-sibling::wratio comparisonPerformance impact
50,000-element flat document, 5-trial median per data point, A/B comparison via runtime kill switch on the K-bounded path:
lateMs / earlyMsratio (xml:id=45000 vs 5000)preceding-sibling::ratioThe wildcard ratio is closer to but not yet at craigberry's reported ~1.33× sibling baseline. The StAX walk itself remains
O(refPosition)since matches must be emitted before the reference and the reader is forward-only, so absolute time still grows with position. Eliminating that would require a different approach (e.g., backward navigation through the persistentNodeIdstructure for wildcard tests). The K-bounded buffer is a clean, conservative win on the allocation/sort axis and a prerequisite for any later walk-avoidance work.Test plan
PrecedingAxisPositionRegressionTest— 7 tests, all greenxquery.optimizer.OptimizerTests— 60/60 pass after positional.xqm assertion flipmvn test -pl exist-core— 6599 pass, 0 failures, 0 errors, 106 pre-existing skipsLocationStep.java— no new warnings (existingNPathComplexityongetPrecedingOrFollowingunchanged)[K]literals and[K + 1 - $i]FLWOR-bound expressionsCloses
Partially addresses #2129. Full closure of the wildcard-vs-sibling gap (down to ~1.0× parity) requires walk-avoidance work left as a follow-up.