[optimize] PrecedingFilter: K-bounded sliding window for preceding::*[K] (#2129 follow-up to PR #6325) by joewiz · Pull Request #6330 · eXist-db/exist

joewiz · 2026-05-10T04:23:49Z

Summary

Damps the position-dependence of $node/preceding::*[K] by switching LocationStep.PrecedingFilter to a K-bounded sliding window when a positional predicate is present. Companion to PR #6325 (which closed the following::* half of #2129); this PR closes the more constrained preceding::* half.

Root cause

The wildcard preceding-axis path in LocationStep.getPrecedingOrFollowing walks an IEmbeddedXMLStreamReader from the document root and applies PrecedingFilter to collect every match into the result NodeSet, then defers [K] to the predicate machinery. On a 50,000-element flat document at @xml:id=45000, that meant ~45,000 NodeProxy allocations, ~45,000 result.add() calls, and an O(N log N) sort downstream — even though [5] could only ever select 5 of them. craigberry reported this as a 12 s page-turn impact in an 1100-page book (#2129 comment).

Unlike following::* (PR #6325, which simply repositions the StAX reader to start at the reference node), the preceding::* walk cannot be skipped: matches must be emitted before the reference, and the reader is forward-only.

What changed

`exist-core/src/main/java/org/exist/xquery/LocationStep.java`

- PrecedingFilter now accepts a limit parameter (mirroring FollowingFilter's constructor), passed from getPrecedingOrFollowing via the existing computeLimit() extraction. No separate optimizer pass is needed: checkPositionalFilters already gates the optimization to integer-literal positional predicates with no CONTEXT_POSITION dependency.
- When limit > 0, the filter maintains an ArrayDeque<NodeProxy> of size K. New matches enter at the tail; the oldest is evicted at the head when capacity is reached. The window flushes into the result NodeSet on filter termination (reference node reached or root END_ELEMENT).
- Why a sliding window instead of FollowingFilter's "stop after K" pattern: a preceding-axis positional [K] selects the K-th match in axis order, which is the K-th from the end in document order. We have to keep walking past every match to know which K are the most recent.
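
The sliding window described above can be sketched in isolation. This is a minimal illustration, not the code from LocationStep.java: plain integers stand in for NodeProxy matches, and the class and method names (`SlidingWindowSketch`, `lastK`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustrative only: integers stand in for NodeProxy matches seen in document order. */
final class SlidingWindowSketch {

    /** Keeps only the last {@code limit} matches seen before the reference node. */
    static List<Integer> lastK(final List<Integer> matchesInDocOrder, final int limit) {
        if (limit <= 0) {
            // No positional limit: collect everything (the unbounded behaviour).
            return new ArrayList<>(matchesInDocOrder);
        }
        final Deque<Integer> window = new ArrayDeque<>(limit);
        for (final Integer match : matchesInDocOrder) {
            if (window.size() == limit) {
                window.pollFirst();   // evict the oldest match at the head
            }
            window.addLast(match);    // the newest match enters at the tail
        }
        // "Flush": hand over the surviving matches, still in document order.
        return new ArrayList<>(window);
    }

    public static void main(final String[] args) {
        final List<Integer> preceding = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(lastK(preceding, 3)); // prints [7, 8, 9]
    }
}
```

The walk still visits every match, but at most K of them are retained at any time, which is where the allocation and sort savings would come from.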

`exist-core/src/test/xquery/optimizer/positional.xqm`

ot:optimize-simple-preceding previously asserted the absence of the POSITIONAL_PREDICATE optimization on preceding::*[1], documenting the prior gap. The assertion is flipped to expect the optimization, mirroring the existing ot:optimize-simple-following-nested case at line 170.

`exist-core/src/test/java/org/exist/xquery/PrecedingAxisPositionRegressionTest.java` (new)

Mirrors the structure of FollowingAxisPositionRegressionTest (PR #6325):

- correctness at 3 reference positions (early xml:id=10, mid 25, late 45)
- ancestor exclusion on a nested document
- K=1..4 axis-order semantics (k-th preceding * from w[5] is w[5-k])
- position-independence threshold (3× / 500 ms floor) on a 50,000-element flat document
- wildcard-vs-preceding-sibling::w ratio comparison
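
The axis-order rule in the K=1..4 item can be illustrated with a small sketch. It assumes the preceding elements are held in a list in document order; the names `AxisOrderSketch` and `kthPreceding` are hypothetical and do not appear in the test class.

```java
import java.util.List;

/** Illustrative only: maps a preceding-axis position k onto a document-order list. */
final class AxisOrderSketch {

    /**
     * Returns the k-th item in axis order (reverse document order),
     * or null when fewer than k preceding items exist.
     */
    static String kthPreceding(final List<String> precedingInDocOrder, final int k) {
        final int index = precedingInDocOrder.size() - k;
        return (k >= 1 && index >= 0) ? precedingInDocOrder.get(index) : null;
    }

    public static void main(final String[] args) {
        // The four elements preceding w[5] in a flat document, in document order.
        final List<String> preceding = List.of("w1", "w2", "w3", "w4");
        for (int k = 1; k <= 4; k++) {
            System.out.println("k=" + k + " -> " + kthPreceding(preceding, k));
        }
        // prints k=1 -> w4, k=2 -> w3, k=3 -> w2, k=4 -> w1 (one per line)
    }
}
```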

Performance impact

50,000-element flat document, 5-trial median per data point, A/B comparison via runtime kill switch on the K-bounded path:

| metric | before fix | after fix |
| --- | --- | --- |
| `lateMs / earlyMs` ratio (xml:id=45000 vs 5000) | ~2.55× | ~2.02× |
| wildcard-vs-`preceding-sibling::` ratio | ~1.75× | ~1.52× |

The wildcard ratio is closer to but not yet at craigberry's reported ~1.33× sibling baseline. The StAX walk itself remains O(refPosition) since matches must be emitted before the reference and the reader is forward-only, so absolute time still grows with position. Eliminating that would require a different approach (e.g., backward navigation through the persistent NodeId structure for wildcard tests). The K-bounded buffer is a clean, conservative win on the allocation/sort axis and a prerequisite for any later walk-avoidance work.
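
For readers who want to reproduce a median-of-N ratio like the ones above, a rough sketch follows. It is not the harness used for this PR: the class and method names are hypothetical, and the sample values in `main` are synthetic, not measurements.

```java
import java.util.Arrays;

/** Illustrative only: a median-of-N timing helper, not the harness used in the PR. */
final class MedianRatioSketch {

    /** Median of the samples (assumes at least one sample); the input is not modified. */
    static double median(final long[] samples) {
        final long[] sorted = samples.clone();
        Arrays.sort(sorted);
        final int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Runs the task {@code trials} times (trials >= 1) and returns the median elapsed nanoseconds. */
    static double medianNanos(final Runnable task, final int trials) {
        final long[] elapsed = new long[trials];
        for (int i = 0; i < trials; i++) {
            final long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return median(elapsed);
    }

    /** Late-over-early ratio; values near 1.0 mean little position dependence. */
    static double ratio(final double lateMedian, final double earlyMedian) {
        return lateMedian / earlyMedian;
    }

    public static void main(final String[] args) {
        // Synthetic samples in milliseconds, chosen only to exercise the arithmetic.
        final long[] early = {100, 110, 90, 105, 95};
        final long[] late = {200, 220, 180, 210, 190};
        System.out.println(ratio(median(late), median(early))); // prints 2.0
    }
}
```

A hand-rolled loop like this ignores JIT warm-up and other noise sources, so it is only suitable for coarse A/B comparisons.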

Test plan

- PrecedingAxisPositionRegressionTest: 7 tests, all green
- xquery.optimizer.OptimizerTests: 60/60 pass after positional.xqm assertion flip
- mvn test -pl exist-core: 6599 pass, 0 failures, 0 errors, 106 pre-existing skips
- Codacy PMD on LocationStep.java: no new warnings (existing NPathComplexity on getPrecedingOrFollowing unchanged)
- A/B perf measurement via runtime kill switch confirms K-bounded path fires for [K] literals and [K + 1 - $i] FLWOR-bound expressions

Closes

Partially addresses #2129. Full closure of the wildcard-vs-sibling gap (down to ~1.0× parity) requires walk-avoidance work left as a follow-up.

…eview

Address duncdrum's review on PR eXist-db#6330 by splitting the mixed-purpose PrecedingAxisPositionRegressionTest.java into two artifacts:

- exist-core-jmh/.../PrecedingAxisBenchmark.java: JMH benchmark for the performance comparison (wildcard preceding::* vs preceding-sibling::, at early/mid/late positions on a 50,000-element flat doc). JMH handles statistical aggregation natively; the bespoke nanoTime + median-of-N infrastructure is dropped.
- exist-core/src/test/xquery/preceding-axis.xql: XQSuite tests for the correctness assertions (early/mid/late reproducer output, ancestor exclusion on the preceding axis, axis-order positional predicate semantics).

The original JUnit class is removed, which also resolves the line-66 unused-variable Codacy complaint (SMALL_DOC) by deletion.

Full-module gate (per strengthened test-before-push SOP): Tests run: 6597, Failures: 0, Errors: 0, Skipped: 106, BUILD SUCCESS. JMH module builds clean.

joewiz · 2026-05-12T15:29:17Z

[This response was co-authored with Claude Code. -Joe]

Done in dec9172. Split per your review:

- Performance measurement → exist-core-jmh/src/main/java/org/exist/xquery/PrecedingAxisBenchmark.java (JMH, with @Param'd reference position for early/mid/late, and a preceding-sibling:: baseline for relative interpretation).
- Correctness assertions → exist-core/src/test/xquery/preceding-axis.xql (XQSuite: reproducer at early/mid/late positions, ancestor exclusion on the preceding axis, axis-order positional predicate semantics).

The original mixed-purpose JUnit class is removed; that also resolves the line-66 SMALL_DOC unused-variable Codacy complaint by deletion.

Full-module gate: Tests run: 6597, Failures: 0, Errors: 0, Skipped: 106, BUILD SUCCESS. JMH module compiles clean.

duncdrum

Thank you, needs a rebase


Wildcard `preceding::*[K]` previously accumulated every preceding match into the result NodeSet between document start and the reference node, then let the predicate machinery pick the K-th. On a 50,000-element flat document at @xml:id=45000, that meant ~45,000 NodeProxy allocations, ~45,000 result.add() calls, and an O(N log N) sort downstream, even though only 5 elements could ever be selected by `[5]`.

The fix: when LocationStep.computeLimit() yields a positive K (the existing positional-predicate detection used by FollowingFilter), PrecedingFilter switches to a K-bounded sliding window. Matches are buffered in an ArrayDeque sized to K, with the oldest evicted as new ones arrive. The window flushes into the result NodeSet on filter termination (reference node reached or root END_ELEMENT).

Why a sliding window instead of "stop after K" (the FollowingFilter shape from PR eXist-db#6325): preceding-axis positional `[K]` selects the K-th match in axis order = (K-th-from-end) in document order. We have to keep walking past every match to know which K are the most recent.

Performance impact (50,000-element flat doc, 5 trials median):
- ratio lateMs/earlyMs: ~2.55x -> ~2.02x (position-dependence damped)
- wildcard-vs-sibling gap: ~1.75x -> ~1.52x (closer to craigberry's reported ~1.33x sibling baseline, not yet at parity)

The StAX walk itself remains O(refPosition) since matches must be emitted before the reference and the reader is forward-only, so absolute time still grows with position. Eliminating that would require a different approach (e.g., backward navigation through the persistent NodeId structure for wildcard tests). The K-bounded buffer is a clean, conservative win on the allocation/sort axis and a prerequisite for any later walk-avoidance work.

Tests:
- PrecedingAxisPositionRegressionTest mirrors PR eXist-db#6325's FollowingAxisPositionRegressionTest: correctness at 3 reference positions (early, mid, late), ancestor exclusion, K=1..4 axis-order semantics, position-independence threshold, and a wildcard-vs-preceding-sibling comparison.
- positional.xqm:180 `optimize-simple-preceding` documented the prior gap (no POSITIONAL_PREDICATE optimization on preceding axis); the assertion is flipped to expect the optimization, mirroring the existing `optimize-simple-following-nested` case at line 170.
- exist-core suite: 6599 tests, 0 failures, 0 errors (106 pre-existing skips).

Partially addresses eXist-db#2129 (the preceding-axis half; following-axis half is closed by PR eXist-db#6325). Full closure of the sibling-vs-wildcard gap requires walk-avoidance, left as follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


joewiz · 2026-05-12T16:59:56Z

[This response was co-authored with Claude Code. -Joe]

Rebased on current develop tip — clean except for one conflict in exist-core-jmh/pom.xml (your recent exist-index-lucene + lucene-core deps vs. our xmldb-api add for the new benchmark). Merged by keeping all three. New tip: a1fe4631d6. CI re-runs.

duncdrum · 2026-05-12T20:36:22Z

1 Java doc error, should be easy to spot locally. A bit hard to see in the mvnd reactor log on ci.


…n CI

Addresses duncdrum's CI Javadoc error on PR eXist-db#6330:

- Replace {@link LocationStep.PrecedingFilter} (unresolved reference) with {@code LocationStep.PrecedingFilter}.
- Add Javadoc for the default constructor.
- Add Javadoc with @throws to setUp and tearDown.
- Add @return and @throws to wildcardPrecedingWithPositionalPredicate and precedingSiblingBaseline.

The JMH-generated annotation classes (under target/generated-sources/) also trip the doclint, but they're regenerated on every build, so this commit doesn't touch them; if their lint complaints block CI, the proper fix is a javadoc-plugin excludePackageNames for org.exist.xquery.jmh_generated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz · 2026-05-12T20:42:32Z

[This response was co-authored with Claude Code. -Joe]

Found it — 9f27c409a3 adds the missing Javadoc to PrecedingAxisBenchmark.java:

- {@link LocationStep.PrecedingFilter} → {@code LocationStep.PrecedingFilter} (the link reference wasn't resolvable from the JMH module's classpath)
- Default constructor Javadoc added
- @throws added to @Setup / @TearDown methods
- @return and @throws added to the two @Benchmark methods
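
The kinds of Javadoc fixes listed above look roughly like the following. This is a hypothetical class written only to show the tags, not an excerpt from PrecedingAxisBenchmark.java; `JavadocSketch` and `parsePosition` are made-up names.

```java
/**
 * Hypothetical helper used only to illustrate the Javadoc conventions; not part of the PR.
 * It mentions {@code LocationStep.PrecedingFilter} with the inline code tag, which renders
 * as monospace text and does not require the class to be resolvable on the Javadoc classpath.
 */
final class JavadocSketch {

    /** Creates the sketch; an explicit, documented constructor avoids relying on the undocumented default one. */
    JavadocSketch() {
    }

    /**
     * Parses a reference position such as {@code "45000"}.
     *
     * @param text the position as text; surrounding whitespace is ignored
     * @return the parsed position
     * @throws NumberFormatException if the text is not a valid integer
     */
    int parsePosition(final String text) {
        return Integer.parseInt(text.trim());
    }
}
```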

The CI log also flagged the JMH-generated annotation classes under target/generated-sources/annotations/org/exist/xquery/jmh_generated/ for "no comment" warnings. Those are regenerated every build — if doclint still treats them as errors after this push, the right fix is an excludePackageNames for org.exist.xquery.jmh_generated in the javadoc-plugin config (the sibling benchmarks under exist-core-jmh/ would presumably need the same exclusion, suggesting it's a missing module-level config). Happy to add that as a follow-up commit if CI still flags them.

joewiz requested a review from a team as a code owner May 10, 2026 04:23

duncdrum reviewed May 12, 2026


Comment thread exist-core/src/test/java/org/exist/xquery/PrecedingAxisPositionRegressionTest.java Outdated

duncdrum reviewed May 12, 2026


Comment thread exist-core/src/test/java/org/exist/xquery/PrecedingAxisPositionRegressionTest.java Outdated

duncdrum approved these changes May 12, 2026


joewiz and others added 2 commits May 12, 2026 12:56

joewiz force-pushed the perf/2129-preceding-axis-residual branch from dec9172 to a1fe463 Compare May 12, 2026 16:58

reinhapa approved these changes May 12, 2026


line-o added the xquery (issue is related to xquery implementation) and performance (bottlenecks, opportunities for rewriting, optimization) labels May 12, 2026

joewiz wants to merge 3 commits into eXist-db:develop from joewiz:perf/2129-preceding-axis-residual


4 participants
