[feature] Inline let-bound persistent paths for index pre-select #13

Open
joewiz wants to merge 2 commits into optimizer/expression-optimize-method from fix/873-let-inlining-for-index-preselect


Conversation

@joewiz (Owner) commented May 7, 2026

Within-fork PR for review. Stacked on optimizer/expression-optimize-method (the four foundational Expression.optimize(CompileContext) framework commits). Will be re-targeted at eXist-db/exist:develop once that framework lands upstream.

Summary

Closes eXist-db#873.

A let $v := <persistent path> return $v[Optimizable-pred] query was running ~167x slower than the equivalent direct form. This PR adds a narrow rewrite at the LetExpr.optimize(cc) stage that converts the indirect form into the direct form when six soundness gates hold, so the legacy Optimizer pass attaches the (#exist:optimize#) pragma to the resulting LocationStep and routes through the lucene/range pre-select.

Diagnosis

For

let $a := collection('/db/system/test')//LINE
return $a[ft:query(., 'Denmark')]

Optimizer.visitFilteredExpr does wrap the FilteredExpression in the optimize pragma. At runtime Optimize.eval runs the lucene pre-select and obtains a small result NodeSet. But because the FE source is a VariableReference rather than a LocationStep, before() records contextStep == null (BasicExpressionVisitor.findFirstStep finds no LocationStep at the top level). With contextStep == null, Optimize.eval falls through to innerExpr.eval(result, null) -- but FilteredExpression.eval then calls expression.eval(contextSequence, ...) where expression is the VariableReference, and VariableReference.eval reads the bound value off the local variable stack and ignores contextSequence. So seq becomes the full //LINE NodeSet and ft:query runs once per LINE; the pre-selected result is computed and discarded.
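The failure mode can be reproduced in a toy model. The sketch below is not eXist-db code: `Expr`, `step`, `variableRef`, and `filtered` are hypothetical stand-ins for LocationStep, VariableReference, and FilteredExpression, and strings stand in for nodes. It only illustrates why a pre-selected context is wasted when the filter's source ignores the context it is handed.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Toy model of the evaluation path described above; none of these types are eXist-db classes. */
final class DiscardedPreselectSketch {

    /** Minimal stand-in for an expression evaluated against a context sequence. */
    interface Expr {
        List<String> eval(List<String> contextSequence);
    }

    /** Like a path step: narrows its result to the context it is handed. */
    static Expr step(List<String> allNodes) {
        return ctx -> ctx == null
                ? allNodes
                : allNodes.stream().filter(ctx::contains).collect(Collectors.toList());
    }

    /** Like a variable reference: returns the bound value and ignores the context. */
    static Expr variableRef(List<String> boundValue) {
        return ctx -> boundValue;
    }

    /** Like a filtered expression: evaluates its source, then tests the predicate per item. */
    static Expr filtered(Expr source, Predicate<String> predicate, int[] predicateCalls) {
        return ctx -> source.eval(ctx).stream()
                .filter(item -> {
                    predicateCalls[0]++;          // count how often the predicate runs
                    return predicate.test(item);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Denmark is a prison",
                "To be or not to be",
                "Something is rotten in Denmark");
        // What an index pre-select would hand over as the context (assumed here).
        List<String> preselected = List.of(lines.get(0), lines.get(2));
        Predicate<String> pred = s -> s.contains("Denmark");

        int[] directCalls = {0};
        int[] indirectCalls = {0};
        List<String> direct = filtered(step(lines), pred, directCalls).eval(preselected);
        List<String> indirect = filtered(variableRef(lines), pred, indirectCalls).eval(preselected);

        System.out.println(direct.equals(indirect));                    // true: same nodes either way
        System.out.println(directCalls[0] + " vs " + indirectCalls[0]); // 2 vs 3
    }
}
```

Both forms return the same nodes, which is why the problem shows up only as a slowdown: the variable-sourced filter runs the predicate over every bound item rather than over the pre-selected ones.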

What changed

| File | Purpose |
| --- | --- |
| exist-core/src/main/java/org/exist/xquery/LetInliner.java | New helper. tryInline(LetExpr, CompileContext) checks the six gates; on success, attaches the FE's predicate to the input path's last LocationStep and returns the input path as the LetExpr's replacement (logged via cc.replaceWith). |
| exist-core/src/main/java/org/exist/xquery/LetExpr.java | New isScoreBinding() accessor; calls LetInliner.tryInline(this, cc) after the existing literal-drop gate. |
| extensions/indexes/lucene/src/test/java/org/exist/indexing/lucene/LetInliningRegressionTest.java | Six JUnit tests covering correctness, log inspection, three negative gates, and a loose perf bound. |
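As a rough illustration of the rewrite step itself, the sketch below moves the body's predicate onto the last step of the input path and returns that path. It uses a hypothetical toy AST (`Step`, `Path`, `Let`, `inline`), not eXist-db's expression classes or the actual LetInliner code, and it assumes the soundness gates have already passed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy AST for illustrating the rewrite; these are not eXist-db's expression classes. */
final class InlineRewriteSketch {

    /** One path step, e.g. "//LINE", with zero or more predicates kept in source form. */
    static final class Step {
        final String axisAndTest;
        final List<String> predicates = new ArrayList<>();

        Step(String axisAndTest) { this.axisAndTest = axisAndTest; }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder(axisAndTest);
            for (String p : predicates) sb.append('[').append(p).append(']');
            return sb.toString();
        }
    }

    /** A path: a root expression followed by steps. */
    static final class Path {
        final String root;
        final List<Step> steps = new ArrayList<>();

        Path(String root) { this.root = root; }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder(root);
            for (Step s : steps) sb.append(s);
            return sb.toString();
        }
    }

    /** Models: let $var := input return $var[bodyPredicate] */
    static final class Let {
        final String var;
        final Path input;
        final String bodyPredicate;

        Let(String var, Path input, String bodyPredicate) {
            this.var = var;
            this.input = input;
            this.bodyPredicate = bodyPredicate;
        }
    }

    /** Appends the body's predicate to the input's last step and returns the input path. */
    static Path inline(Let let) {
        // Assumes the gates already guaranteed at least one step in the input.
        Step last = let.input.steps.get(let.input.steps.size() - 1);
        last.predicates.add(let.bodyPredicate);
        return let.input;
    }

    public static void main(String[] args) {
        Path input = new Path("collection('/db/system/test')");
        input.steps.add(new Step("//LINE"));
        Let let = new Let("a", input, "ft:query(., 'Denmark')");

        // prints: collection('/db/system/test')//LINE[ft:query(., 'Denmark')]
        System.out.println(inline(let));
    }
}
```

The output is the direct form of the query from the diagnosis, which is the shape the legacy Optimizer pass already handles.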

The six gates (LetInliner.tryInline)

  1. Variable name present and not a score binding -- XQFT 3.0 §2.3 score bindings synthesize a double rather than the input value.
  2. Standalone let, not a chain link -- getPreviousClause() == null and the body is not a FLWORClause.
  3. No declared type on the binding -- typed declarations impose a runtime cardinality+type check that inlining bypasses.
  4. Input is node-typed.
  5. Input contains a non-wildcard LocationStep.
  6. Body is a FilteredExpression (or a length-1 PathExpr / DebuggableExpression wrapping one) whose source is the bound variable, with exactly one Optimizable predicate, and the variable does not appear anywhere else in the body -- mirrors what Optimizer.visitFilteredExpr's instanceof LocationStep simplification expects.
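The gate list can be read as an ordered chain of early exits. The sketch below is a hypothetical model, not the LetInliner source: `LetFacts` flattens each gate's condition into a plain field, and every field name is an assumption made for illustration. Reporting the first failing gate makes it easy to see which condition blocked a given rewrite.

```java
/** Hypothetical model of the six gates; field names are illustrative, not eXist-db API. */
final class GateCheckSketch {

    /** Flattened facts about one let clause, as the gates would need them. */
    static final class LetFacts {
        boolean hasVariableName = true;
        boolean isScoreBinding = false;
        boolean hasPreviousClause = false;
        boolean bodyIsFlworClause = false;
        boolean hasDeclaredType = false;
        boolean inputIsNodeTyped = true;
        boolean inputHasNonWildcardStep = true;
        boolean bodyIsFilterOnVariable = true;
        int optimizablePredicates = 1;
        int variableReferencesInBody = 1; // the filter source itself counts as one
    }

    /** Returns the number (1-6) of the first gate that fails, or 0 if all six hold. */
    static int firstFailingGate(LetFacts f) {
        if (!f.hasVariableName || f.isScoreBinding) return 1;
        if (f.hasPreviousClause || f.bodyIsFlworClause) return 2;
        if (f.hasDeclaredType) return 3;
        if (!f.inputIsNodeTyped) return 4;
        if (!f.inputHasNonWildcardStep) return 5;
        if (!f.bodyIsFilterOnVariable
                || f.optimizablePredicates != 1
                || f.variableReferencesInBody != 1) return 6;
        return 0;
    }

    public static void main(String[] args) {
        LetFacts issueQuery = new LetFacts();   // defaults model the query from the diagnosis
        System.out.println(firstFailingGate(issueQuery)); // 0

        LetFacts typed = new LetFacts();
        typed.hasDeclaredType = true;           // e.g. a declared sequence type on the binding
        System.out.println(firstFailingGate(typed));      // 3

        LetFacts usedTwice = new LetFacts();
        usedTwice.variableReferencesInBody = 2; // variable appears elsewhere in the body
        System.out.println(firstFailingGate(usedTwice));  // 6
    }
}
```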

The "inlining is not always desirable, in particular if there's no index defined" caveat noted in a 2016 comment on the issue is exactly gate 6: we only inline when the inlined form would expose an Optimizable to visitLocationStep.

Test plan

  • mvn test -pl extensions/indexes/lucene -- new LetInliningRegressionTest passes (6/6); existing OptimizerTest, LuceneIndexTest pass; 5 pre-existing ft-facets failures unchanged (verified by stash-and-rerun).
  • mvn test -pl extensions/indexes/range -- 425/425 pass + 3 skipped, no new failures.
  • mvn test -pl exist-core -- baseline 34 fail / 20 err vs with-fix 35 fail / 20 err (+1 = known flaky 503 HTTP test); unique failure set is byte-identical.
  • Codacy PMD on changed files -- clean.
  • (Deferred) JMH LetIndirectionBenchmark -- depends on the exist-indexes-jmh module on a separate branch; will land as a follow-up.
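The loose perf bound in LetInliningRegressionTest is described in the commit message as a 20x ceiling plus 500 ms slack. One plausible reading, assumed here and not taken from the test source, is that the indirect query must finish within 20 times the direct query's time plus 500 ms. The class and method names below are hypothetical.

```java
/** Sketch of a "20x ceiling + 500 ms slack" check; the exact formula is an assumption. */
final class LoosePerfBoundSketch {

    static final long CEILING_FACTOR = 20;
    static final long SLACK_MS = 500;

    /** True when the indirect timing stays within 20x the direct timing plus the slack. */
    static boolean withinBound(long directMs, long indirectMs) {
        return indirectMs <= CEILING_FACTOR * directMs + SLACK_MS;
    }

    public static void main(String[] args) {
        // Direct form takes 10 ms; a 167x regression would take 1670 ms.
        System.out.println(withinBound(10, 1670)); // false: 1670 > 20 * 10 + 500
        // After the fix the indirect form is close to the direct form.
        System.out.println(withinBound(10, 40));   // true: 40 <= 700
    }
}
```

Under this reading, a 167x slowdown exceeds the bound only when the direct query takes longer than roughly 3.4 ms (500 / 147), so the test corpus has to be large enough for the direct form to clear that floor.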

joewiz and others added 2 commits May 6, 2026 21:54
A query of the form

    let $v := <persistent path> return $v[Optimizable-pred]

ran ~167x slower than the equivalent direct form. The legacy Optimizer
wraps the FilteredExpression in the (#exist:optimize#) pragma in both
cases, but at runtime BasicExpressionVisitor.findFirstStep cannot find
a top-level LocationStep when the FE source is a VariableReference, so
Optimize.eval records contextStep == null and falls back to
innerExpr.eval(result, ...) -- which invokes FilteredExpression.eval,
which calls VariableReference.eval, which reads the bound value off
the local variable stack and ignores contextSequence. The pre-selected
NodeSet is therefore computed and then thrown away, and the predicate
runs once per node in the full input.

This commit adds LetInliner.tryInline(let, cc) called from
LetExpr.optimize(cc). When all six soundness gates hold:

  1. variable name present and not a score binding (XQFT 3.0 §2.3);
  2. standalone let, not a chain link;
  3. no declared sequence type;
  4. node-typed input;
  5. input contains a non-wildcard LocationStep;
  6. body unwraps to a FilteredExpression whose source is the variable,
     with exactly one Optimizable predicate, and the variable does not
     appear anywhere else in the body,

the predicate is appended to the input's last LocationStep and the
LetExpr is replaced by the input path. The legacy Optimizer pass then
sees the same shape it knows how to wrap from the direct form, attaches
the pragma to the LocationStep, and routes through the index pre-select.

Gate 6 mirrors what was noted in a 2016 comment on the issue thread:
inlining is only desirable when the inlined form would expose an
Optimizable to visitLocationStep.

Closes eXist-db#873

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Six tests covering the LetInliner rewrite:

  - issue873_indirectQueryReturnsSameNodes -- direct vs indirect must
    return the same nodes (correctness check).
  - issue873_inlineRewriteLogged -- the optimizer rewrite log must
    contain an "inline let $a" entry for the indirect query.
  - inline_doesNotFireWhen_letReferencedTwice -- gate (count != 1).
  - inline_doesNotFireWhen_letBoundToCount -- gate (body is not a
    FilteredExpression).
  - inline_doesNotFireWhen_letIsTyped -- gate (sequenceType present).
  - issue873_indirectQueryUnderLoosePerfBound -- a 20x ceiling +
    500ms slack, loose enough to avoid CI flakiness while still
    catching the original 167x regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz changed the title from "[feature] Inline let-bound persistent paths for index pre-select (closes GH-873)" to "[feature] Inline let-bound persistent paths for index pre-select" on May 8, 2026