[feature] Inline let-bound persistent paths for index pre-select #6306

Closed
joewiz wants to merge 258 commits into eXist-db:develop from joewiz:fix/873-let-inlining-for-index-preselect

@joewiz joewiz commented May 7, 2026

Summary

Closes #873.

A query of the form let $v := <persistent path> return $v[Optimizable-pred] ran ~167x slower than the equivalent direct form <persistent path>[Optimizable-pred]. This PR adds a narrow rewrite at the LetExpr.optimize(cc) stage that converts the indirect form into the direct form when six soundness gates hold. The legacy Optimizer pass then attaches the (#exist:optimize#) pragma to the resulting LocationStep, and evaluation routes through the lucene/range pre-select.

Base: this PR targets develop but is built on top of optimizer/expression-optimize-method, the orphan-rescue branch carrying the four foundational Expression.optimize(CompileContext) framework commits (e6993cc020, ef07bb8cbc, 50b512b611, c922cefc2c). Those need to land first; the rewrite uses cc.replaceWith(...) and the per-expression optimize pass introduced there.

Diagnosis

For

let $a := collection('/db/system/test')//LINE
return $a[ft:query(., 'Denmark')]

Optimizer.visitFilteredExpr does wrap the FilteredExpression in the optimize pragma. At runtime Optimize.eval runs the lucene pre-select and obtains a small result NodeSet. But because the FE source is a VariableReference rather than a LocationStep, before() records contextStep == null (BasicExpressionVisitor.findFirstStep finds no LocationStep at the top level). With contextStep == null, Optimize.eval falls through to innerExpr.eval(result, null) -- but FilteredExpression.eval then calls expression.eval(contextSequence, ...) where expression is the VariableReference, and VariableReference.eval reads the bound value off the local variable stack and ignores contextSequence. So seq becomes the full //LINE NodeSet and ft:query runs once per LINE; the pre-selected result is computed and discarded.
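The failure mode above can be reproduced in a toy model. This is only a sketch: `Expr`, `StepExpr`, `VarRefExpr` and `predicateCalls` are hypothetical stand-ins invented for illustration, not eXist-db's `Expression`, `LocationStep` or `VariableReference`. It shows a source that honours the pre-selected context next to one that ignores it.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model only: every type here is a hypothetical stand-in, not an eXist-db class.
class PreselectDiscardSketch {

    /** Minimal expression contract: evaluate against an optional context sequence. */
    interface Expr {
        List<String> eval(List<String> contextSequence);
    }

    /** Behaves like a path step: uses the pre-selected context when one is supplied. */
    static final class StepExpr implements Expr {
        private final List<String> allNodes;
        StepExpr(List<String> allNodes) { this.allNodes = allNodes; }
        @Override
        public List<String> eval(List<String> contextSequence) {
            return contextSequence != null ? contextSequence : allNodes;
        }
    }

    /** Behaves like a variable reference: returns the bound value, ignoring the context. */
    static final class VarRefExpr implements Expr {
        private final List<String> boundValue;
        VarRefExpr(List<String> boundValue) { this.boundValue = boundValue; }
        @Override
        public List<String> eval(List<String> contextSequence) {
            return boundValue;
        }
    }

    /** Evaluates the source against the pre-selection and counts predicate invocations. */
    static int predicateCalls(Expr source, List<String> preselected, Predicate<String> pred) {
        int calls = 0;
        for (String item : source.eval(preselected)) {
            pred.test(item);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Denmark's a prison", "To be", "or not to be");
        Predicate<String> mentionsDenmark = s -> s.contains("Denmark");
        // Stand-in for the index pre-select: a small candidate set.
        List<String> preselected =
                lines.stream().filter(mentionsDenmark).collect(Collectors.toList());

        System.out.println("direct form:   "
                + predicateCalls(new StepExpr(lines), preselected, mentionsDenmark));   // 1
        System.out.println("indirect form: "
                + predicateCalls(new VarRefExpr(lines), preselected, mentionsDenmark)); // 3
    }
}
```

In the toy model the indirect form tests the predicate once per bound item even though a one-element pre-selection was available, which is the shape of the slowdown described above.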

The same issue affects range and any other Optimizable-bearing predicate; the predicate type is incidental.

What changed

| File | Purpose |
| --- | --- |
| exist-core/src/main/java/org/exist/xquery/LetInliner.java | New helper. tryInline(LetExpr, CompileContext) checks the six gates; on success, attaches the FE's predicate to the input path's last LocationStep and returns the input path as the LetExpr's replacement (logged via cc.replaceWith). |
| exist-core/src/main/java/org/exist/xquery/LetExpr.java | New isScoreBinding() accessor; calls LetInliner.tryInline(this, cc) after the existing literal-drop gate. |
| extensions/indexes/lucene/src/test/java/org/exist/indexing/lucene/LetInliningRegressionTest.java | Six JUnit tests: one correctness pair, one optimizer-log inspection of the inline rewrite firing, three negative-gate tests (used twice / bound to count() / typed declaration), one loose perf bound (20x ceiling + 500ms slack -- catches a 167x regression without CI flakiness). |
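As an illustration of the loose perf bound, here is a minimal sketch. It assumes the check has the form `indirect <= 20 * direct + 500 ms`; the exact assertion in LetInliningRegressionTest is not shown in this description, and `PerfBoundSketch` and the timings are hypothetical.

```java
// Sketch of a "20x ceiling + 500 ms slack" bound; the formula is an assumption.
class PerfBoundSketch {
    static final long CEILING_FACTOR = 20;
    static final long SLACK_MS = 500;

    /** True when the indirect form stays within the assumed bound of the direct form. */
    static boolean withinBound(long directMs, long indirectMs) {
        return indirectMs <= CEILING_FACTOR * directMs + SLACK_MS;
    }

    public static void main(String[] args) {
        long directMs = 10; // hypothetical timing of the direct form
        System.out.println(withinBound(directMs, 3 * directMs));   // prints true
        System.out.println(withinBound(directMs, 167 * directMs)); // prints false
    }
}
```

Under this assumed formula the slack term dominates for very fast queries: a 167x slowdown only exceeds the bound once the direct form takes more than roughly 3.4 ms (500 / 147), so the test corpus has to be large enough for that.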

The six gates (LetInliner.tryInline)

  1. Variable name present and not a score binding -- XQFT 3.0 §2.3 score bindings synthesize a double rather than the input value; inlining would change semantics.
  2. Standalone let, not a chain link -- getPreviousClause() == null and the body is not a FLWORClause. Limits v1 to standalone lets so we don't have to repair previousClause pointers.
  3. No declared type on the binding -- typed declarations impose a runtime cardinality+type check on $v's bound value; inlining bypasses it. v2 may relax this when the inlined static type still satisfies the declaration.
  4. Input is node-typed -- strings, atomics, etc. don't gain from a downstream index pre-select.
  5. Input contains a non-wildcard LocationStep -- the predicate has to attach somewhere indexable.
  6. Body is a FilteredExpression (or a length-1 PathExpr / DebuggableExpression wrapping one) whose source is the bound variable, with exactly one Optimizable predicate, and the variable does not appear anywhere else in the body -- mirrors what Optimizer.visitFilteredExpr's instanceof LocationStep simplification expects, so the post-rewrite shape is exactly what the legacy pass already knows how to wrap. The "exactly one predicate" + "no other refs" guards prevent positional-predicate or substituted-out-of-scope correctness regressions.
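The gate sequence above can be sketched as a chain of early exits. This is a simplified model, not the code in LetInliner: `LetShape` and its fields are hypothetical flags standing in for inspections of the real expression tree, and "exactly one Optimizable predicate" is read here as "exactly one predicate, and it is Optimizable".

```java
// Hypothetical, flattened description of a let expression; not eXist-db's LetExpr.
class LetGateSketch {

    static final class LetShape {
        boolean hasVariableName = true;
        boolean scoreBinding = false;
        boolean hasPreviousClause = false;
        boolean bodyIsFlworClause = false;
        boolean hasDeclaredType = false;
        boolean inputIsNodeTyped = true;
        boolean inputHasNonWildcardStep = true;
        boolean bodyFiltersVariable = true;
        int predicateCount = 1;
        boolean predicateIsOptimizable = true;
        int variableRefsInBody = 1;
    }

    /** Returns the number of the first gate that fails, or 0 when all six hold. */
    static int firstFailingGate(LetShape s) {
        if (!s.hasVariableName || s.scoreBinding) return 1;
        if (s.hasPreviousClause || s.bodyIsFlworClause) return 2;
        if (s.hasDeclaredType) return 3;
        if (!s.inputIsNodeTyped) return 4;
        if (!s.inputHasNonWildcardStep) return 5;
        if (!s.bodyFiltersVariable || s.predicateCount != 1
                || !s.predicateIsOptimizable || s.variableRefsInBody != 1) return 6;
        return 0;
    }

    public static void main(String[] args) {
        LetShape inlinable = new LetShape();

        LetShape usedTwice = new LetShape();
        usedTwice.variableRefsInBody = 2;       // variable also appears elsewhere in the body

        LetShape boundToCount = new LetShape();
        boundToCount.inputIsNodeTyped = false;  // e.g. a count() result is atomic

        LetShape typed = new LetShape();
        typed.hasDeclaredType = true;           // typed declaration on the binding

        System.out.println(firstFailingGate(inlinable));    // 0
        System.out.println(firstFailingGate(usedTwice));    // 6
        System.out.println(firstFailingGate(boundToCount)); // 4
        System.out.println(firstFailingGate(typed));        // 3
    }
}
```

The three rejected shapes mirror the negative-gate tests (used twice / bound to count() / typed declaration); which gate rejects each case in the real helper is my reading of the list, not something the tests are stated to assert.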

The "inlining is not always desirable, in particular if there's no index defined" caveat noted in a 2016 comment on the issue is exactly gate 6: we only inline when the inlined form would expose an Optimizable to visitLocationStep.

Why option (a) inline at AST level, vs option (b) special-case Optimize.eval

  • (a) generalises -- the same rewrite catches any Optimizable predicate (ft:query, GeneralComparison, FunMatches, range Lookup, ...) without per-predicate-type repair.
  • (a) addresses the 2016 caveat (gate 6 is the if-index-defined check).
  • (a) fits the framework already on this branch; cc.replaceWith handles logging and the re-analyze flag.
  • (a) mirrors BaseX's Let.inlineExpr / GFLWOR.inlineForLet pattern.

Option (b) is kept as a follow-up for shapes (a) doesn't catch (e.g., let bindings to function-call results that don't expose a LocationStep in the input).

Test plan

  • mvn test -pl extensions/indexes/lucene -- new LetInliningRegressionTest passes (6/6); existing OptimizerTest, LuceneIndexTest pass; 5 pre-existing ft-facets failures unchanged (verified by stash-and-rerun).
  • mvn test -pl extensions/indexes/range -- 425/425 pass + 3 skipped, no new failures.
  • mvn test -pl exist-core -- baseline 34 fail / 20 err vs with-fix 35 fail / 20 err (+1 = known flaky 503 HTTP test); unique failure set is byte-identical.
  • Codacy PMD on changed files -- clean.
  • (Deferred) JMH LetIndirectionBenchmark -- depends on the exist-indexes-jmh module which lives on a separate branch (feature/index-jmh-benchmarks); will land as a follow-up once that module reaches develop.

XQTS scores (QT4, 3.1, FTTS) should be unchanged because the rewrite preserves correctness; will flag if any score moves.

joewiz and others added 30 commits April 13, 2026 09:25
New function implementations in the fn: namespace:

Sequence functions: fn:characters, fn:identity, fn:void, fn:foot,
fn:trunk, fn:slice, fn:items-at, fn:replicate, fn:insert-separator,
fn:all-equal, fn:all-different, fn:duplicate-values, fn:index-where,
fn:take-while, fn:distinct-ordered-nodes, fn:siblings

Higher-order functions: fn:every, fn:some (function form),
fn:highest, fn:lowest, fn:sort-by, fn:sort-with, fn:partition,
fn:scan-left, fn:scan-right, fn:subsequence-where,
fn:transitive-closure, fn:partial-apply, fn:op

String/URI functions: fn:char, fn:graphemes, fn:decode-from-uri,
fn:parse-uri, fn:build-uri, fn:expanded-QName, fn:parse-QName,
fn:parse-integer, fn:divide-decimals

Date/Time functions: fn:civil-timezone, fn:build-dateTime,
fn:parts-of-dateTime, fn:unix-dateTime, fn:seconds

Type functions: fn:schema-type, fn:atomic-type-annotation,
fn:node-type-annotation, fn:element-to-map, fn:element-to-map-plan,
fn:type-of, fn:is-NaN

Context functions: fn:get, fn:collation, fn:collation-available,
fn:message

Parsing functions: fn:parse-html (Validator.nu HTML5 parser),
fn:invisible-xml (Markup Blitz iXML parser), fn:parse-csv,
fn:csv, fn:html-doc, fn:unparsed-binary

Data functions: fn:hash, fn:function-annotations,
fn:function-identity, fn:in-scope-namespaces

Also: DeepEqualOptions class for fn:deep-equal options map support,
FnModule registrations for all new functions.

Spec: QT4 XQuery 4.0 §14 (Functions and Operators)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standardize error codes across casting and type checking:

- Use XPTY0004 consistently for type errors (was mixed with FORG0001)
- Use FORG0001 for invalid cast values (not type mismatches)
- Add XPST0080 for xs:anyType in cast/castable (XQ4 spec)
- Add XQ4-specific error codes for new expression types
- Fix DynamicCardinalityCheck, DynamicTypeCheck, TreatAsExpression
  to use correct W3C error codes
- Align all value type convertTo() methods with spec error codes

This fixes ~30 XQTS test failures caused by wrong error codes.

Spec: W3C XQuery 3.1 §B.1 (Error Codes),
      QT4 XQuery 4.0 Appendix B (Error Codes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Array module (8 new functions):
- array:build, array:index-of, array:index-where, array:of-members,
  array:split, array:sort-by, array:sort-with, array:slice
- Plus: array:get#3 with default value

Map module (5 new functions):
- map:build, map:items, map:entries, map:filter, map:keys-where
- Plus: map:get#3 with default value, map:empty

Math module (4 new functions):
- math:cosh, math:sinh, math:tanh, math:e
- Plus: math:pow edge case fixes

Spec: QT4 XQuery 4.0 §17 (Array Module),
      QT4 XQuery 4.0 §16 (Map Module),
      QT4 XQuery 4.0 §18 (Math Module)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive XQSuite test module for XQ4 syntax features:
- Pipeline operator: basic chaining, nested pipelines, with functions
- Focus functions: fn { . + 1 }, context item binding
- Keyword arguments: named parameter passing, mixed positional/named
- String templates: interpolation, nested expressions, escaping
- Otherwise operator: empty fallback, non-empty passthrough
- Braced if: if (cond) { expr } without else
- Try/finally: cleanup execution, error propagation
- For member: array member iteration
- While clause: conditional FLWOR iteration
- Default parameter values: function declarations with defaults
- QName literals: #name symbolic references
- Hex/binary integer literals: 0xFF, 0b1010
- Numeric underscore separators: 1_000_000
- Version gating: features require xquery version "4.0"

XQTS: QT4 parser-dependent test sets (1898/2163, 87.7%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add parser support for the XQuery 3.1 `declare decimal-format` and
`declare default decimal-format` prolog declarations (spec section 4.10),
enabling users to customize number formatting via fn:format-number.

The runtime infrastructure (DecimalFormat class, XQueryContext storage,
FnFormatNumbers 3-arg support) was already in place — this adds the
missing parser recognition and tree walker processing.

Changes:
- XQuery.g: Add DECIMAL_FORMAT_DECL/DEF_DECIMAL_FORMAT_DECL tokens,
  grammar rules for named and default forms, property keywords
- XQueryTree.g: Walk AST, validate properties (single-char, zero-digit,
  distinctness), register formats in XQueryContext
- ErrorCodes.java: Add XQST0097 (duplicate) and XQST0098 (invalid)
- XQueryContext.java: Add setDefaultStaticDecimalFormat() convenience
- format-numbers.xql: Add tests for named/default formats, custom
  NaN/infinity, and error cases

Closes eXist-db#56

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Build configuration:
- exist-parent/pom.xml: Add markup-blitz 1.10 (fn:invisible-xml),
  htmlparser 1.4.16 (fn:parse-html via Validator.nu)
- exist-core/pom.xml: Add markup-blitz and htmlparser dependencies
- .gitignore: Ignore iXML grammar cache files

Format improvements:
- FnFormatDates: comprehensive format-date/format-time improvements
- FnFormatNumbers: map overload, char:rendition pattern, negative
  exponent zero-padding fix

Tests:
- fnXQuery40.xql: XQSuite tests for XQ4 functions
- fnInvisibleXml.xqm: fn:invisible-xml test suite
- format-number-map.xql: fn:format-number map overload tests
- deep-equal-options-test.xq: fn:deep-equal options map tests
- Updated: fnLanguage.xqm, json-to-xml.xql, replace.xqm

Spec: QT4 XQuery 4.0 §14 (Functions and Operators)
XQTS: 732/861 (85.0%) for XQ4-specific test sets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tree walker version checks for all XQ4-only constructs: when
staticContext.getXQueryVersion() < 40, throw XPST0003 with a
descriptive message. This ensures modules declaring xquery version "3.1"
cannot use XQ4 syntax even if the parser somehow accepts it.

Gated constructs: otherwise, pipeline (->), mapping arrow (=>!),
ternary conditional (?? !!), keyword arguments, focus functions,
string templates, while clause, default parameters, for-member,
method call (=>?).

Also add system property exist.xquery4.enabled (default true) to
allow disabling XQ4 support entirely. When disabled, xquery version
"4.0" declarations throw XPST0003.

Addresses reviewer feedback from line-o on PR eXist-db#6139.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dd Javadoc

Rename the three decimal-format validation helper methods in XQueryTree.g
with a `df` prefix to clarify their scope:

- requireSingleChar → dfRequireSingleChar
- validateZeroDigit → dfValidateZeroDigit
- validateDistinctPictureChars → dfValidateDistinctPictureChars

Add Javadoc comments on DecimalFormat.UNNAMED and UNNAMED_DECIMAL_FORMAT
explaining the XPath 3.1 spec origin of the "unnamed" terminology.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d fn:tokenize

Without the ! flag, empty-matching patterns raise FORX0003 in both XQ 3.1
and XQ 4.0 mode. With the ! flag in XQ 4.0, fn:replace uses the Java regex
fallback and fn:tokenize tokenizes between each character.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e List

Adds the core XQUF expression classes for insert, delete, replace,
rename, and transform (copy-modify-return) operations, plus the
Pending Update List (PUL) merge-and-apply infrastructure.

The PUL implements the W3C XQUF 3.0 update primitive model with
five phases: insert, replace, rename, delete, and put. Update
primitives are collected during expression evaluation and applied
atomically at snapshot boundaries.
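
A minimal sketch of the collect-then-apply idea, assuming the five phases are applied in the order listed above. `PendingUpdateSketch`, `Phase` and `Primitive` are hypothetical names, not the classes added by this commit, and "applying" is reduced to producing an ordered log.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: collects update primitives and replays them grouped by phase.
class PendingUpdateSketch {

    /** Declaration order doubles as the assumed application order. */
    enum Phase { INSERT, REPLACE, RENAME, DELETE, PUT }

    static final class Primitive {
        final Phase phase;
        final String description;
        Primitive(Phase phase, String description) {
            this.phase = phase;
            this.description = description;
        }
    }

    private final List<Primitive> pending = new ArrayList<>();

    /** Called during evaluation: record the primitive, change nothing yet. */
    void add(Phase phase, String description) {
        pending.add(new Primitive(phase, description));
    }

    /** Called at the snapshot boundary: returns the primitives in phase order and clears the list. */
    List<String> apply() {
        List<Primitive> ordered = new ArrayList<>(pending);
        // List.sort is stable, so primitives within one phase keep their collection order.
        ordered.sort(Comparator.comparing((Primitive p) -> p.phase));
        pending.clear();
        List<String> log = new ArrayList<>();
        for (Primitive p : ordered) {
            log.add(p.phase + " " + p.description);
        }
        return log;
    }

    public static void main(String[] args) {
        PendingUpdateSketch pul = new PendingUpdateSketch();
        pul.add(Phase.DELETE, "/doc/old");
        pul.add(Phase.INSERT, "/doc/new");
        pul.add(Phase.RENAME, "/doc/item");
        System.out.println(pul.apply()); // [INSERT /doc/new, RENAME /doc/item, DELETE /doc/old]
    }
}
```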

Expression classes:
- XQUFInsertExpr: insert node (before/after/into/as first/as last)
- XQUFDeleteExpr: delete node
- XQUFReplaceNodeExpr: replace node
- XQUFReplaceValueExpr: replace value of node
- XQUFRenameExpr: rename node
- XQUFTransformExpr: copy-modify-return (in-memory deep copy + PUL)

Includes namespace conflict detection (XUDY0021/0023/0024) inspired
by BaseX's NamePool approach.

Spec: W3C XQuery Update Facility 3.0, Sections 2.1-2.5
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements XUST0001 (non-updating expression in updating context)
and XUST0002 (updating expression in non-updating context) static
type checking across the expression hierarchy.

Adds Expression.isVacuous() method for recursive vacuousness
detection, which allows expressions like empty sequences and
conditionals with all-vacuous branches to pass XUST checks.
This is required because vacuous expressions are compatible with
both updating and non-updating contexts per the W3C XQUF 3.0 spec.

Key changes:
- Expression.java: isVacuous(), isUpdating(), analyze() flags
- PathExpr: context step propagation fix (i>=1 for XQUF)
- TypeswitchExpression, SwitchExpression: branch-level XUST checks
- ConditionalExpression: then/else branch XUST checks
- ErrorCodes: XUST0001, XUST0002, XUDY0009, XUDY0014-0024,
  XUTY0004-0013, XUTY0022
- FunctionSignature: updating annotation support
- FunctionCall: updating function call propagation

Spec: W3C XQuery Update Facility 3.0, Section 2.6 (Static Typing)
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add full text grammar productions to XQuery.g parser and XQueryTree.g
tree walker for the W3C XQuery and XPath Full Text 3.0 specification.
This establishes the parsing foundation for ftcontains expressions,
FTSelection operators (FTOr, FTAnd, FTMildNot, FTUnaryNot, FTWords),
and positional filters (FTOrder, FTWindow, FTDistance, FTScope,
FTContent, FTTimes).

The AST expression classes in org.exist.xquery.ft model the full text
selection grammar as a tree of FTAbstractExpr nodes. Each node
corresponds to a production in the XQFT grammar and carries the
evaluation semantics defined in the spec.

Spec references:
- W3C XQuery and XPath Full Text 3.0, Section 3.1 (Full-Text Selections)
- W3C XQuery and XPath Full Text 3.0, Section 3.2 (Full-Text Contains)
- W3C XQuery and XPath Full Text 3.0, Section 3.3 (Positional Filters)

FTTS compliance: 661/667 (99.1%) — 6 remaining are spec ambiguities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…util:track()

Phase 1 of query profiling visibility (inspired by BaseX prof: module):

util:time($expr) / util:time($expr, $label):
  Pass-through wrapper that measures and logs execution time.
  Returns the expression result unchanged.

util:memory($expr) / util:memory($expr, $label):
  Same pattern for memory measurement. Logs the memory delta
  during expression evaluation.

util:track($expr) / util:track($expr, $label):
  Returns map { "time": xs:dayTimeDuration, "memory": xs:integer,
  "value": item()* }. Most useful of the three — combines time
  and memory measurement in a structured result.

All registered in UtilModule.java. 13 XQSuite tests in profiling.xql.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the memtree DocumentImpl with mutation methods required by
XQUF update primitives operating on in-memory (copy-modify-return)
nodes. The flat-array architecture of eXist's memtree requires
careful index management for insertions, deletions, and replacements.

Key additions to DocumentImpl:
- insertChildNode/insertChildNodes: insert before/after/into
- removeNode: delete with descendant cleanup and array compaction
- replaceNode: atomic replace preserving document order
- replaceValue: text/attribute/PI/comment value replacement
- renameNode: element/attribute/PI rename with namespace handling
- replaceElementContent: replace all children with text node
- compact(): post-update array defragmentation

ElementImpl/NodeImpl changes:
- getFirstChildFor(): skip deleted nodes in chain navigation
- Namespace propagation helpers for insert operations

Updates are processed in reverse document order where needed to
avoid flat-array cross-contamination during batch operations.

Spec: W3C XQuery Update Facility 3.0, Section 3 (Update Primitives)
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement the full text evaluation engine (FTEvaluator) using the
sequential AllMatches model defined in W3C XQFT 3.0, Section 4. The
evaluator tokenizes string values, applies match options (stemming,
wildcards, diacritics sensitivity, case sensitivity, stop words,
language), and evaluates the full text selection tree against token
streams.

FTContainsExpr is the top-level expression node for `contains text`
expressions, bridging the XQuery evaluation pipeline to the FT
evaluator. FTMatchOptions aggregates all match option settings.
FTThesaurus provides synonym expansion via configurable thesaurus
URIs, with lazy initialization for runtime efficiency.

Spec references:
- W3C XQuery and XPath Full Text 3.0, Section 4 (Full-Text Evaluation)
- W3C XQuery and XPath Full Text 3.0, Section 4.1 (AllMatches)
- W3C XQuery and XPath Full Text 3.0, Section 5 (Match Options)
- W3C XQuery and XPath Full Text 3.0, Section 5.6 (Thesaurus Option)
- W3C XQuery and XPath Full Text 3.0, Section 5.7 (Stop Word Option)

FTTS compliance: 661/667 (99.1%) — 6 remaining are spec ambiguities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 2 of query profiling: util:explain($query) returns the compiled
expression tree as XML, showing the post-optimization query plan.

FunExplain.java:
- Compiles query string using the same pattern as util:compile()
- Runs AnalyzeContextInfo for optimizer annotations
- Serializes tree via QueryPlanSerializer visitor
- Returns root <explain> element

QueryPlanSerializer.java:
- Extends DefaultExpressionVisitor to walk the expression tree
- Emits XML elements for each expression type:
  <flwor>, <for>, <let>, <where>, <return>, <order-by>, <group-by>,
  <path>, <step>, <predicate>, <filter>, <comparison>,
  <function-call>, <builtin-function>, <user-function>,
  <variable>, <if>, <union>, <intersect>, <try-catch>,
  <element-constructor>, <text-constructor>, etc.
- Includes @axis, @test, @variable, @name, @arity, @operator,
  @line, @column attributes where applicable

Example:
  util:explain('for $x in 1 to 10 where $x > 5 return $x')
  → <explain><for variable="$x"><in>...</in>...</for></explain>

7 new XQSuite tests for explain (83/83 total util tests pass).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds parser and tree walker rules for all W3C XQuery Update Facility
3.0 expressions: insert, delete, replace (node and value), rename,
and copy-modify-return (transform).

XQuery.g (lexer/parser):
- New tokens: REPLACE, RENAME, COPY, MODIFY, FIRST, LAST, BEFORE,
  AFTER, INTO, WITH, UPDATING
- insertExpr, deleteExpr, replaceExpr, renameExpr, transformExpr
- Integration into exprSingle production
- Updating function annotations

XQueryTree.g (tree walker):
- Instantiates XQUF expression classes from AST
- Legacy/XQUF syntax conflict detection via markLegacyUpdate/
  markXQUFUpdate on XQueryContext
- Updating function declaration handling

XQueryFunctionAST.java:
- isUpdating() flag for function declarations

Spec: W3C XQuery Update Facility 3.0, Section 2.1 (Syntax)
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend ForExpr and LetExpr to support optional `score` variable
bindings as defined in XQFT 3.0. The score variable captures the
relevance score from full-text matching for use in ordering or
filtering.

Add XQFT-specific error codes (FTST0008, FTST0009, FTDY0016,
FTDY0017, FTDY0020) to ErrorCodes.java. Update XQueryContext with
thesaurus and stop-word URI map caching to survive context resets,
fixing a bug where FT match options were lost during module imports.
Fix FTMatchOptions import in XQueryContext to use the correct
org.exist.xquery.ft package path.

Update StaticXQueryException and XQuery.java for full-text error
propagation during static analysis.

Spec references:
- W3C XQuery and XPath Full Text 3.0, Section 2.3 (Score Variables)
- W3C XQuery and XPath Full Text 3.0, Appendix B (Error Conditions)

FTTS compliance: 661/667 (99.1%) — 6 remaining are spec ambiguities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…egex fork

Upgrade Saxon-HE from 9.9.1-8 to 12.5 and remove the exist-saxon-regex
fork (org.exist-db:exist-saxon-regex:9.4.0-9.e1), a copy of Saxon 9.4's
internal regex classes that has been maintained separately for over a decade.
Saxon 12's public regex API makes the fork unnecessary.

Saxon 12 API migration:
- FastStringBuffer removed: use FloatValue.floatToString() and
  DoubleValue.doubleToString() for XPath-compliant formatting
- Regex APIs now take UnicodeString: wrap with StringView.of()
- XPathException.getErrorCodeLocalPart() replaced by
  getErrorCodeQName().getLocalPart()
- RegexIterator.MatchHandler moved to top-level RegexMatchHandler
- Xslt30Transformer.setInitialMode() now throws SaxonApiException
- Saxon 12 rejects duplicate document-URIs in the document pool
- Saxon 12 rejects null URIResolver and explicit xml namespace
  declarations in DOM and SAX pipelines
- Saxon 12's LinkedTreeBuilder rejects duplicate startDocument events

exist-saxon-regex replaced by Saxon 12's JavaRegularExpression API.

Full exist-core test suite: 6533 tests, 0 failures, 0 errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add record type support to eXist-db's type system, laying the groundwork
for XQ4 record type annotations (record(name as xs:string, age? as xs:integer, *)).

Type system:
- Type.RECORD = 70, subtype of MAP_ITEM (aligned with parser-next branch)
- RecordType class with FieldDeclaration (name, type, optional flag)
- Extensible records (trailing "*") support
- RecordType.matches() validates map contents against field declarations

SequenceType integration:
- isRecordType(), getFieldDeclarations(), isRecordExtensible() API
- setRecordType()/getRecordType() for parser integration
- checkType() delegates to RecordType.matches() for runtime validation

DynamicTypeCheck:
- Maps pass through record type check (handled by SequenceType.checkType)

Tests: 7 unit tests covering type hierarchy, field declarations, optional
fields, extensible records, toString output, and SequenceType record API.

Parser support (field accessor .name syntax, record test parsing) requires
coordination with parser-next branch and will follow in a separate commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3 of query profiling: util:profile($query) executes a query with
profiling enabled and returns a map combining:

- result: the actual query result
- time: xs:dayTimeDuration of execution
- memory: xs:integer bytes delta during execution
- plan: element(explain) — the compiled expression tree (from Phase 2)
- stats: element() — profiler statistics XML from PerformanceStatsImpl
  (function calls, index usage, optimizations)

FunProfile.java:
- Compiles and analyzes query using the same pattern as util:explain()
- Enables eXist's built-in Profiler at verbosity=10 before execution
- Captures timing via System.nanoTime(), memory via Runtime
- Serializes the expression tree via QueryPlanSerializer
- Serializes profiler stats via PerformanceStatsImpl.serialize()
- Packages everything into a MapType result

Two signatures:
- util:profile($query as xs:string) as map(*)
- util:profile($query, $module-load-path) as map(*)

9 new XQSuite tests (92/92 total util tests pass).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds fn:put() for persistent document storage via the XQUF Pending
Update List, plus the XQueryContext integration layer that connects
the XQUF expression classes to the query execution lifecycle.

XQUFFnPut.java:
- W3C fn:put($node, $uri) implementation
- Creates a put update primitive on the PUL
- Validates node must be document or element node (FOUP0001)

XQueryContext changes:
- PendingUpdateList field with get/set accessors
- Legacy/XQUF syntax conflict detection (markLegacyUpdate/
  markXQUFUpdate) to prevent mixing update syntaxes
- PUL reset in context cleanup

XQuery.java:
- PUL application at query completion boundary

FnModule.java:
- Register XQUFFnPut in fn: namespace

FunInScopePrefixes.java:
- Support in-memory nodes for namespace prefix queries

Spec: W3C XQuery Update Facility 3.0, Section 2.5.2 (fn:put)
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add four test classes covering the W3C XQFT 3.0 implementation:

- FTConformanceTest: 622-line conformance suite covering the core XQFT
  test cases mapped from the W3C Full Text Test Suite (FTTS), verifying
  spec compliance for contains-text expressions, match options, and
  positional filters.
- FTContainsTest: Integration tests exercising ftcontains expressions
  end-to-end through the XQuery engine, including edge cases for
  empty sequences, mixed content, and attribute nodes.
- FTEvaluatorTest: Unit tests for the AllMatches evaluator, covering
  tokenization, match option application, and boolean composition.
- FTParserTest: Parser tests verifying that the ANTLR 2 grammar
  correctly parses all XQFT productions and builds the expected AST.

FTTS compliance: 661/667 (99.1%) — 6 remaining are spec ambiguities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… 6.0)

Upgrades the embedded Jetty server from 11.0.25 to 12.0.16, migrating
from the legacy Jetty 11 APIs to EE10 (Jakarta Servlet 6.0):

- Jetty 12 EE10 servlet container replaces Jetty 11 APIs
- WebSocket upgrade from Jetty 11 to Jetty 12 WebSocket APIs
- Servlet filter/listener registration via EE10 programmatic API
- GZip handler migration to Jetty 12 GzipHandler
- SSL/TLS configuration updated for Jetty 12 ServerConnector
- JMX and statistics handler migration
- Thread pool configuration for Jetty 12
- Test infrastructure updated for Jetty 12 test server

Dependencies:
- jetty-ee10-servlet, jetty-ee10-webapp, jetty-ee10-websocket-*
- Remove deprecated jetty-servlet, jetty-webapp modules

This is a breaking infrastructure change that affects all HTTP/WebSocket
communication. Thoroughly tested with ExistWebServer test harness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Gate FORX0003 (empty-matching regex) on XQuery version in FunReplace
  and FunTokenize: raise the error in XQ 3.1 mode, allow empty matches
  in XQ 4.0 mode
- Add xquery version "4.0" support to XQueryTree.g tree walker
- Declare replace.xqm test module as xquery version "4.0"
- Fix unnecessary FQN in FunReplace (PMD)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add 4 new XPath axes from the XQuery 4.0 specification:
- following-or-self: self + following (forward axis)
- following-sibling-or-self: self + following-sibling (forward axis)
- preceding-or-self: self + preceding (reverse axis)
- preceding-sibling-or-self: self + preceding-sibling (reverse axis)

These convenience axes combine existing axes with self, avoiding
verbose union patterns like (self::node() | following-sibling::node()).

Implementation evaluates as union of self + base axis results,
preserving document order. Changes:
- Constants.java: 4 new axis constants (14-17)
- XQuery.g: axis names in forward/reverse specifiers + reserved keywords
- XQueryTree.g: axis name to constant mapping
- LocationStep.java: getOrSelfAxis() dispatch method
- Predicate.java: reverse axis detection for new axes

8/8 JUnit tests pass, 148 XPath tests pass (0 regressions).
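
A toy sketch of the "union of self + base axis, in document order" evaluation for the two sibling axes. Siblings are modelled as positions in document order; `OrSelfAxisSketch` and its methods are hypothetical and unrelated to LocationStep.getOrSelfAxis().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model: each sibling is identified by its position in document order.
class OrSelfAxisSketch {

    /** Positions after self, as following-sibling would select them. */
    static List<Integer> followingSibling(int self, int siblingCount) {
        List<Integer> out = new ArrayList<>();
        for (int i = self + 1; i < siblingCount; i++) {
            out.add(i);
        }
        return out;
    }

    /** Positions before self, as preceding-sibling would select them. */
    static List<Integer> precedingSibling(int self) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < self; i++) {
            out.add(i);
        }
        return out;
    }

    /** Union of self and the base axis result, returned in document order. */
    static List<Integer> orSelf(int self, List<Integer> baseAxis) {
        TreeSet<Integer> union = new TreeSet<>(baseAxis);
        union.add(self);
        return new ArrayList<>(union);
    }

    public static void main(String[] args) {
        System.out.println(orSelf(2, followingSibling(2, 5))); // [2, 3, 4]
        System.out.println(orSelf(2, precedingSibling(2)));    // [0, 1, 2]
    }
}
```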

Spec: https://qt4cg.org/specifications/xquery-40/xpath-40-xquery-40.html#axes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…(Phase 3-4)

Phase 3 — FieldAccessor expression:
- New FieldAccessor class evaluates $expr.fieldName as map:get($expr, "fieldName")
- Returns empty sequence for missing fields, XPTY0004 for non-map base
- Parser-next branch will wire the .NCName postfix syntax to this class

Phase 4 — Record type checking in function arguments:
- RecordTypeCheck wraps function arguments declared with record types
- Validates at runtime: argument is a map, has all required fields, field
  types match, no extra keys (unless extensible)
- Wired into Function.checkArgumentType() (both typeMatches and non-match
  paths) alongside existing FunctionTypeCheck pattern
- Also wired into UserDefinedFunction.eval() for runtime parameter validation
- Descriptive error messages: "missing required field 'age'" etc.
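
The runtime checks listed above can be sketched as follows. This is an illustration with hypothetical names (`RecordMatchSketch`, `Field`, `firstMismatch`), using Java classes in place of XDM types. Only the "missing required field" message is taken from this commit; the other two messages are invented.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative record check over a plain Java map; not eXist-db's RecordTypeCheck.
class RecordMatchSketch {

    static final class Field {
        final String name;
        final Class<?> type;
        final boolean optional;
        Field(String name, Class<?> type, boolean optional) {
            this.name = name;
            this.type = type;
            this.optional = optional;
        }
    }

    /** Returns null when the map matches, otherwise a description of the first mismatch. */
    static String firstMismatch(Map<String, Object> map, List<Field> fields, boolean extensible) {
        Set<String> declared = new HashSet<>();
        for (Field f : fields) {
            declared.add(f.name);
            Object value = map.get(f.name);
            if (value == null) {
                if (!f.optional) {
                    return "missing required field '" + f.name + "'";
                }
                continue;
            }
            if (!f.type.isInstance(value)) {
                return "field '" + f.name + "' has the wrong type"; // invented wording
            }
        }
        if (!extensible) {
            for (String key : map.keySet()) {
                if (!declared.contains(key)) {
                    return "unexpected field '" + key + "'"; // invented wording
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Field> person = List.of(
                new Field("name", String.class, false),
                new Field("age", Integer.class, false));
        Map<String, Object> onlyName = Map.of("name", "Ada");
        System.out.println(firstMismatch(onlyName, person, false)); // missing required field 'age'
    }
}
```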

10 new tests (4 FieldAccessor + 6 RecordTypeCheck), all 17 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the array/map filter expression from the QT4 specification.
The ?[expr] operator filters array members or map entries by
evaluating a predicate with each member/value as the context item.

Examples:
  [1, 2, 3, 4, 5]?[. > 3]            -> [4, 5]
  map{"a":1, "b":2, "c":3}?[. > 1]   -> map{"b":2, "c":3}

Parser (XQuery.g):
- Add FILTER_AM token
- Add filterExprAM rule: QUESTION LPPAREN exprSingle RPPAREN
- Disambiguate from lookup: ?[ is FilterExprAM, ?name is lookup

Tree Walker (XQueryTree.g):
- Add filterExprAM rule that creates FilterExprAM expression
- Chain in postfixExpr alongside lookup and predicate

FilterExprAM.java:
- Extends AbstractExpression
- For arrays: iterates members, evaluates predicate with each as
  context item, keeps members where EBV is true
- For maps: iterates entries, evaluates predicate with each value
  as context item, keeps entries where EBV is true
- XPTY0004 for non-array/map targets
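
The filtering rule can be sketched over plain Java collections. `FilterSketch` is a hypothetical illustration, not FilterExprAM: a `Predicate` stands in for the effective boolean value of the XQuery predicate, and for maps it is applied to each entry's value, as described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative counterpart of ?[expr] for Java lists and maps.
class FilterSketch {

    /** Keeps the members for which the predicate holds, preserving order. */
    static <T> List<T> filterArray(List<T> members, Predicate<? super T> predicate) {
        List<T> kept = new ArrayList<>();
        for (T member : members) {
            if (predicate.test(member)) {
                kept.add(member);
            }
        }
        return kept;
    }

    /** Keeps the entries whose value satisfies the predicate. */
    static <K, V> Map<K, V> filterMap(Map<K, V> entries, Predicate<? super V> predicate) {
        Map<K, V> kept = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : entries.entrySet()) {
            if (predicate.test(entry.getValue())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filterArray(List.of(1, 2, 3, 4, 5), n -> n > 3)); // [4, 5]

        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        System.out.println(filterMap(m, v -> v > 1)); // {b=2, c=3}
    }
}
```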

17 XQSuite tests (993/993 total XQuery3Tests pass, 0 regressions).

Spec: https://qt4cg.org/specifications/xquery-40/xpath-40-xquery-40.html#id-filter-am

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3.2: Per-query scoping for PerformanceStats
- Add Profiler.getPerformanceStats() to expose per-profiler stats
- FunProfile now reads from the per-query profiler's stats instead
  of the global BrokerPool stats, ensuring correct data under
  concurrent load

Phase 4.1: Optimizer decision logging
- Add DEBUG-level logging in GeneralComparison.analyze() showing
  when index optimization is applied or skipped, including the
  expression text, QName, and optimization type

Phase 5.1: util:index-report($query) as element()
- Execute a query with profiling enabled and return an XML report
  of index usage and optimizations from the per-query PerformanceStats
- Uses the same per-query profiler isolation as util:profile()

2 new XQSuite tests (94/94 total pass).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds comprehensive JUnit test coverage for all W3C XQuery Update
Facility 3.0 operations, plus a performance benchmark for measuring
update primitive throughput.

XQUFBasicTest.java (73 tests):
- Insert node (before/after/into/as first/as last)
- Delete node (element, attribute, text, comment, PI)
- Replace node and replace value of node
- Rename node (element, attribute, PI)
- Copy-modify-return (transform) expressions
- Namespace conflict detection (XUDY0021/0023/0024)
- XUST0001/XUST0002 static type errors
- Complex multi-step update scenarios

XQUFBenchmark.java:
- Performance benchmarks for insert/delete/replace throughput
- Deep tree copy-modify-return scaling tests

bindingConflictXQUF.xqm:
- XQUF-specific namespace binding conflict tests using
  copy-modify-return syntax (separated from legacy tests
  because the two syntaxes cannot be mixed in one module)

XQTS: 684/684 non-schema tests pass (W3C XQuery Update Facility)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
joewiz and others added 26 commits April 26, 2026 02:07
…ack, number-parser

Implements the full options map for fn:json-to-xml per XPath Functions
spec section 22.3.2 and XQuery 4.0 extensions:

- escape option: when true, preserves JSON escape sequences (\n, \t,
  \r, \\, etc.) as literal text in XML output and adds escaped="true"
  and escaped-key="true" attributes. When false (default), replaces
  non-XML characters with U+FFFD.

- duplicates option: default is "retain" for json-to-xml (keeps all
  entries). Supports "use-first" (skips subsequent duplicates) and
  "reject" (throws FOJS0003). Rejects "use-last" with FOJS0005 per
  spec.

- fallback option: calls user function with \uXXXX form for characters
  not representable in XML 1.0. Validates return type (XPTY0004 for
  empty/sequence, FOTY0013 for function items).

- number-parser option (XQ4): calls custom function for number text,
  uses result's string value as <number> element content.

Also fixes:
- Options type validation: XPTY0004 for wrong type/cardinality on all
  boolean, string, and function options
- FOJS0005 for escape=true combined with fallback
- Empty string input now raises FOJS0001 (was silently returning empty)
- BOM (U+FEFF) stripped from input
- json-to-xml 2-arity options parameter now accepts empty sequence
  for XQuery 4.0 compatibility

Refactors jsonToXml from flat iterative to recursive structure for
proper per-scope duplicate key tracking and nested value skipping.
Preserves backward-compatible legacy overload.
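
The duplicates handling described above could look roughly like the sketch below, applied to the entries of a single JSON object scope. DuplicatePolicySketch and apply() are hypothetical names, Java exceptions stand in for the FOJS0003 and FOJS0005 errors, and the handling of unknown policy strings is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the json-to-xml duplicates option for one object scope. */
final class DuplicatePolicySketch {

    static List<Map.Entry<String, String>> apply(
            List<Map.Entry<String, String>> entries, String policy) {
        switch (policy) {
            case "retain":
            case "use-first":
            case "reject":
                break;
            case "use-last":
                // The commit message says use-last is rejected with FOJS0005.
                throw new IllegalArgumentException("FOJS0005: use-last is not allowed here");
            default:
                // Assumption: anything else is simply unsupported in this sketch.
                throw new IllegalArgumentException("unsupported duplicates policy: " + policy);
        }
        Set<String> seen = new HashSet<>();   // keys already emitted in this scope
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> entry : entries) {
            boolean duplicate = !seen.add(entry.getKey());
            if (duplicate && policy.equals("reject")) {
                throw new IllegalStateException("FOJS0003: duplicate key '" + entry.getKey() + "'");
            }
            if (duplicate && policy.equals("use-first")) {
                continue;                     // skip later occurrences
            }
            out.add(entry);                   // retain keeps every entry
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> entries =
                List.of(Map.entry("a", "1"), Map.entry("a", "2"), Map.entry("b", "3"));
        System.out.println(apply(entries, "retain"));
        System.out.println(apply(entries, "use-first"));
    }
}
```

Keeping one seen-set per object scope is what the recursive structure makes straightforward: a nested object would get its own call and its own set.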

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per XQuery 4.0, when a map is passed where a record type is expected,
it should be coerced rather than rejected. This commit implements:

1. SequenceType.checkType(int): Accept MAP_ITEM when expected type is
   RECORD, since maps are the runtime representation of records. This
   fixes function return type checks that rejected map(*) for record(*)
   return types.

2. SequenceType.isRecordType(): Return true when primaryType == RECORD
   without requiring a non-null RecordType object. This makes bare
   record() (no fields) a valid record type that matches empty maps.

3. SequenceType.checkType(Item): Handle bare record() — empty maps
   match, non-empty maps don't.

4. RecordTypeCheck.eval(): Transform maps during coercion instead of
   just validating. The coercion creates a new map containing only
   declared fields (dropping undeclared extras), coerces field values
   to declared types via atomic type conversion, and validates that
   required fields are present.

Fixes prod-RecordType tests: 002, 003 (map-to-record coercion),
105 (empty record matching), 015 (type conversion), 016-018 (drop
excess fields). Test 011 (key ordering) requires ordered map support
which is deferred.
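
A minimal sketch of the coercion step in item 4, assuming a record type is described simply as a map from field name to a "required" flag. RecordCoercionSketch and coerce() are hypothetical names, and the atomic type conversion of field values is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of map-to-record coercion; not the eXist-db RecordTypeCheck class. */
final class RecordCoercionSketch {

    /**
     * Builds a new map holding only the declared fields.
     * declared maps each field name to true when the field is required.
     */
    static Map<String, Object> coerce(Map<String, ?> input, Map<String, Boolean> declared) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> field : declared.entrySet()) {
            String name = field.getKey();
            if (input.containsKey(name)) {
                record.put(name, input.get(name));   // value conversion omitted in this sketch
            } else if (field.getValue()) {
                throw new IllegalArgumentException("missing required field '" + name + "'");
            }
        }
        return record;                               // undeclared extras are dropped
    }

    public static void main(String[] args) {
        Map<String, Boolean> declared = new LinkedHashMap<>();
        declared.put("name", true);
        declared.put("age", false);
        System.out.println(coerce(Map.of("name", "Ann", "extra", 1), declared));
    }
}
```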

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add DATETIME_RECORD type constant (71) as a subtype of MAP_ITEM, registered
with the name "fn:dateTime-record". This allows the XQTS runner's assert-type
parser to resolve fn:dateTime-record type assertions instead of crashing with
"Type: fn:dateTime-record is not defined".

On branches with RECORD type support (next-v3, v2/xq4-record-types), the
parent should be changed from MAP_ITEM to RECORD during cherry-pick.

Note: FnDateTimeRecord.eval() also needs to return values with type
DATETIME_RECORD (rather than plain MAP_ITEM) for the assert-type checks
to fully pass. That change belongs with FnDateTimeRecord on v2/xq4-core-functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tomic, and placeholders

Three fixes for XQTS prod-Lookup and prod-UnaryLookup failures:

1. Lookup.java: Move type check from sequence level to per-item iteration.
   The previous check used leftSeq.getItemType() which returns the common
   supertype — for a mixed array+map sequence this is ITEM, incorrectly
   failing the check. Per spec, each item is validated individually.
   Fixes Lookup-227 ([[1,2,3], map:entry(3,5)]?*?3).

2. ArrayType.get(): Add untypedAtomic-to-integer cast for array subscripts.
   Per XQ 3.1 spec, when an element node is atomized and used as an array
   lookup key, the resulting xs:untypedAtomic must be cast to xs:integer.
   Fixes Lookup-021, Lookup-121, Lookup-230, UnaryLookup-021.

3. FunctionFactory.createFunction(): Skip the string function optimization
   (contains/starts-with/ends-with/equals rewrite to GeneralComparison)
   when any argument is a Function$Placeholder from partial function
   application. The optimization casts all params to PathExpr, which
   throws ClassCastException for placeholders. Falls through to the
   normal functionCall() path instead.
   Fixes Lookup-016, Lookup-017, UnaryLookup-016, UnaryLookup-017.

Total: recovers 9 of 27 non-passing tests (3 skipped for schemaValidation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
XQuery 4.0 changes MapConstructorEntry to make the ": ExprSingle"
part optional. When the colon is absent, the expression must evaluate
to a map at runtime, whose entries are merged into the result. This
enables conditional and computed map entries:

  map { "a": 1, map {"b": 2} }
  map { "a": 1, if ($cond) then map {"b": 2} else map {} }

Parser: added MAP_MERGE imaginary token and syntactic predicate in
mapAssignment to differentiate key:value from merge entries.

Tree walker: added MAP_MERGE alternative in mapConstr rule.

MapExpr: refactored to sealed Entry interface with KeyValueEntry and
MergeEntry records. Merge entries are type-checked (XPTY0004) and
duplicate-key checked (XQDY0137) at evaluation time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…okup keys

Three parser/grammar quick fixes:

A. Restore fn:lang function name - the keyword argument session
   accidentally renamed the QName from "lang" to "language", making
   fn:lang() unresolvable (44 XQTS failures).

B. Reject extensible records per XQ4 PR2413 - record(*) and
   record(field, *) now raise XPST0003. Removed anyRecordTypeTest
   grammar rule and STAR alternatives from record type parsing
   in both ANTLR grammar and tree walker (6 XQTS tests).

D. Version-gate decimal/double literal lookup keys - $map?1.5
   is XQ4-only syntax; now raises XPST0003 in XQ 3.1 mode via
   xq4Enabled semantic predicates (5+ XQTS tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates ~75 FunctionParameterSequenceType attribute names across 64 fn:
function files to match the XQuery 4.0 Functions and Operators specification.
This enables keyword argument resolution (name := value syntax) to match
parameter names correctly.

Key patterns:
- $arg -> $value (transformation functions: abs, ceiling, floor, round, etc.)
- $arg -> $input (sequence functions: reverse, head, tail, count, etc.)
- $arg -> $node (node functions: local-name, name, namespace-uri, root, etc.)
- $arg -> $values (aggregate functions: sum, min, max, string-join)
- $collation-uri -> $collation (all collation parameters)
- $source-string -> $value (contains, starts-with, ends-with)
- $string-1/$string-2 -> $value1/$value2 (compare, codepoint-equal)
- $sequence -> $input, $function -> $action/$predicate (HOF functions)
- $date/$time/$date-time/$duration -> $value (date/time extraction functions)
- Various other renames per XQ4 F&O spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t XQ4 lookaround

Add pre-validation of regex patterns in fn:matches and fn:replace to reject
constructs that are not part of the XPath regular expression specification
(F&O 3.1, Section 5.6.1) but that Saxon's XP30 mode silently accepts.

Rejected constructs include:
- \x, \u hex/unicode escapes (not in XPath regex)
- \A, \Z, \z Java-specific anchors
- \b, \B word boundary assertions
- \a, \e, \f, \v special character escapes
- \Q, \E literal quoting
- \G, \k, \g named/numbered back-references
- (?=...) (?!...) (?<=...) (?<!...) Java-style lookaround
- (?>...) atomic groups
- (?i:...) (?m:...) (?s:...) (?-i:...) inline flag groups
- *+ ++ ?+ possessive quantifiers

Also adds support for XPath 4.0 named lookaround syntax by translating
(*positive_lookahead:...) etc. to Java regex (?=...) equivalents.
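
The lookaround translation could be approximated as below. This is a deliberately naive sketch: LookaroundSketch is a hypothetical name, only (*positive_lookahead:...) is taken from the commit message, the other three group names are assumed by analogy, and plain text replacement ignores escapes and character classes that a real translator would have to respect.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive textual translation of named lookaround groups to Java regex syntax. */
final class LookaroundSketch {

    static String translate(String xpathRegex) {
        return xpathRegex
                .replace("(*positive_lookahead:", "(?=")
                // The three names below are assumptions made for this sketch.
                .replace("(*negative_lookahead:", "(?!")
                .replace("(*positive_lookbehind:", "(?<=")
                .replace("(*negative_lookbehind:", "(?<!");
    }

    public static void main(String[] args) {
        String javaRegex = translate("foo(*positive_lookahead:bar)");
        Matcher m = Pattern.compile(javaRegex).matcher("foobar");
        // The lookahead asserts "bar" follows without consuming it.
        System.out.println(javaRegex + " find=" + m.find());
    }
}
```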

Expected XQTS impact: ~137 of 173 fn-matches.re failures fixed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix multiple error code and validation issues in type casting:

- CastExpression/CastableExpression: Move xs:anySimpleType from XPST0080 to
  XPST0051 check; add xs:anyType to XPST0051. Per spec, XPST0080 is only
  for xs:NOTATION and xs:anyAtomicType.
- CastExpression: Add string-intermediary casting path for string-derived
  types (xs:language, xs:Name, xs:NMTOKEN, etc.) so non-string source types
  cast via xs:string first, producing correct FORG0001 on invalid values.
- CastExpression: Allow xs:untypedAtomic to cast to xs:QName.
- DynamicTypeCheck: Raise XPTY0117 (new) when xs:untypedAtomic is coerced
  to namespace-sensitive types (xs:QName, xs:NOTATION) in function arguments.
- ErrorCodes: Add XPTY0117 for namespace-sensitive type coercion errors.
- AbstractDateTimeValue: Reject seconds >= 60, hour 24 with fractional
  seconds, and leading zeros in year (e.g., "02004" is invalid).
- Date/time value constructors (TimeValue, DateTimeValue, DateValue,
  GYearValue, GMonthValue, GDayValue, GYearMonthValue, GMonthDayValue):
  Use ErrorCodes.FORG0001 instead of bare exerr:ERROR.
- DurationValue/DayTimeDurationValue: Validate decimal point has digits on
  both sides (reject "PT.5S" and "PT30.S").
- DoubleValue: Use BigDecimal for double-to-integer conversion to prevent
  silent long overflow (fixes xs:integer(99e100)).
- AnyURIValue: Relax URI validation to accept valid URI references that
  java.net.URI rejects (e.g., ":/", "%gg", "%GF").
- StringValue: Use XMLNames.isName() for xs:Name validation instead of
  QName.isQName(), so colons are accepted (e.g., "::::").
- BinaryValue: Use ErrorCodes.FORG0006 for effectiveBooleanValue() instead
  of bare error string.
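
The DoubleValue item above is easiest to see side by side. In Java, casting a too-large double to long saturates at Long.MAX_VALUE without any error, whereas going through BigDecimal keeps the full magnitude. DoubleToIntegerSketch and toInteger() are hypothetical names for illustration; the actual DoubleValue code is not shown in this commit message.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Shows why a double-to-integer conversion via long silently loses large values. */
final class DoubleToIntegerSketch {

    /** Converts a finite double to an arbitrary-precision integer, truncating toward zero. */
    static BigInteger toInteger(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new ArithmeticException("cannot convert a non-finite double");
        }
        // new BigDecimal(double) is exact; toBigInteger() discards the fraction.
        return new BigDecimal(value).toBigInteger();
    }

    public static void main(String[] args) {
        double big = 99e100;
        System.out.println("via long:       " + (long) big);          // saturated
        System.out.println("via BigDecimal: " + toInteger(big).bitLength() + " bits");
    }
}
```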

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r keyword arguments

XQuery 4.0 keyword argument resolution matches parameter names from the
function signature. Many built-in function parameters used legacy names
(e.g., fn:floor($number) instead of fn:floor($value)) that don't match
the XQ4 spec, causing keyword argument calls like fn:floor(value := ?)
to fail with arity mismatch errors.

Updated parameter names in 33 files across fn:, math:, and array: modules
to match the XQ4 spec. This enables keyword argument syntax for ~56 of the
65 failing misc-BuiltInKeywords XQTS tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lation construction

Add fn:collation($options as map(*)) overload that constructs a UCA
collation URI from a map of collation options as defined in the XQ4 spec.

The function:
- Accepts UCA option keys (strength, lang, fallback, case-first, etc.)
- Converts hyphenated map keys to camelCase URI parameters
- Converts boolean values to "yes"/"no"
- Validates option values when fallback=false, raising FOCH0002 for errors
- Returns a UCA collation URI usable with fn:compare, fn:sort, etc.
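
The key and value conversions listed above might be sketched as follows. CollationUriSketch is a hypothetical name; the base URI and the semicolon separator follow the UCA collation URI convention as understood here and should be treated as assumptions, and no escaping or option validation is attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Toy builder for a UCA collation URI from an options map. */
final class CollationUriSketch {

    // Assumed base URI for UCA collations.
    static final String BASE = "http://www.w3.org/2013/collation/UCA";

    /** Converts a hyphenated key such as case-first to camelCase (caseFirst). */
    static String toCamelCase(String key) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : key.toCharArray()) {
            if (c == '-') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String buildUri(Map<String, Object> options) {
        if (options.isEmpty()) {
            return BASE;
        }
        StringJoiner params = new StringJoiner(";", BASE + "?", "");
        for (Map.Entry<String, Object> option : options.entrySet()) {
            Object value = option.getValue();
            // Booleans become yes/no; everything else uses its string form.
            String text = (value instanceof Boolean)
                    ? ((Boolean) value ? "yes" : "no")
                    : String.valueOf(value);
            params.add(toCamelCase(option.getKey()) + "=" + text);
        }
        return params.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("lang", "de");
        options.put("case-first", "upper");
        options.put("numeric", true);
        System.out.println(buildUri(options));
    }
}
```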

The existing fn:collation() 0-arg and fn:collation(string) 1-arg variants
are preserved. The 1-arg signature now accepts item() to dispatch between
string (collation URI check) and map (UCA construction) at runtime.

Expected to fix ~71 of 73 failing fn-collation XQTS tests (all error-only
and error-or-value tests pass; value-only tests that require actual UCA
comparison behavior depend on the collation infrastructure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ement-to-map, regex, deep-equal

- fn:codepoints-to-string: Accept C0 control characters (U+0001-U+001F)
  per XML 1.1 / XQ 4.0 while still rejecting Unicode noncharacters.
  Fixes 87+ fn-graphemes tests that use codepoint 1.

- fn:build-dateTime: Add 1-argument map-based overload (XQ 4.0) that
  constructs xs:dateTime, xs:date, xs:time, or Gregorian types based
  on which record fields are present. Includes FODT0005/FODT0006
  validation. (+45 tests)

- fn:element-to-map: Fix list layout to group same-named siblings under
  shared key, add grouped content for mixed-name siblings, preserve
  whitespace-only text nodes, use xml: prefix for XML namespace
  attributes. (+38 tests)

- Regex \b/\B word boundaries: Pre-process XQ 4.0 word boundary escapes
  before Saxon translation (which only supports XP 3.0 regex). Falls
  back to Java regex for patterns containing \b/\B. (+8 tests)

- fn:deep-equal: Add "unordered" option key support (inverse of "ordered").

- ErrorCodes: Add FODT0005 (missing date/time component) and FODT0006
  (invalid date/time component value).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…matching, -or-self axes

JNode.java: Add getFollowing() and getPreceding() methods implementing
  the full following/preceding axes (siblings + descendants, recursive).

LocationStep:
- Add following, preceding, following-or-self, preceding-or-self,
  following-sibling-or-self, preceding-sibling-or-self axis handling
  for JNodes (all 15 XPath axes now supported).
- Fix matchesJNode() wildcard: TypeTest(ELEMENT) from child::* was not
  matching JNodes. Now treats ELEMENT as wildcard in JNode context.
- Fix NameTest handling: support named key lookup (e.g., $root/name)
  and proper wildcard detection for JNodes.

JNodeTest: Add xpathChildWildcard test for child::* on JNodes.

+6 AxisStep.J tests (JAxes-001, JAxes-021 histogram tests now pass),
zero regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…jnode() parsing

Fix four issues preventing JNode path expression and axis step tests
from passing:

1. Static type check for JSON nodes: Step.returnsType() returns NODE
   for all non-self axes, causing function parameter checks to fail when
   json-node() type is required. Relax Function.checkArgumentType() to
   allow NODE→JSON_NODE in XQ4 (defer to dynamic check). Add JNode
   recognition in DynamicTypeCheck for runtime validation. This unblocks
   all 30 histogram tests in prod-AxisStep.J.

2. fn:get() sequence support: Change cardinality from EXACTLY_ONE to
   ZERO_OR_MORE and iterate over key sequences. Tests call get((1,3)),
   get(()), etc. which requires sequence arguments.

3. jnode(name, type) parsing: Replace crude ~(RPAREN)* grammar rule with
   paren-balanced skipper (jnodeTestArgs) that handles nested parens in
   expressions like jnode(*, record(A as xs:double, B as xs:string)).

4. Add JSON_NODE to Type.isNavigable() so JNode types are recognized as
   navigable in path expressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ility, lazy module registration

Fix several XQuery module import conformance issues detected by the XQTS
prod-ModuleImport test set, improving pass rate from 55/117 to 101/131.

Annotation validation:
- Add XQST0106 error for duplicate/conflicting %public/%private annotations
  on function declarations
- Add XQST0116 error for duplicate/conflicting %public/%private annotations
  on variable declarations
- Handle both the XPath Functions namespace and the XQuery annotation
  namespace (http://www.w3.org/2012/xquery) for visibility annotations

Private visibility enforcement:
- Store %private flag on VariableDeclaration
- Check %private visibility in XQueryContext.resolveVariable() to prevent
  access to private variables from outside their declaring module

Module resolution:
- Add addModuleLocationHint() to XQueryContext for lazy module location
  registration without eager loading
- Override getModuleLocation() in ModuleContext to propagate dynamic module
  locations from parent contexts
- Normalize whitespace in namespace URIs in importModule()
- Add recursion guard in ExternalModuleImpl.resolveVariable() to prevent
  stack overflow from circular variable dependencies
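
The recursion guard in the last item follows a common pattern: track which names are currently being resolved and fail fast on re-entry. The sketch below is a generic illustration with hypothetical names (RecursionGuardSketch, declare, resolve); what ExternalModuleImpl actually does on detecting a cycle is not stated here, so throwing IllegalStateException is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Toy variable resolver with a re-entrancy guard against circular dependencies. */
final class RecursionGuardSketch {

    private final Map<String, Function<RecursionGuardSketch, Integer>> declarations = new HashMap<>();
    private final Set<String> inProgress = new HashSet<>();   // names being resolved right now

    void declare(String name, Function<RecursionGuardSketch, Integer> initializer) {
        declarations.put(name, initializer);
    }

    int resolve(String name) {
        Function<RecursionGuardSketch, Integer> initializer = declarations.get(name);
        if (initializer == null) {
            throw new IllegalArgumentException("undeclared variable $" + name);
        }
        if (!inProgress.add(name)) {
            // Re-entered while still resolving the same name: a cycle.
            throw new IllegalStateException("circular dependency on $" + name);
        }
        try {
            return initializer.apply(this);
        } finally {
            inProgress.remove(name);   // always unwind, even on failure
        }
    }

    public static void main(String[] args) {
        RecursionGuardSketch resolver = new RecursionGuardSketch();
        resolver.declare("a", r -> r.resolve("b") + 1);
        resolver.declare("b", r -> r.resolve("a") + 1);
        try {
            resolver.resolve("a");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```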

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nformance

Fixes 59 XQTS failures (0% → 100% pass rate for fn-element-to-map suite):

- Implement "default" name-format (spec default, not "eqname"):
  child elements in same namespace as parent use local name,
  different namespace uses Q{ns}local, no-namespace child of
  namespaced parent uses Q{}local. Special case for xml: namespace
  attributes.

- Fix layout classification: whitespace-only text between elements
  is ignored for layout detection but preserved as content when the
  element has only whitespace text and no child elements.

- Implement sequence layout for non-unique child names (was
  incorrectly using record layout which collapsed duplicates).

- Fix list/list-plus layout: list drops child element name and
  returns array directly; list-plus uses child name as map key.

- Add plan support: explicit layout directives (empty, empty-plus,
  simple, simple-plus, list, list-plus, record, sequence, mixed,
  xml, error, deep-skip), fallback via "*" key, type coercion
  (numeric/boolean), FOJS0008 error for layout mismatches.

- Add XML serialization layout for plan-based conversion.

- Add option validation: name-format, attribute-marker type checks
  with XPTY0004 errors for invalid values.

- Add xsi:type coercion for schema-typed simple content.

- Fix fn:element-to-map-plan corpus analysis: properly merge
  multiple instances, detect list patterns across empty and
  non-empty instances, generate type annotations for numeric
  content.

- Add FOJS0008 error code to ErrorCodes.java.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…checking

fn:unparsed-text improvements:
- Change $encoding parameter from xs:string to xs:string? to accept empty sequence
- Fix error code mapping: FODC0005 → FOUT1170 for URI syntax errors
- Add URI-only dynamic text resource lookup for encoding-agnostic resolution
  (fixes UTF-16 and ISO-8859-1 resources when no encoding is specified)
- Add readLines support for dynamic text resources (was missing)
- Add XML character validation (FOUT1190) for non-XML characters
- Fix unparsed-text-available to return false (not empty sequence) for empty href

Function type checking (SequenceType):
- Add functionParamTypes and functionReturnType fields to SequenceType
- Wire up ANTLR tree walker to populate function type info (resolves TODO)
- Add return type covariance checking for function instance-of operations

XQTS fn-unparsed-text: 50 → 32 failures (18 tests fixed, 36% improvement)
Subtyping fixes require next-v3 integration branch for proper testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, promotions

- Add XPST0003 checks for reserved function names in NamedFunctionReference
  and FunctionFactory, fixing ~16 prod-NamedFunctionRef and ~7 prod-FunctionCall
  tests that incorrectly returned XPST0017 or succeeded when they should fail

- Fix context item passing for wrapped internal functions (FunctionFactory.wrap).
  UserDefinedFunction now preserves the evaluation context for wrapper functions,
  fixing ~15 tests where context-dependent functions like fn:string#0,
  fn:node-name#0, fn:id#1, fn:idref#1 lost the focus when called via
  function references

- Add binary type promotion (xs:base64Binary ↔ xs:hexBinary) in
  GeneralComparison and DynamicTypeCheck per XQuery 4.0 spec, fixing 4
  function-call-promotion tests

- Register 2-arity fn:element-with-id signature (the implementation already
  handled 2 args but the signature was missing), fixing 2 tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…codes

Two XQTS compliance fixes:

1. OrderedDurationValue: comparing xs:yearMonthDuration with
   xs:dayTimeDuration now correctly throws XPTY0004 for all operators
   (eq/ne/lt/gt/le/ge), not just ordering operators. Per XPath spec,
   these duration subtypes are not comparable. (~32 test failures)

2. FunSerialize: preserve specific serialization error codes
   (SERE0020, SERE0022, SERE0023, SEPM0017) from exception messages
   instead of collapsing them all to SENR0001. The JSON serializer
   already throws the correct codes, but FunSerialize was catching
   and re-wrapping them. (~18 test failures)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…quenceType methods, FnFormatDates params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a per-expression optimization hook modelled on BaseX's
Expr.optimize(): each Expression subclass may return a replacement
expression (constant fold, branch select, etc.) instead of just being
wrapped in a pragma by the central Optimizer visitor.

Why: eXist's optimizer is a single visitor pass that can only wrap
predicates in pragmas — it cannot replace expressions, cannot compose
optimizations, and must understand every expression type centrally.
This is recommendation L1 from query-optimizer-audit.md and a precursor
to FLWOR loop-invariant hoisting and other per-class optimizations.

Design (joe-vault/Claude/exist/optimizer-expression-optimize-design.md):

- Expression.optimize(CompileContext) is a default method on the
  interface, returning `this`. Zero source impact on the 370+ existing
  Expression subclasses (38 direct AbstractExpression, 213 functions,
  122 extension functions, plus DebuggableExpression).

- CompileContext carries the XQueryContext, an in-memory rewrite log,
  and the changed flag. replaceWith(orig, new, reason) is the preferred
  return path. preEval(expr) folds a no-dependency expression to a
  literal.

- XQueryContext.analyzeAndOptimizeIfModulesChanged runs the new pass
  before the legacy visitor: replacements happen first, the visitor
  then operates on the reduced tree. The CompileContext is stored on
  the context (getLastCompileContext()) so diagnostics and tests can
  see what fired. The legacy visitor stays in place for the migration
  period.

- Recursion enablers: PathExpr.optimize walks `steps`,
  BindingExpression.optimize walks inputSequence + super(returnExpr),
  AbstractFLWORClause.optimize walks returnExpr. These let
  optimize() reach into the most common containers — without them,
  only the top-level expression's optimize() would fire.

This commit is framework-only; no Expression subclass yet returns a
non-trivial replacement. Behaviour is unchanged from the baseline.
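
To make the shape of the framework concrete, here is a stripped-down model of the design bullets above. Everything in it is a stand-in: Expr, Ctx, Path, and Parens are hypothetical miniature types, not the real Expression, CompileContext, or PathExpr, and the Parens rewrite exists only to exercise replaceWith.

```java
import java.util.ArrayList;
import java.util.List;

/** Miniature model of a per-expression optimize() hook with a rewrite log. */
final class OptimizeFrameworkSketch {

    interface Expr {
        /** Default: no rewrite, so existing implementations need no changes. */
        default Expr optimize(Ctx cc) {
            return this;
        }
    }

    static final class Ctx {
        private final List<String> log = new ArrayList<>();
        private boolean changed;

        /** Records the rewrite and hands back the replacement. */
        Expr replaceWith(Expr original, Expr replacement, String reason) {
            log.add(reason);
            changed = true;
            return replacement;
        }

        boolean changed() { return changed; }
        List<String> log() { return log; }
    }

    static final class Literal implements Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
    }

    /** Hypothetical node that replaces itself with its (optimized) child. */
    static final class Parens implements Expr {
        final Expr inner;
        Parens(Expr inner) { this.inner = inner; }

        @Override
        public Expr optimize(Ctx cc) {
            return cc.replaceWith(this, inner.optimize(cc), "unwrap parentheses");
        }
    }

    /** Container that recurses into its steps, storing any replacement in place. */
    static final class Path implements Expr {
        final List<Expr> steps;
        Path(List<Expr> steps) { this.steps = new ArrayList<>(steps); }

        @Override
        public Expr optimize(Ctx cc) {
            for (int i = 0; i < steps.size(); i++) {
                steps.set(i, steps.get(i).optimize(cc));
            }
            return this;
        }
    }

    public static void main(String[] args) {
        Ctx cc = new Ctx();
        Path path = new Path(List.of(new Parens(new Literal(1))));
        path.optimize(cc);
        System.out.println(cc.log());
    }
}
```

Without the container override, only the top-level node's optimize() would run, which is the point the "recursion enablers" bullet makes.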

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…LetExpr

Demonstrates the per-expression optimize() framework with three
expression types chosen for breadth of cases:

- ConditionalExpression: constant-condition folding. When testExpr
  reduces to a LiteralValue with a statically-known EBV, return the
  chosen branch directly (e.g. `if (1=1) then "a" else "b"` becomes
  `"a"`). Recurses into testExpr/thenExpr/elseExpr.

- GeneralComparison: pre-evaluates literal-vs-literal comparisons via
  cc.preEval (e.g. `1 = 1` becomes `true`). Calls super.optimize() to
  recurse into the left/right operands stored in PathExpr.steps.

- LetExpr: drops the let-binding when the body is a literal value (so
  the variable is by definition unused) and the input is also a
  LiteralValue (no side effects to preserve). Conservatively guards
  against score bindings (XQFT 3.0 §2.3), clause chains (would require
  previousClause repair), and FLWOR-clause return bodies.
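
The first two rewrites compose, which the example `if (1=1) then "a" else "b"` shows: the comparison folds to a boolean literal, and the conditional then selects a branch. The sketch below models that on hypothetical miniature types (Expr, Literal, Eq, If); it uses Java equals() in place of XQuery comparison semantics and is not the eXist-db implementation.

```java
/** Miniature model of comparison folding followed by constant-condition folding. */
final class ConditionalFoldSketch {

    interface Expr {
        default Expr optimize() {
            return this;
        }
    }

    static final class Literal implements Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
    }

    /** Equality comparison; folds to a boolean literal when both sides are literals. */
    static final class Eq implements Expr {
        final Expr left;
        final Expr right;
        Eq(Expr left, Expr right) { this.left = left; this.right = right; }

        @Override
        public Expr optimize() {
            Expr l = left.optimize();
            Expr r = right.optimize();
            if (l instanceof Literal && r instanceof Literal) {
                // Simplification: Java equals() stands in for XQuery comparison rules.
                return new Literal(((Literal) l).value.equals(((Literal) r).value));
            }
            return new Eq(l, r);
        }
    }

    /** Conditional; returns the chosen branch when the test is a boolean literal. */
    static final class If implements Expr {
        final Expr test;
        final Expr thenExpr;
        final Expr elseExpr;
        If(Expr test, Expr thenExpr, Expr elseExpr) {
            this.test = test;
            this.thenExpr = thenExpr;
            this.elseExpr = elseExpr;
        }

        @Override
        public Expr optimize() {
            Expr t = test.optimize();
            Expr a = thenExpr.optimize();
            Expr b = elseExpr.optimize();
            if (t instanceof Literal && ((Literal) t).value instanceof Boolean) {
                boolean condition = (Boolean) ((Literal) t).value;
                return condition ? a : b;
            }
            return new If(t, a, b);
        }
    }

    public static void main(String[] args) {
        Expr query = new If(new Eq(new Literal(1), new Literal(1)),
                new Literal("a"), new Literal("b"));
        Expr folded = query.optimize();
        System.out.println(((Literal) folded).value);
    }
}
```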

Tests (ExpressionOptimizeTest) parse a query into the internal
expression tree, run analyzeAndOptimizeIfModulesChanged, and assert
both the structural rewrite and the CompileContext log entry. Follows
the pattern from CountExpressionTest — no embedded server needed.

XQuery3Tests on this branch shows 7 failures + 1 error, identical to
the baseline on next-v3 (verified by stashing changes and rebuilding):
the failures are pre-existing and unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a ForExpr.optimize() pass that detects when an inner for-clause's
input expression is loop-invariant relative to enclosing FLWOR loops and
rewrites it as a reference to a synthesised let inserted upstream. The
expression then evaluates once instead of once per outer iteration.

Approach: rewrite-into-let, not eval-time caching. A previous attempt at
caching (BindingExpression flag populated on first eval) failed with 63
XQuery3Tests regressions due to per-Expression-instance cache lifetime
bugs (function bodies / closures / parameters / unsafe visitor recursion).
The rewrite-into-let model sidesteps these pitfalls because the
hoisted let is just a plain FLWOR clause evaluated by normal semantics.

CompileContext additions:
- enterFlworChain / exitFlworChain push and pop scope state on a Deque;
- addVisibleFlworVar routes vars into either letPrefixVars (safe to
  reference from a hoist) or loopBodyVars (unsafe — block hoisting),
  partitioned by whether the chain has yet seen a for-clause;
- recordForClause marks the first FOR per scope as the hoist insertion
  point;
- addPendingHoistToOutermost queues the hoist on the outermost scope
  that actually has a FOR (a let-only outer chain has no loop to lift
  over and nothing to insert before);
- applyHoistsAndExitChain wraps the chain head in synthesised lets when
  the FOR is the head, otherwise splices them mid-chain between the
  predecessor clause and the FOR.

ForExpr.optimize:
- pushes scope when chain head, recurses input first (so inner FLWORs
  in the input register hoists targeting US), runs the hoist check on
  our input, records this as the firstForClause, then makes our
  binding visible and recurses the rest of the chain. The inner walker
  (RefCollector) handles getSubExpression-gap classes explicitly
  (BindingExpression / FilteredExpression / LocationStep / WhereClause)
  and aborts conservatively on unrecognised shapes.

LetExpr.optimize:
- mirrors the chain-head detection so let-headed chains also track scope;
- registers its var visibility AFTER the input is recursed (per XQuery
  scoping: the let binding is not in scope for its own initializer).

AbstractFLWORClause.optimize:
- registers each clause's tuple-stream variables on the active scope so
  group-by keys, count vars, etc. are tracked alongside for/let vars
  (otherwise an inner hoist that referenced one would falsely classify
  loop-invariant).

ElementConstructor.optimize:
- recurses into the content PathExpr and the qname expression. Without
  this the optimize() pass dies at every wrapping element (every XMark
  query wraps results in <XMark-result-Q*>{ ... }</XMark-result-Q*>).

DebuggableExpression.optimize:
- recurses into the wrapped expression and captures any replacement.
  The FLWOR parser inserts a DebuggableExpression around every return
  body; without this, optimize is silently disabled for FLWOR returns.

XMark heavy-tier (Q8/Q9/Q11/Q12), QT4 app-XMark, factor 0.01:
  L1 framework alone: 9.375s
  L1 + this hoist: 6.532s (-30%)

Per-query (this run):
  Q8:  2.099 -> 1.566s
  Q9:  2.131 -> 1.645s
  Q11: 2.579 -> 1.646s
  Q12: 2.566 -> 1.675s

Verification:
- xquery.xquery3.XQuery3Tests: 1227 tests, 7 failures + 1 error
  — exact match with branch baseline; zero new regressions.
- OptimizerTest: 6/6 pass.
- ExpressionOptimizeTest: 12/12 pass (4 new hoisting tests added).
- XMark app-XMark: 21/21 pass (functional correctness preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Detects the {for $i in <invariant> where $i/key = $outer/key return ...}
pattern in ForExpr.optimize() and rewrites it into a HashJoinForExpr that
builds a hash on the (already-hoisted) input once per query and probes per
outer iteration, replacing the existing nested-scan WhereClause.preEval
fast-track with O(N+M) build/probe.

Cache lifetime is keyed by the input Sequence reference, mirroring BaseX
CmpHashG/CmpCache. A fresh function invocation produces a new input
Sequence (fresh let-bindings) and rebuilds the hash, sidestepping the
cross-call cache lifetime bug that broke the earlier eval-time-cache
attempt (per joe-vault/Claude/exist/query-optimizer-overhaul.md).
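
The build/probe idea and the reference-keyed cache can be sketched independently of eXist's types. In this toy version, HashJoinSketch is a hypothetical class, a java.util.List stands in for the input Sequence, and a key-extraction function stands in for evaluating `$i/key`; atomization, type promotion, and collations are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy hash join: build an index once per input instance, then probe per outer key. */
final class HashJoinSketch<I, K> {

    private final Function<I, K> innerKey;
    private List<I> cachedInput;          // the input instance the index was built from
    private Map<K, List<I>> index;
    private int builds;

    HashJoinSketch(Function<I, K> innerKey) {
        this.innerKey = innerKey;
    }

    List<I> probe(List<I> input, K outerKey) {
        // Reference comparison on purpose: a fresh input object forces a rebuild.
        if (input != cachedInput) {
            index = new HashMap<>();
            for (I item : input) {
                index.computeIfAbsent(innerKey.apply(item), k -> new ArrayList<>()).add(item);
            }
            cachedInput = input;
            builds++;
        }
        return index.getOrDefault(outerKey, Collections.emptyList());
    }

    int buildCount() {
        return builds;
    }

    public static void main(String[] args) {
        HashJoinSketch<String, Character> join = new HashJoinSketch<>(s -> s.charAt(0));
        List<String> inner = List.of("a1", "b2", "a3");
        for (char outer : new char[] {'a', 'b', 'c'}) {
            System.out.println(outer + " -> " + join.probe(inner, outer));
        }
        System.out.println("builds: " + join.buildCount());
    }
}
```

The build costs one pass over the inner input and each probe is an expected constant-time lookup, which is where the O(N+M) figure comes from.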

Detection in ForExpr.tryHashJoinRewrite gates conservatively:
- relation must be Comparison.EQ (no truncation/collation)
- exactly one side references the inner FOR's variable; the probe side
  must be statically free of inner-var references (verified by RefCheck
  walker that mirrors RefCollector's class-shape coverage)
- WhereClause.returnExpr must NOT be a FLWORClause (no order-by /
  group-by / chained for-let — match-iteration would not preserve their
  semantics)
- no positional/score variables, no `allowing empty`
- DebuggableExpression / single-step PathExpr wrappers are peeled before
  the comparison check (the parser emits these for debug fidelity)

HashJoinForExpr extends ForExpr (preserves visitor identity, scope
management, and previousClause wiring) and overrides eval() with the
hash-build-and-probe path. resetState clears the cache.

Optional gating via -Dexist.optimizer.hashjoin=false (or
ForExpr.setHashJoinEnabledForTest for unit tests). Default is true; the
flag exists as an emergency switch if a workload regresses.
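
A default-on switch of this kind is typically read from a system property along the following lines. This is only a guess at the shape: HashJoinFlagSketch and isEnabled() are hypothetical, and how ForExpr actually parses or caches the property is not shown in this commit message.

```java
/** Toy reader for a default-on feature flag supplied via -Dexist.optimizer.hashjoin=false. */
final class HashJoinFlagSketch {

    static final String PROPERTY = "exist.optimizer.hashjoin";

    /** Enabled unless the property is explicitly set to "false" (case-insensitive). */
    static boolean isEnabled() {
        return !"false".equalsIgnoreCase(System.getProperty(PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println("hash join enabled: " + isEnabled());
    }
}
```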

XMark factor 0.01, app-XMark, ANTLR parser, 4 interleaved runs ON vs OFF:

  Heavy tier (Q8/Q9/Q11/Q12) median:  4.47s ON  /  6.42s OFF  (~30% reduction)
  Q8 specifically:                    0.34s ON  /  1.57s OFF  (~4.6x)
  Q9:                                 1.19s ON  /  1.68s OFF
  Q11:                                1.43s ON  /  1.66s OFF
  Q12:                                1.48s ON  /  1.66s OFF

  app-XMark (21 tests) total: 8.6s default-on / 21/21 pass

Verification:
- xquery.xquery3.XQuery3Tests: 1227 tests, 7 failures + 1 error — exact
  match with branch baseline (50b512b), zero new regressions.
- OptimizerTest: 6/6 pass.
- XPathQueryTest: 150/150 pass (4 skipped, pre-existing).
- ExpressionOptimizeTest: 18/18 pass (6 new hash-join structural tests:
  fires for `=`, skipped for `<`, skipped when no inner-var ref, skipped
  when both sides reference inner var, fires at top level, skipped when
  body is a FLWOR clause).
- XMark app-XMark: 21/21 pass (functional correctness preserved).

Note: 2 RD-parser-only failures (XMark-Q4, XMark-All) on `$pr1 << $pr2`
inside a quantified expression are pre-existing and unrelated to this
change — same failures appear with hash-join off and with the parent
50b512b commit alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A query of the form

    let $v := <persistent path> return $v[Optimizable-pred]

ran ~167x slower than the equivalent direct form. The legacy Optimizer
wraps the FilteredExpression in the (#exist:optimize#) pragma in both
cases, but at runtime BasicExpressionVisitor.findFirstStep cannot find
a top-level LocationStep when the FE source is a VariableReference, so
Optimize.eval records contextStep == null and falls back to
innerExpr.eval(result, ...) -- which invokes FilteredExpression.eval,
which calls VariableReference.eval, which reads the bound value off
the local variable stack and ignores contextSequence. The pre-selected
NodeSet is therefore computed and then thrown away, and the predicate
runs once per node in the full input.
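
The failure mode can be reproduced in a toy model. Everything here is hypothetical (`Expr`, `ContextStep`, and `VarRef` are simplified stand-ins, not eXist classes); the only behaviour borrowed from the description above is that a variable reference returns its bound value and ignores the context sequence it is handed.

```java
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: why a pre-selected context is lost behind a variable reference. */
final class DiscardedPreselectSketch {

    interface Expr {
        List<String> eval(List<String> contextSequence);
    }

    /** Toy stand-in for a step whose result is taken from the context it receives. */
    static final class ContextStep implements Expr {
        @Override public List<String> eval(List<String> contextSequence) {
            return contextSequence;
        }
    }

    /** Toy stand-in for a variable reference: returns the bound value, ignores the context. */
    static final class VarRef implements Expr {
        private final List<String> boundValue;
        VarRef(List<String> boundValue) { this.boundValue = boundValue; }
        @Override public List<String> eval(List<String> contextSequence) {
            return boundValue;
        }
    }

    /** Counts how often the predicate runs when source[predicate] is handed a pre-selection. */
    static int predicateCalls(Expr source, List<String> preselected, Predicate<String> predicate) {
        int calls = 0;
        for (String node : source.eval(preselected)) {
            predicate.test(node);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> allLines = List.of("Denmark's king", "Norway", "Sweden", "to Denmark");
        List<String> preselected = List.of("Denmark's king", "to Denmark");
        Predicate<String> mentionsDenmark = s -> s.contains("Denmark");
        System.out.println(predicateCalls(new ContextStep(), preselected, mentionsDenmark)); // 2
        System.out.println(predicateCalls(new VarRef(allLines), preselected, mentionsDenmark)); // 4
    }
}
```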

This commit adds LetInliner.tryInline(let, cc) called from
LetExpr.optimize(cc). When all six soundness gates hold:

  1. variable name present and not a score binding (XQFT 3.0 §2.3);
  2. standalone let, not a chain link;
  3. no declared sequence type;
  4. node-typed input;
  5. input contains a non-wildcard LocationStep;
  6. body unwraps to a FilteredExpression whose source is the variable,
     with exactly one Optimizable predicate, and the variable does not
     appear anywhere else in the body,

the predicate is appended to the input's last LocationStep and the
LetExpr is replaced by the input path. The legacy Optimizer pass then
sees the same shape it knows how to wrap from the direct form, attaches
the pragma to the LocationStep, and routes through the index pre-select.
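
The shape of the rewrite can be sketched on a toy AST. This is not LetInliner: `Step`, `Let`, and `tryInline` below are hypothetical, the path is rendered as a plain string, and only gates 3, 5, and the single-reference part of gate 6 are modelled.

```java
import java.util.List;
import java.util.Optional;

/** Illustrative only: a toy model of the let-inlining rewrite, not eXist's LetInliner. */
final class LetInlineSketch {

    /** Toy location step; the name "*" stands for a wildcard test. */
    static final class Step {
        final String name;
        Step(String name) { this.name = name; }
    }

    /** Toy form of: let $var := steps return $bodyVar[predicate]. */
    static final class Let {
        final String var;
        final boolean hasSequenceType;
        final List<Step> steps;
        final String bodyVar;
        final String predicate;
        final int referenceCount;

        Let(String var, boolean hasSequenceType, List<Step> steps,
            String bodyVar, String predicate, int referenceCount) {
            this.var = var;
            this.hasSequenceType = hasSequenceType;
            this.steps = steps;
            this.bodyVar = bodyVar;
            this.predicate = predicate;
            this.referenceCount = referenceCount;
        }
    }

    /** Returns the inlined path as text, or empty when any modelled gate fails. */
    static Optional<String> tryInline(Let let) {
        boolean hasNamedStep = let.steps.stream().anyMatch(s -> !"*".equals(s.name));
        boolean bodyFiltersVarOnce = let.var.equals(let.bodyVar) && let.referenceCount == 1;
        if (let.hasSequenceType || !hasNamedStep || !bodyFiltersVarOnce) {
            return Optional.empty();
        }
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < let.steps.size(); i++) {
            path.append('/').append(let.steps.get(i).name);
            if (i == let.steps.size() - 1) {
                // The predicate moves onto the last step; the let binding disappears.
                path.append('[').append(let.predicate).append(']');
            }
        }
        return Optional.of(path.toString());
    }

    public static void main(String[] args) {
        Let let = new Let("a", false, List.of(new Step("db"), new Step("LINE")),
                "a", "ft:query(., 'Denmark')", 1);
        System.out.println(tryInline(let).orElse("not inlined")); // /db/LINE[ft:query(., 'Denmark')]
    }
}
```

Returning an empty result when a gate fails leaves the original expression untouched, which keeps the rewrite opt-in: anything the gates do not recognise is evaluated as before.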

Gate 6 mirrors what was noted in a 2016 comment on the issue thread:
inlining is only desirable when the inlined form would expose an
Optimizable to visitLocationStep.

Closes eXist-db#873

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Six tests covering the LetInliner rewrite:

  - issue873_indirectQueryReturnsSameNodes -- direct vs indirect must
    return the same nodes (correctness check).
  - issue873_inlineRewriteLogged -- the optimizer rewrite log must
    contain an "inline let $a" entry for the indirect query.
  - inline_doesNotFireWhen_letReferencedTwice -- gate (count != 1).
  - inline_doesNotFireWhen_letBoundToCount -- gate (body is not a
    FilteredExpression).
  - inline_doesNotFireWhen_letIsTyped -- gate (sequenceType present).
  - issue873_indirectQueryUnderLoosePerfBound -- a 20x ceiling +
    500ms slack, loose enough to avoid CI flakiness while still
    catching the original 167x regression.
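
A loose bound of this kind could be written as below. The sketch assumes the ceiling is measured against the direct query's elapsed time, which the test description does not state explicitly; the class and method names are hypothetical.

```java
/** Illustrative only: a loose relative performance bound with absolute slack. */
final class LoosePerfBoundSketch {

    static final long FACTOR = 20;        // ceiling from the test description
    static final long SLACK_MILLIS = 500; // slack from the test description

    /** Assumed check: the indirect query may take at most 20x the direct time plus 500 ms. */
    static boolean withinBound(long directMillis, long indirectMillis) {
        return indirectMillis <= FACTOR * directMillis + SLACK_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println(withinBound(10, 150));  // true: 150 <= 700
        System.out.println(withinBound(10, 1670)); // false: a 167x slowdown exceeds 700
    }
}
```

Under this assumed form, the slack dominates when the direct query is very fast: a 167x slowdown trips the bound only once the direct query takes longer than roughly 3.4 ms (167d > 20d + 500), so the test data would need to be large enough to reach that range.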

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz joewiz requested a review from a team as a code owner May 7, 2026 01:55
@joewiz
Member Author

joewiz commented May 7, 2026

Closing -- the base branch (optimizer/expression-optimize-method) does not exist on eXist-db/exist, so the diff incorrectly includes the 256 framework commits that this fix is stacked on. Will reopen once the foundation branch is upstream, or open a within-fork PR for review in the meantime.

@joewiz joewiz closed this May 7, 2026
@joewiz
Member Author

joewiz commented May 7, 2026

Work is now in joewiz#13

Development

Successfully merging this pull request may close these issues.

optimizer regression for indirect queries on Lucene full-text and range indexes in eXist-2.2 and eXist-3.0.RC1