[feature] Inline let-bound persistent paths for index pre-select by joewiz · Pull Request #6306 · eXist-db/exist

joewiz · 2026-05-07T01:55:11Z

Summary

Closes #873.

A let $v := <persistent path> return $v[Optimizable-pred] query was running ~167x slower than the equivalent direct form <persistent path>[Optimizable-pred]. This PR adds a narrow rewrite at the LetExpr.optimize(cc) stage that converts the indirect form into the direct form when six soundness gates hold, so the legacy Optimizer pass attaches the (#exist:optimize#) pragma to the resulting LocationStep and routes through the lucene/range pre-select.

Base: this PR targets develop but is built on top of optimizer/expression-optimize-method, the orphan-rescue branch carrying the four foundational Expression.optimize(CompileContext) framework commits (e6993cc020, ef07bb8cbc, 50b512b611, c922cefc2c). Those need to land first; the rewrite uses cc.replaceWith(...) and the per-expression optimize pass introduced there.

Diagnosis

For

```xquery
let $a := collection('/db/system/test')//LINE
return $a[ft:query(., 'Denmark')]
```

Optimizer.visitFilteredExpr does wrap the FilteredExpression in the optimize pragma. At runtime Optimize.eval runs the lucene pre-select and obtains a small result NodeSet. But because the FE source is a VariableReference rather than a LocationStep, before() records contextStep == null (BasicExpressionVisitor.findFirstStep finds no LocationStep at the top level). With contextStep == null, Optimize.eval falls through to innerExpr.eval(result, null) -- but FilteredExpression.eval then calls expression.eval(contextSequence, ...) where expression is the VariableReference, and VariableReference.eval reads the bound value off the local variable stack and ignores contextSequence. So seq becomes the full //LINE NodeSet and ft:query runs once per LINE; the pre-selected result is computed and discarded.
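To make the diagnosis concrete, here is a toy Java model of the two evaluation paths. The types and methods (`Source`, `step`, `variable`, `filter`, `lines`) are hypothetical stand-ins, not eXist classes, and the pre-selected list simply plays the role of the lucene result. The model shows one thing: a variable-reference source ignores the context it is handed, so the predicate still runs once per bound node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Toy model only: these types are hypothetical stand-ins, not eXist's classes.
final class PreselectDemo {
    /** A source expression: maps an optional context sequence (null = none) to items. */
    interface Source extends Function<List<String>, List<String>> {}

    /** LocationStep-like: when handed a pre-selected context, it evaluates over that. */
    static Source step(List<String> allNodes) {
        return ctx -> ctx == null ? allNodes : ctx;
    }

    /** VariableReference-like: returns the bound value and ignores the context. */
    static Source variable(List<String> boundValue) {
        return ctx -> boundValue;
    }

    /** FilteredExpression-like: runs the predicate once per item the source yields. */
    static List<String> filter(Source source, List<String> preselected,
                               Predicate<String> pred, int[] calls) {
        List<String> out = new ArrayList<>();
        for (String node : source.apply(preselected)) {
            calls[0]++;
            if (pred.test(node)) {
                out.add(node);
            }
        }
        return out;
    }

    /** n LINE nodes; every 250th one mentions Denmark. */
    static List<String> lines(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> i % 250 == 0 ? "LINE " + i + " Denmark" : "LINE " + i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> all = lines(1000);
        Predicate<String> ftQuery = s -> s.contains("Denmark");
        // Stand-in for the index pre-select: only the matching nodes.
        List<String> preselected = all.stream().filter(ftQuery).collect(Collectors.toList());

        int[] direct = {0};
        int[] indirect = {0};
        List<String> viaStep = filter(step(all), preselected, ftQuery, direct);
        List<String> viaVar = filter(variable(all), preselected, ftQuery, indirect);

        System.out.println(viaStep.equals(viaVar));           // true
        System.out.println(direct[0] + " vs " + indirect[0]); // 4 vs 1000
    }
}
```

Both forms return the same nodes; only the amount of predicate work differs, which is the shape of the slowdown this PR targets.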

The same issue affects range and any other Optimizable-bearing predicate; the predicate type is incidental.

What changed

| File | Purpose |
| --- | --- |
| `exist-core/src/main/java/org/exist/xquery/LetInliner.java` | New helper. `tryInline(LetExpr, CompileContext)` checks the six gates; on success, attaches the FE's predicate to the input path's last LocationStep and returns the input path as the LetExpr's replacement (logged via `cc.replaceWith`). |
| `exist-core/src/main/java/org/exist/xquery/LetExpr.java` | New `isScoreBinding()` accessor; calls `LetInliner.tryInline(this, cc)` after the existing literal-drop gate. |
| `extensions/indexes/lucene/src/test/java/org/exist/indexing/lucene/LetInliningRegressionTest.java` | Six JUnit tests: one correctness pair, one optimizer-log inspection of the inline rewrite firing, three negative-gate tests (used twice / bound to count() / typed declaration), one loose perf bound (20x ceiling + 500ms slack -- catches a 167x regression without CI flakiness). |
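The loose perf bound can be read as `indirect <= 20 * direct + 500 ms`. The sketch below assumes that reading; `PerfBound` and `withinBound` are hypothetical names, and the actual assertion in `LetInliningRegressionTest` may be structured differently.

```java
// Hypothetical helper illustrating the "20x ceiling + 500ms slack" bound.
final class PerfBound {
    static final long RATIO_CEILING = 20; // multiplicative ceiling
    static final long SLACK_MS = 500;     // additive slack to absorb CI noise

    /** True if the let-bound (indirect) timing stays within the loose bound of the direct one. */
    static boolean withinBound(long directMs, long indirectMs) {
        return indirectMs <= RATIO_CEILING * directMs + SLACK_MS;
    }

    public static void main(String[] args) {
        System.out.println(withinBound(10, 30));   // true: inlined, about 3x
        System.out.println(withinBound(10, 1670)); // false: a 167x regression
    }
}
```

One consequence of the additive slack: a 167x slowdown only breaks the bound when the direct form takes more than about 3.4 ms (167d > 20d + 500 requires d > 500/147), so the bound only bites when the test's data makes the direct query at least that slow.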

The six gates (`LetInliner.tryInline`)

1. Variable name present and not a score binding -- XQFT 3.0 §2.3 score bindings synthesize a double rather than the input value; inlining would change semantics.
2. Standalone let, not a chain link -- getPreviousClause() == null and the body is not a FLWORClause. Limits v1 to standalone lets so we don't have to repair previousClause pointers.
3. No declared type on the binding -- typed declarations impose a runtime cardinality+type check on $v's bound value; inlining bypasses it. v2 may relax this when the inlined static type still satisfies the declaration.
4. Input is node-typed -- strings, atomics, etc. don't gain from a downstream index pre-select.
5. Input contains a non-wildcard LocationStep -- the predicate has to attach somewhere indexable.
6. Body is a FilteredExpression (or a length-1 PathExpr / DebuggableExpression wrapping one) whose source is the bound variable, with exactly one Optimizable predicate, and the variable does not appear anywhere else in the body -- mirrors what Optimizer.visitFilteredExpr's instanceof LocationStep simplification expects, so the post-rewrite shape is exactly what the legacy pass already knows how to wrap. The "exactly one predicate" + "no other refs" guards prevent positional-predicate or substituted-out-of-scope correctness regressions.
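The gate sequence can be sketched as a chain of early returns over a simplified AST. Everything here (`Step`, `Path`, `Filter`, `Let`, `firstFailingGate`) is hypothetical; the real `LetInliner.tryInline` works on eXist's expression classes and also unwraps a length-1 PathExpr or DebuggableExpression around the FilteredExpression, which this sketch skips.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for the AST; not eXist's classes.
final class LetInlineGates {
    record Step(String nameTest, boolean wildcard) {}
    record Path(List<Step> steps, boolean nodeTyped) {}
    /** Body shape: $sourceVar[...] with predicate counts and any other uses of the variable. */
    record Filter(String sourceVar, int optimizablePredicates, int totalPredicates,
                  int otherVarRefsInBody) {}
    record Let(String varName, boolean scoreBinding, boolean hasPreviousClause,
               boolean bodyIsFlworClause, boolean hasDeclaredType, Path input, Filter body) {}

    /** Returns null if all six gates pass, otherwise a label for the first failing gate. */
    static String firstFailingGate(Let let) {
        if (let.varName() == null || let.scoreBinding()) return "1: named, non-score binding";
        if (let.hasPreviousClause() || let.bodyIsFlworClause()) return "2: standalone let";
        if (let.hasDeclaredType()) return "3: no declared type";
        if (!let.input().nodeTyped()) return "4: node-typed input";
        boolean indexable = let.input().steps().stream().anyMatch(s -> !s.wildcard());
        if (!indexable) return "5: non-wildcard LocationStep";
        Filter f = let.body();
        boolean shapeOk = f != null
                && let.varName().equals(f.sourceVar())
                && f.totalPredicates() == 1
                && f.optimizablePredicates() == 1
                && f.otherVarRefsInBody() == 0;
        if (!shapeOk) return "6: body is $v[one Optimizable predicate]";
        return null;
    }

    public static void main(String[] args) {
        Path lines = new Path(List.of(new Step("LINE", false)), true);
        Filter ftQuery = new Filter("a", 1, 1, 0);
        System.out.println(firstFailingGate(new Let("a", false, false, false, false, lines, ftQuery))); // null
        System.out.println(firstFailingGate(new Let("a", false, false, false, true, lines, ftQuery)));  // 3: no declared type
    }
}
```

Reporting which gate failed, rather than a bare boolean, makes negative cases such as the typed declaration easy to pin to a single gate.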

The "inlining is not always desirable, in particular if there's no index defined" caveat noted in a 2016 comment on the issue is exactly gate 6: we only inline when the inlined form would expose an Optimizable to visitLocationStep.

Why option (a), inlining at the AST level, rather than option (b), special-casing `Optimize.eval`

- (a) generalises -- the same rewrite catches any Optimizable predicate (ft:query, GeneralComparison, FunMatches, range Lookup, ...) without per-predicate-type repair.
- (a) addresses the 2016 caveat (gate 6 is the if-index-defined check).
- (a) fits the framework already on this branch; cc.replaceWith handles logging and the re-analyze flag.
- (a) mirrors BaseX's Let.inlineExpr / GFLWOR.inlineForLet pattern.

Option (b) is kept as a follow-up for shapes (a) doesn't catch (e.g., let bindings to function-call results that don't expose a LocationStep in the input).

Test plan

- mvn test -pl extensions/indexes/lucene -- new LetInliningRegressionTest passes (6/6); existing OptimizerTest, LuceneIndexTest pass; 5 pre-existing ft-facets failures unchanged (verified by stash-and-rerun).
- mvn test -pl extensions/indexes/range -- 425/425 pass + 3 skipped, no new failures.
- mvn test -pl exist-core -- baseline 34 fail / 20 err vs with-fix 35 fail / 20 err (+1 = known flaky 503 HTTP test); unique failure set is byte-identical.
- Codacy PMD on changed files -- clean.
- (Deferred) JMH LetIndirectionBenchmark -- depends on the exist-indexes-jmh module which lives on a separate branch (feature/index-jmh-benchmarks); will land as a follow-up once that module reaches develop.

XQTS scores (QT4, 3.1, FTTS) should be unchanged because the rewrite preserves correctness; will flag if any score moves.

New function implementations in the fn: namespace:
- Sequence functions: fn:characters, fn:identity, fn:void, fn:foot, fn:trunk, fn:slice, fn:items-at, fn:replicate, fn:insert-separator, fn:all-equal, fn:all-different, fn:duplicate-values, fn:index-where, fn:take-while, fn:distinct-ordered-nodes, fn:siblings
- Higher-order functions: fn:every, fn:some (function form), fn:highest, fn:lowest, fn:sort-by, fn:sort-with, fn:partition, fn:scan-left, fn:scan-right, fn:subsequence-where, fn:transitive-closure, fn:partial-apply, fn:op
- String/URI functions: fn:char, fn:graphemes, fn:decode-from-uri, fn:parse-uri, fn:build-uri, fn:expanded-QName, fn:parse-QName, fn:parse-integer, fn:divide-decimals
- Date/Time functions: fn:civil-timezone, fn:build-dateTime, fn:parts-of-dateTime, fn:unix-dateTime, fn:seconds
- Type functions: fn:schema-type, fn:atomic-type-annotation, fn:node-type-annotation, fn:element-to-map, fn:element-to-map-plan, fn:type-of, fn:is-NaN
- Context functions: fn:get, fn:collation, fn:collation-available, fn:message
- Parsing functions: fn:parse-html (Validator.nu HTML5 parser), fn:invisible-xml (Markup Blitz iXML parser), fn:parse-csv, fn:csv, fn:html-doc, fn:unparsed-binary
- Data functions: fn:hash, fn:function-annotations, fn:function-identity, fn:in-scope-namespaces

Also: DeepEqualOptions class for fn:deep-equal options map support, FnModule registrations for all new functions.

Spec: QT4 XQuery 4.0 §14 (Functions and Operators)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Standardize error codes across casting and type checking:
- Use XPTY0004 consistently for type errors (was mixed with FORG0001)
- Use FORG0001 for invalid cast values (not type mismatches)
- Add XPST0080 for xs:anyType in cast/castable (XQ4 spec)
- Add XQ4-specific error codes for new expression types
- Fix DynamicCardinalityCheck, DynamicTypeCheck, TreatAsExpression to use correct W3C error codes
- Align all value type convertTo() methods with spec error codes

This fixes ~30 XQTS test failures caused by wrong error codes.

Spec: W3C XQuery 3.1 §B.1 (Error Codes), QT4 XQuery 4.0 Appendix B (Error Codes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Array module (8 new functions):
- array:build, array:index-of, array:index-where, array:of-members, array:split, array:sort-by, array:sort-with, array:slice
- Plus: array:get#3 with default value

Map module (5 new functions):
- map:build, map:items, map:entries, map:filter, map:keys-where
- Plus: map:get#3 with default value, map:empty

Math module (4 new functions):
- math:cosh, math:sinh, math:tanh, math:e
- Plus: math:pow edge case fixes

Spec: QT4 XQuery 4.0 §17 (Array Module), QT4 XQuery 4.0 §16 (Map Module), QT4 XQuery 4.0 §18 (Math Module)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Comprehensive XQSuite test module for XQ4 syntax features:
- Pipeline operator: basic chaining, nested pipelines, with functions
- Focus functions: fn { . + 1 }, context item binding
- Keyword arguments: named parameter passing, mixed positional/named
- String templates: interpolation, nested expressions, escaping
- Otherwise operator: empty fallback, non-empty passthrough
- Braced if: if (cond) { expr } without else
- Try/finally: cleanup execution, error propagation
- For member: array member iteration
- While clause: conditional FLWOR iteration
- Default parameter values: function declarations with defaults
- QName literals: #name symbolic references
- Hex/binary integer literals: 0xFF, 0b1010
- Numeric underscore separators: 1_000_000
- Version gating: features require xquery version "4.0"

XQTS: QT4 parser-dependent test sets (1898/2163, 87.7%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add parser support for the XQuery 3.1 `declare decimal-format` and `declare default decimal-format` prolog declarations (spec section 4.10), enabling users to customize number formatting via fn:format-number. The runtime infrastructure (DecimalFormat class, XQueryContext storage, FnFormatNumbers 3-arg support) was already in place; this adds the missing parser recognition and tree walker processing.

Changes:
- XQuery.g: Add DECIMAL_FORMAT_DECL/DEF_DECIMAL_FORMAT_DECL tokens, grammar rules for named and default forms, property keywords
- XQueryTree.g: Walk AST, validate properties (single-char, zero-digit, distinctness), register formats in XQueryContext
- ErrorCodes.java: Add XQST0097 (duplicate) and XQST0098 (invalid)
- XQueryContext.java: Add setDefaultStaticDecimalFormat() convenience
- format-numbers.xql: Add tests for named/default formats, custom NaN/infinity, and error cases

Closes eXist-db#56

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
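The zero-digit and distinctness validations can be sketched in plain Java. This assumes the XQuery 3.1 rules that zero-digit must be a Unicode decimal digit (category Nd) with numeric value zero and that the picture-string characters must be pairwise distinct. `DecimalFormatChecks` and its methods are hypothetical and do not reproduce the tree-walker helpers' signatures; error codes are left out, and the distinctness check is simplified (it does not test overlap with the ten-digit family starting at zero-digit).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers; simplified versions of two decimal-format property checks.
final class DecimalFormatChecks {
    /** zero-digit must be a Unicode decimal digit (category Nd) whose numeric value is 0. */
    static boolean isValidZeroDigit(int codePoint) {
        return Character.getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER
                && Character.digit(codePoint, 10) == 0;
    }

    /** Picture-string characters (property name -> code point) must be pairwise distinct. */
    static boolean pictureCharsDistinct(Map<String, Integer> pictureChars) {
        Set<Integer> seen = new HashSet<>();
        for (int cp : pictureChars.values()) {
            if (!seen.add(cp)) {
                return false; // two properties share a character
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidZeroDigit('0'));    // true
        System.out.println(isValidZeroDigit(0x0660)); // true (ARABIC-INDIC DIGIT ZERO)
        System.out.println(isValidZeroDigit('1'));    // false
        System.out.println(pictureCharsDistinct(
                Map.of("decimal-separator", (int) '.', "grouping-separator", (int) '.'))); // false
    }
}
```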

Build configuration:
- exist-parent/pom.xml: Add markup-blitz 1.10 (fn:invisible-xml), htmlparser 1.4.16 (fn:parse-html via Validator.nu)
- exist-core/pom.xml: Add markup-blitz and htmlparser dependencies
- .gitignore: Ignore iXML grammar cache files

Format improvements:
- FnFormatDates: comprehensive format-date/format-time improvements
- FnFormatNumbers: map overload, char:rendition pattern, negative exponent zero-padding fix

Tests:
- fnXQuery40.xql: XQSuite tests for XQ4 functions
- fnInvisibleXml.xqm: fn:invisible-xml test suite
- format-number-map.xql: fn:format-number map overload tests
- deep-equal-options-test.xq: fn:deep-equal options map tests
- Updated: fnLanguage.xqm, json-to-xml.xql, replace.xqm

Spec: QT4 XQuery 4.0 §14 (Functions and Operators)
XQTS: 732/861 (85.0%) for XQ4-specific test sets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add tree walker version checks for all XQ4-only constructs: when staticContext.getXQueryVersion() < 40, throw XPST0003 with a descriptive message. This ensures modules declaring xquery version "3.1" cannot use XQ4 syntax even if the parser somehow accepts it.

Gated constructs: otherwise, pipeline (->), mapping arrow (=>!), ternary conditional (?? !!), keyword arguments, focus functions, string templates, while clause, default parameters, for-member, method call (=>?).

Also add system property exist.xquery4.enabled (default true) to allow disabling XQ4 support entirely. When disabled, xquery version "4.0" declarations throw XPST0003.

Addresses reviewer feedback from line-o on PR eXist-db#6139.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…dd Javadoc

Rename the three decimal-format validation helper methods in XQueryTree.g with a `df` prefix to clarify their scope:
- requireSingleChar → dfRequireSingleChar
- validateZeroDigit → dfValidateZeroDigit
- validateDistinctPictureChars → dfValidateDistinctPictureChars

Add Javadoc comments on DecimalFormat.UNNAMED and UNNAMED_DECIMAL_FORMAT explaining the XPath 3.1 spec origin of the "unnamed" terminology.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…d fn:tokenize

Without the ! flag, empty-matching patterns raise FORX0003 in both XQ 3.1 and XQ 4.0 mode. With the ! flag in XQ 4.0, fn:replace uses the Java regex fallback and fn:tokenize tokenizes between each character.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e List

Adds the core XQUF expression classes for insert, delete, replace, rename, and transform (copy-modify-return) operations, plus the Pending Update List (PUL) merge-and-apply infrastructure.

The PUL implements the W3C XQUF 3.0 update primitive model with five phases: insert, replace, rename, delete, and put. Update primitives are collected during expression evaluation and applied atomically at snapshot boundaries.

Expression classes:
- XQUFInsertExpr: insert node (before/after/into/as first/as last)
- XQUFDeleteExpr: delete node
- XQUFReplaceNodeExpr: replace node
- XQUFReplaceValueExpr: replace value of node
- XQUFRenameExpr: rename node
- XQUFTransformExpr: copy-modify-return (in-memory deep copy + PUL)

Includes namespace conflict detection (XUDY0021/0023/0024) inspired by BaseX's NamePool approach.

Spec: W3C XQuery Update Facility 3.0, Sections 2.1-2.5
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
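The collect-then-apply ordering of the PUL can be sketched as follows. The phase order is taken from the commit message; `PendingUpdateListSketch` and its methods are hypothetical, and atomicity, merging, and conflict detection are omitted so that only the ordering is shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of PUL ordering only (no atomicity, merging, or conflict checks).
final class PendingUpdateListSketch {
    /** Phase order as listed in the commit message. */
    enum Phase { INSERT, REPLACE, RENAME, DELETE, PUT }

    record Primitive(Phase phase, String target) {}

    private final List<Primitive> pending = new ArrayList<>();

    /** During evaluation: record the primitive; nothing is changed yet. */
    void add(Phase phase, String target) {
        pending.add(new Primitive(phase, target));
    }

    /** At the snapshot boundary: apply by phase, keeping collection order within a phase. */
    List<String> apply() {
        List<Primitive> ordered = new ArrayList<>(pending);
        ordered.sort(Comparator.comparing(Primitive::phase)); // List.sort is stable
        pending.clear();
        List<String> applied = new ArrayList<>();
        for (Primitive p : ordered) {
            applied.add(p.phase() + " " + p.target()); // a real PUL would mutate nodes here
        }
        return applied;
    }

    public static void main(String[] args) {
        PendingUpdateListSketch pul = new PendingUpdateListSketch();
        pul.add(Phase.DELETE, "a");
        pul.add(Phase.INSERT, "x");
        pul.add(Phase.RENAME, "c");
        pul.add(Phase.INSERT, "y");
        System.out.println(pul.apply()); // [INSERT x, INSERT y, RENAME c, DELETE a]
    }
}
```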

Implements XUST0001 (non-updating expression in updating context) and XUST0002 (updating expression in non-updating context) static type checking across the expression hierarchy.

Adds Expression.isVacuous() method for recursive vacuousness detection, which allows expressions like empty sequences and conditionals with all-vacuous branches to pass XUST checks. This is required because vacuous expressions are compatible with both updating and non-updating contexts per the W3C XQUF 3.0 spec.

Key changes:
- Expression.java: isVacuous(), isUpdating(), analyze() flags
- PathExpr: context step propagation fix (i>=1 for XQUF)
- TypeswitchExpression, SwitchExpression: branch-level XUST checks
- ConditionalExpression: then/else branch XUST checks
- ErrorCodes: XUST0001, XUST0002, XUDY0009, XUDY0014-0024, XUTY0004-0013, XUTY0022
- FunctionSignature: updating annotation support
- FunctionCall: updating function call propagation

Spec: W3C XQuery Update Facility 3.0, Section 2.6 (Static Typing)
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add full text grammar productions to XQuery.g parser and XQueryTree.g tree walker for the W3C XQuery and XPath Full Text 3.0 specification. This establishes the parsing foundation for ftcontains expressions, FTSelection operators (FTOr, FTAnd, FTMildNot, FTUnaryNot, FTWords), and positional filters (FTOrder, FTWindow, FTDistance, FTScope, FTContent, FTTimes).

The AST expression classes in org.exist.xquery.ft model the full text selection grammar as a tree of FTAbstractExpr nodes. Each node corresponds to a production in the XQFT grammar and carries the evaluation semantics defined in the spec.

Spec references:
- W3C XQuery and XPath Full Text 3.0, Section 3.1 (Full-Text Selections)
- W3C XQuery and XPath Full Text 3.0, Section 3.2 (Full-Text Contains)
- W3C XQuery and XPath Full Text 3.0, Section 3.3 (Positional Filters)

FTTS compliance: 661/667 (99.1%); the 6 remaining failures are spec ambiguities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…util:track()

Phase 1 of query profiling visibility (inspired by BaseX prof: module):
- util:time($expr) / util:time($expr, $label): Pass-through wrapper that measures and logs execution time. Returns the expression result unchanged.
- util:memory($expr) / util:memory($expr, $label): Same pattern for memory measurement. Logs the memory delta during expression evaluation.
- util:track($expr) / util:track($expr, $label): Returns map { "time": xs:dayTimeDuration, "memory": xs:integer, "value": item()* }. Most useful of the three: it combines time and memory measurement in a structured result.

All registered in UtilModule.java. 13 XQSuite tests in profiling.xql.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
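A minimal Java sketch of the util:track() result shape, assuming `System.nanoTime()` for elapsed time and a `Runtime` heap delta for memory. The util:profile commit in this list states that it captures measurements this way; applying the same approach to util:track is an assumption. `TrackSketch` is hypothetical, and `java.time.Duration` stands in for xs:dayTimeDuration.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a util:track()-style wrapper; not eXist's implementation.
final class TrackSketch {
    /** Evaluates expr once and returns {"time", "memory", "value"}. */
    static <T> Map<String, Object> track(Supplier<T> expr) {
        Runtime rt = Runtime.getRuntime();
        long usedBefore = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        T value = expr.get();
        long elapsedNanos = System.nanoTime() - start;
        long usedAfter = rt.totalMemory() - rt.freeMemory();

        Map<String, Object> result = new LinkedHashMap<>();
        result.put("time", Duration.ofNanos(elapsedNanos)); // stands in for xs:dayTimeDuration
        result.put("memory", usedAfter - usedBefore);        // heap delta in bytes
        result.put("value", value);                          // the expression result, unchanged
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> m = track(() -> "x".repeat(1_000).length());
        System.out.println(m.get("value")); // 1000
    }
}
```

The heap delta is only indicative: a garbage collection during evaluation can make it negative.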

Extends the memtree DocumentImpl with mutation methods required by XQUF update primitives operating on in-memory (copy-modify-return) nodes. The flat-array architecture of eXist's memtree requires careful index management for insertions, deletions, and replacements.

Key additions to DocumentImpl:
- insertChildNode/insertChildNodes: insert before/after/into
- removeNode: delete with descendant cleanup and array compaction
- replaceNode: atomic replace preserving document order
- replaceValue: text/attribute/PI/comment value replacement
- renameNode: element/attribute/PI rename with namespace handling
- replaceElementContent: replace all children with text node
- compact(): post-update array defragmentation

ElementImpl/NodeImpl changes:
- getFirstChildFor(): skip deleted nodes in chain navigation
- Namespace propagation helpers for insert operations

Updates are processed in reverse document order where needed to avoid flat-array cross-contamination during batch operations.

Spec: W3C XQuery Update Facility 3.0, Section 3 (Update Primitives)
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
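Why reverse document order helps can be shown with a plain list standing in for the memtree's flat arrays. This is a deliberate simplification: the real arrays also carry parent and sibling links that must be fixed up. Each insertion's index refers to the original layout; applying the insertions highest index first keeps the lower targets valid. `ReverseOrderInsert` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: a List stands in for the memtree's flat node arrays.
final class ReverseOrderInsert {
    /** Insert `node` before the node at `index` in the ORIGINAL layout. */
    record Insertion(int index, String node) {}

    /** Applies insertions in the order given (earlier ones shift later targets). */
    static List<String> applyInOrder(List<String> nodes, List<Insertion> insertions) {
        List<String> out = new ArrayList<>(nodes);
        for (Insertion ins : insertions) {
            out.add(ins.index(), ins.node());
        }
        return out;
    }

    /** Applies the same insertions highest index first, so no target index is shifted. */
    static List<String> applyReverse(List<String> nodes, List<Insertion> insertions) {
        List<Insertion> sorted = new ArrayList<>(insertions);
        sorted.sort(Comparator.comparingInt(Insertion::index).reversed());
        return applyInOrder(nodes, sorted);
    }

    public static void main(String[] args) {
        List<String> doc = List.of("a", "b", "c", "d");
        List<Insertion> ins = List.of(new Insertion(1, "X"), new Insertion(3, "Y"));
        System.out.println(applyInOrder(doc, ins)); // [a, X, b, Y, c, d] (Y lands before c: wrong)
        System.out.println(applyReverse(doc, ins)); // [a, X, b, c, Y, d] (intended)
    }
}
```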

Implement the full text evaluation engine (FTEvaluator) using the sequential AllMatches model defined in W3C XQFT 3.0, Section 4. The evaluator tokenizes string values, applies match options (stemming, wildcards, diacritics sensitivity, case sensitivity, stop words, language), and evaluates the full text selection tree against token streams.

FTContainsExpr is the top-level expression node for `contains text` expressions, bridging the XQuery evaluation pipeline to the FT evaluator. FTMatchOptions aggregates all match option settings. FTThesaurus provides synonym expansion via configurable thesaurus URIs, with lazy initialization for runtime efficiency.

Spec references:
- W3C XQuery and XPath Full Text 3.0, Section 4 (Full-Text Evaluation)
- W3C XQuery and XPath Full Text 3.0, Section 4.1 (AllMatches)
- W3C XQuery and XPath Full Text 3.0, Section 5 (Match Options)
- W3C XQuery and XPath Full Text 3.0, Section 5.6 (Thesaurus Option)
- W3C XQuery and XPath Full Text 3.0, Section 5.7 (Stop Word Option)

FTTS compliance: 661/667 (99.1%); the 6 remaining failures are spec ambiguities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Phase 2 of query profiling: util:explain($query) returns the compiled expression tree as XML, showing the post-optimization query plan.

FunExplain.java:
- Compiles query string using the same pattern as util:compile()
- Runs AnalyzeContextInfo for optimizer annotations
- Serializes tree via QueryPlanSerializer visitor
- Returns root <explain> element

QueryPlanSerializer.java:
- Extends DefaultExpressionVisitor to walk the expression tree
- Emits XML elements for each expression type: <flwor>, <for>, <let>, <where>, <return>, <order-by>, <group-by>, <path>, <step>, <predicate>, <filter>, <comparison>, <function-call>, <builtin-function>, <user-function>, <variable>, <if>, <union>, <intersect>, <try-catch>, <element-constructor>, <text-constructor>, etc.
- Includes @axis, @test, @variable, @name, @arity, @operator, @line, @column attributes where applicable

Example: util:explain('for $x in 1 to 10 where $x > 5 return $x') → <explain><for variable="$x"><in>...</in>...</for></explain>

7 new XQSuite tests for explain (83/83 total util tests pass).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds parser and tree walker rules for all W3C XQuery Update Facility 3.0 expressions: insert, delete, replace (node and value), rename, and copy-modify-return (transform).

XQuery.g (lexer/parser):
- New tokens: REPLACE, RENAME, COPY, MODIFY, FIRST, LAST, BEFORE, AFTER, INTO, WITH, UPDATING
- insertExpr, deleteExpr, replaceExpr, renameExpr, transformExpr
- Integration into exprSingle production
- Updating function annotations

XQueryTree.g (tree walker):
- Instantiates XQUF expression classes from AST
- Legacy/XQUF syntax conflict detection via markLegacyUpdate/markXQUFUpdate on XQueryContext
- Updating function declaration handling

XQueryFunctionAST.java:
- isUpdating() flag for function declarations

Spec: W3C XQuery Update Facility 3.0, Section 2.1 (Syntax)
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extend ForExpr and LetExpr to support optional `score` variable bindings as defined in XQFT 3.0. The score variable captures the relevance score from full-text matching for use in ordering or filtering.

Add XQFT-specific error codes (FTST0008, FTST0009, FTDY0016, FTDY0017, FTDY0020) to ErrorCodes.java.

Update XQueryContext with thesaurus and stop-word URI map caching to survive context resets, fixing a bug where FT match options were lost during module imports.

Fix FTMatchOptions import in XQueryContext to use the correct org.exist.xquery.ft package path.

Update StaticXQueryException and XQuery.java for full-text error propagation during static analysis.

Spec references:
- W3C XQuery and XPath Full Text 3.0, Section 2.3 (Score Variables)
- W3C XQuery and XPath Full Text 3.0, Appendix B (Error Conditions)

FTTS compliance: 661/667 (99.1%); the 6 remaining failures are spec ambiguities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…egex fork

Upgrade Saxon-HE from 9.9.1-8 to 12.5 and remove the exist-saxon-regex fork (org.exist-db:exist-saxon-regex:9.4.0-9.e1), a copy of Saxon 9.4's internal regex classes that has been maintained separately for over a decade. Saxon 12's public regex API makes the fork unnecessary.

Saxon 12 API migration:
- FastStringBuffer removed: use FloatValue.floatToString() and DoubleValue.doubleToString() for XPath-compliant formatting
- Regex APIs now take UnicodeString: wrap with StringView.of()
- XPathException.getErrorCodeLocalPart() replaced by getErrorCodeQName().getLocalPart()
- RegexIterator.MatchHandler moved to top-level RegexMatchHandler
- Xslt30Transformer.setInitialMode() now throws SaxonApiException
- Saxon 12 rejects duplicate document-URIs in the document pool
- Saxon 12 rejects null URIResolver and explicit xml namespace declarations in DOM and SAX pipelines
- Saxon 12's LinkedTreeBuilder rejects duplicate startDocument events

exist-saxon-regex replaced by Saxon 12's JavaRegularExpression API.

Full exist-core test suite: 6533 tests, 0 failures, 0 errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add record type support to eXist-db's type system, laying the groundwork for XQ4 record type annotations (record(name as xs:string, age? as xs:integer, *)).

Type system:
- Type.RECORD = 70, subtype of MAP_ITEM (aligned with parser-next branch)
- RecordType class with FieldDeclaration (name, type, optional flag)
- Extensible records (trailing "*") support
- RecordType.matches() validates map contents against field declarations

SequenceType integration:
- isRecordType(), getFieldDeclarations(), isRecordExtensible() API
- setRecordType()/getRecordType() for parser integration
- checkType() delegates to RecordType.matches() for runtime validation

DynamicTypeCheck:
- Maps pass through record type check (handled by SequenceType.checkType)

Tests: 7 unit tests covering type hierarchy, field declarations, optional fields, extensible records, toString output, and SequenceType record API.

Parser support (field accessor .name syntax, record test parsing) requires coordination with parser-next branch and will follow in a separate commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Phase 3 of query profiling: util:profile($query) executes a query with profiling enabled and returns a map combining:
- result: the actual query result
- time: xs:dayTimeDuration of execution
- memory: xs:integer bytes delta during execution
- plan: element(explain), the compiled expression tree (from Phase 2)
- stats: element(), profiler statistics XML from PerformanceStatsImpl (function calls, index usage, optimizations)

FunProfile.java:
- Compiles and analyzes query using the same pattern as util:explain()
- Enables eXist's built-in Profiler at verbosity=10 before execution
- Captures timing via System.nanoTime(), memory via Runtime
- Serializes the expression tree via QueryPlanSerializer
- Serializes profiler stats via PerformanceStatsImpl.serialize()
- Packages everything into a MapType result

Two signatures:
- util:profile($query as xs:string) as map(*)
- util:profile($query, $module-load-path) as map(*)

9 new XQSuite tests (92/92 total util tests pass).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds fn:put() for persistent document storage via the XQUF Pending Update List, plus the XQueryContext integration layer that connects the XQUF expression classes to the query execution lifecycle.

XQUFFnPut.java:
- W3C fn:put($node, $uri) implementation
- Creates a put update primitive on the PUL
- Validates node must be document or element node (FOUP0001)

XQueryContext changes:
- PendingUpdateList field with get/set accessors
- Legacy/XQUF syntax conflict detection (markLegacyUpdate/markXQUFUpdate) to prevent mixing update syntaxes
- PUL reset in context cleanup

XQuery.java:
- PUL application at query completion boundary

FnModule.java:
- Register XQUFFnPut in fn: namespace

FunInScopePrefixes.java:
- Support in-memory nodes for namespace prefix queries

Spec: W3C XQuery Update Facility 3.0, Section 2.5.2 (fn:put)
XQTS: 684/684 non-schema tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add four test classes covering the W3C XQFT 3.0 implementation:
- FTConformanceTest: 622-line conformance suite covering the core XQFT test cases mapped from the W3C Full Text Test Suite (FTTS), verifying spec compliance for contains-text expressions, match options, and positional filters.
- FTContainsTest: Integration tests exercising ftcontains expressions end-to-end through the XQuery engine, including edge cases for empty sequences, mixed content, and attribute nodes.
- FTEvaluatorTest: Unit tests for the AllMatches evaluator, covering tokenization, match option application, and boolean composition.
- FTParserTest: Parser tests verifying that the ANTLR 2 grammar correctly parses all XQFT productions and builds the expected AST.

FTTS compliance: 661/667 (99.1%); the 6 remaining failures are spec ambiguities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… 6.0)

Upgrades the embedded Jetty server from 11.0.25 to 12.0.16, migrating from the legacy Jetty 11 APIs to EE10 (Jakarta Servlet 6.0):
- Jetty 12 EE10 servlet container replaces Jetty 11 APIs
- WebSocket upgrade from Jetty 11 to Jetty 12 WebSocket APIs
- Servlet filter/listener registration via EE10 programmatic API
- GZip handler migration to Jetty 12 GzipHandler
- SSL/TLS configuration updated for Jetty 12 ServerConnector
- JMX and statistics handler migration
- Thread pool configuration for Jetty 12
- Test infrastructure updated for Jetty 12 test server

Dependencies:
- jetty-ee10-servlet, jetty-ee10-webapp, jetty-ee10-websocket-*
- Remove deprecated jetty-servlet, jetty-webapp modules

This is a breaking infrastructure change that affects all HTTP/WebSocket communication. Thoroughly tested with ExistWebServer test harness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Gate FORX0003 (empty-matching regex) on XQuery version in FunReplace and FunTokenize: raise the error in XQ 3.1 mode, allow empty matches in XQ 4.0 mode
- Add xquery version "4.0" support to XQueryTree.g tree walker
- Declare replace.xqm test module as xquery version "4.0"
- Fix unnecessary FQN in FunReplace (PMD)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
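The version gate can be sketched as a zero-length-match probe followed by a version comparison. This assumes the `< 40` integer encoding of the XQuery version used elsewhere in this list, and it uses `java.util.regex` where eXist applies XSD-flavoured regex handling, so the probe is only an approximation. `EmptyMatchGate` is hypothetical and throws an IllegalArgumentException in place of the real error type.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; java.util.regex approximates XSD-style regexes here.
final class EmptyMatchGate {
    static final int XQUERY_40 = 40; // "< 40" encoding of the XQuery version

    /** True if the pattern can match the zero-length string. */
    static boolean matchesEmpty(String regex) {
        return Pattern.compile(regex).matcher("").matches();
    }

    /** Raises a FORX0003-style error only below XQuery 4.0, per this commit's rule. */
    static void checkEmptyMatch(String regex, int xqueryVersion) {
        if (xqueryVersion < XQUERY_40 && matchesEmpty(regex)) {
            throw new IllegalArgumentException(
                    "FORX0003: pattern matches a zero-length string: " + regex);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesEmpty("a*")); // true
        System.out.println(matchesEmpty("a+")); // false
        checkEmptyMatch("a*", 40);               // allowed in 4.0 mode
        try {
            checkEmptyMatch("a*", 31);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This follows this commit's rule (error in 3.1 mode only); the separate fn:tokenize commit in this list narrows the 4.0 behaviour further, raising FORX0003 in both modes unless the ! flag is present.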

Add 4 new XPath axes from the XQuery 4.0 specification:
- following-or-self: self + following (forward axis)
- following-sibling-or-self: self + following-sibling (forward axis)
- preceding-or-self: self + preceding (reverse axis)
- preceding-sibling-or-self: self + preceding-sibling (reverse axis)

These convenience axes combine existing axes with self, avoiding verbose union patterns like (self::node() | following-sibling::node()). Implementation evaluates as union of self + base axis results, preserving document order.

Changes:
- Constants.java: 4 new axis constants (14-17)
- XQuery.g: axis names in forward/reverse specifiers + reserved keywords
- XQueryTree.g: axis name to constant mapping
- LocationStep.java: getOrSelfAxis() dispatch method
- Predicate.java: reverse axis detection for new axes

8/8 JUnit tests pass, 148 XPath tests pass (0 regressions).

Spec: https://qt4cg.org/specifications/xquery-40/xpath-40-xquery-40.html#axes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
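The union-then-sort evaluation can be sketched with sibling positions standing in for document order. `OrSelfAxes` and its methods are hypothetical; in eXist the base-axis results come from the existing axis implementations rather than integer ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: positions 0..n-1 stand in for siblings in document order.
final class OrSelfAxes {
    /** following-sibling: every sibling after self, in document order. */
    static List<Integer> followingSibling(int self, int siblingCount) {
        List<Integer> out = new ArrayList<>();
        for (int i = self + 1; i < siblingCount; i++) {
            out.add(i);
        }
        return out;
    }

    /** preceding-sibling: every sibling before self, returned in document order. */
    static List<Integer> precedingSibling(int self) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < self; i++) {
            out.add(i);
        }
        return out;
    }

    /** X-or-self = self union X, deduplicated and sorted back into document order. */
    static List<Integer> orSelf(int self, List<Integer> baseAxis) {
        TreeSet<Integer> union = new TreeSet<>(baseAxis);
        union.add(self);
        return new ArrayList<>(union);
    }

    public static void main(String[] args) {
        System.out.println(orSelf(2, followingSibling(2, 5))); // [2, 3, 4]
        System.out.println(orSelf(2, precedingSibling(2)));    // [0, 1, 2]
    }
}
```

Even for the reverse -or-self axes the result comes back in document order; direction matters only for positional predicates, which is presumably why Predicate.java needed reverse-axis detection for the new axes.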

…(Phase 3-4)

Phase 3 (FieldAccessor expression):
- New FieldAccessor class evaluates $expr.fieldName as map:get($expr, "fieldName")
- Returns empty sequence for missing fields, XPTY0004 for non-map base
- Parser-next branch will wire the .NCName postfix syntax to this class

Phase 4 (record type checking in function arguments):
- RecordTypeCheck wraps function arguments declared with record types
- Validates at runtime: argument is a map, has all required fields, field types match, no extra keys (unless extensible)
- Wired into Function.checkArgumentType() (both typeMatches and non-match paths) alongside existing FunctionTypeCheck pattern
- Also wired into UserDefinedFunction.eval() for runtime parameter validation
- Descriptive error messages: "missing required field 'age'" etc.

10 new tests (4 FieldAccessor + 6 RecordTypeCheck), all 17 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
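The runtime checks RecordTypeCheck is described as performing (all required fields present, field types matching, no extra keys unless extensible) can be sketched over a Java `Map`. `RecordCheckSketch` and `Field` are hypothetical; Java classes such as `String` and `Integer` stand in for xs:string and xs:integer, and the "argument is a map" check is implicit in the parameter type.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Java classes stand in for XQuery atomic types.
final class RecordCheckSketch {
    /** One declared field: name, expected Java type, and whether it may be absent. */
    record Field(String name, Class<?> type, boolean optional) {}

    /** Returns null if the map conforms to the record declaration, else an error message. */
    static String check(Map<String, ?> value, List<Field> fields, boolean extensible) {
        for (Field f : fields) {
            if (!value.containsKey(f.name())) {
                if (!f.optional()) {
                    return "missing required field '" + f.name() + "'";
                }
                continue; // optional and absent: fine
            }
            if (!f.type().isInstance(value.get(f.name()))) {
                return "field '" + f.name() + "' is not of type " + f.type().getSimpleName();
            }
        }
        if (!extensible) {
            for (String key : value.keySet()) {
                boolean declared = fields.stream().anyMatch(f -> f.name().equals(key));
                if (!declared) {
                    return "unexpected field '" + key + "'";
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Field> person = List.of(
                new Field("name", String.class, false),
                new Field("age", Integer.class, true));
        System.out.println(check(Map.of("name", "Ann", "age", 42), person, false)); // null
        System.out.println(check(Map.of("age", 42), person, false)); // missing required field 'name'
    }
}
```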

Add the array/map filter expression from the QT4 specification. The ?[expr] operator filters array members or map entries by evaluating a predicate with each member/value as the context item.

Examples:
- [1, 2, 3, 4, 5]?[. > 3] -> [4, 5]
- map{"a":1, "b":2, "c":3}?[. > 1] -> map{"b":2, "c":3}

Parser (XQuery.g):
- Add FILTER_AM token
- Add filterExprAM rule: QUESTION LPPAREN exprSingle RPPAREN
- Disambiguate from lookup: ?[ is FilterExprAM, ?name is lookup

Tree Walker (XQueryTree.g):
- Add filterExprAM rule that creates FilterExprAM expression
- Chain in postfixExpr alongside lookup and predicate

FilterExprAM.java:
- Extends AbstractExpression
- For arrays: iterates members, evaluates predicate with each as context item, keeps members where EBV is true
- For maps: iterates entries, evaluates predicate with each value as context item, keeps entries where EBV is true
- XPTY0004 for non-array/map targets

17 XQSuite tests (993/993 total XQuery3Tests pass, 0 regressions).

Spec: https://qt4cg.org/specifications/xquery-40/xpath-40-xquery-40.html#id-filter-am

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
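The filtering semantics described for FilterExprAM can be sketched with Java collections, assuming a Java `Predicate` can stand in for the effective boolean value of the XQuery predicate and that each array member is a single item (in XQuery a member may be a sequence). `FilterAmSketch` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of ?[...] semantics over Java collections.
final class FilterAmSketch {
    /** array?[pred]: keep the members for which pred holds, in order. */
    static <T> List<T> filterArray(List<T> members, Predicate<? super T> pred) {
        return members.stream().filter(pred).collect(Collectors.toList());
    }

    /** map?[pred]: keep the entries whose VALUE satisfies pred; keys are not tested. */
    static <K, V> Map<K, V> filterMap(Map<K, V> map, Predicate<? super V> pred) {
        Map<K, V> out = new LinkedHashMap<>();
        map.forEach((k, v) -> {
            if (pred.test(v)) {
                out.put(k, v);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filterArray(List.of(1, 2, 3, 4, 5), x -> x > 3)); // [4, 5]
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        System.out.println(filterMap(m, v -> v > 1)); // {b=2, c=3}
    }
}
```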

Phase 3.2: Per-query scoping for PerformanceStats
- Add Profiler.getPerformanceStats() to expose per-profiler stats
- FunProfile now reads from the per-query profiler's stats instead of the global BrokerPool stats, ensuring correct data under concurrent load

Phase 4.1: Optimizer decision logging
- Add DEBUG-level logging in GeneralComparison.analyze() showing when index optimization is applied or skipped, including the expression text, QName, and optimization type

Phase 5.1: util:index-report($query) as element()
- Execute a query with profiling enabled and return an XML report of index usage and optimizations from the per-query PerformanceStats
- Uses the same per-query profiler isolation as util:profile()

2 new XQSuite tests (94/94 total pass).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds comprehensive JUnit test coverage for all W3C XQuery Update Facility 3.0 operations, plus a performance benchmark for measuring update primitive throughput.

XQUFBasicTest.java (73 tests):
- Insert node (before/after/into/as first/as last)
- Delete node (element, attribute, text, comment, PI)
- Replace node and replace value of node
- Rename node (element, attribute, PI)
- Copy-modify-return (transform) expressions
- Namespace conflict detection (XUDY0021/0023/0024)
- XUST0001/XUST0002 static type errors
- Complex multi-step update scenarios

XQUFBenchmark.java:
- Performance benchmarks for insert/delete/replace throughput
- Deep tree copy-modify-return scaling tests

bindingConflictXQUF.xqm:
- XQUF-specific namespace binding conflict tests using copy-modify-return syntax (separated from legacy tests because the two syntaxes cannot be mixed in one module)

XQTS: 684/684 non-schema tests pass (W3C XQuery Update Facility)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ack, number-parser

Implements the full options map for fn:json-to-xml per XPath Functions spec section 22.3.2 and XQuery 4.0 extensions:
- escape option: when true, preserves JSON escape sequences (\n, \t, \r, \\, etc.) as literal text in XML output and adds escaped="true" and escaped-key="true" attributes. When false (default), replaces non-XML characters with U+FFFD.
- duplicates option: default is "retain" for json-to-xml (keeps all entries). Supports "use-first" (skips subsequent duplicates) and "reject" (throws FOJS0003). Rejects "use-last" with FOJS0005 per spec.
- fallback option: calls user function with \uXXXX form for characters not representable in XML 1.0. Validates return type (XPTY0004 for empty/sequence, FOTY0013 for function items).
- number-parser option (XQ4): calls custom function for number text, uses result's string value as <number> element content.

Also fixes:
- Options type validation: XPTY0004 for wrong type/cardinality on all boolean, string, and function options
- FOJS0005 for escape=true combined with fallback
- Empty string input now raises FOJS0001 (was silently returning empty)
- BOM (U+FEFF) stripped from input
- json-to-xml 2-arity options parameter now accepts empty sequence for XQuery 4.0 compatibility

Refactors jsonToXml from flat iterative to recursive structure for proper per-scope duplicate key tracking and nested value skipping. Preserves backward-compatible legacy overload.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
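The `duplicates` policy can be sketched as a single pass over one JSON object's members. The class name, the use of `Map.Entry<String, String>` for raw members, and `IllegalArgumentException` for the FOJS0003/FOJS0005 errors are assumptions for illustration; a recursive parser would start a fresh `seen` set for every nested object, which is the per-scope tracking the refactor above is meant to provide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the json-to-xml "duplicates" option for ONE object scope.
final class JsonDuplicatesSketch {
    private JsonDuplicatesSketch() {}

    static List<Map.Entry<String, String>> apply(List<Map.Entry<String, String>> members, String policy) {
        if (!policy.equals("retain") && !policy.equals("use-first") && !policy.equals("reject")) {
            // "use-last" exists for parse-json but is not allowed for json-to-xml.
            throw new IllegalArgumentException("FOJS0005: invalid duplicates option '" + policy + "'");
        }
        Set<String> seen = new HashSet<>();
        List<Map.Entry<String, String>> kept = new ArrayList<>();
        for (Map.Entry<String, String> member : members) {
            boolean firstOccurrence = seen.add(member.getKey());
            if (firstOccurrence || policy.equals("retain")) {
                kept.add(member);
            } else if (policy.equals("reject")) {
                throw new IllegalArgumentException("FOJS0003: duplicate key '" + member.getKey() + "'");
            }
            // "use-first": later occurrences of a key are skipped.
        }
        return kept;
    }
}
```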

Per XQuery 4.0, when a map is passed where a record type is expected, it should be coerced rather than rejected. This commit implements:

1. SequenceType.checkType(int): Accept MAP_ITEM when expected type is RECORD, since maps are the runtime representation of records. This fixes function return type checks that rejected map(*) for record(*) return types.
2. SequenceType.isRecordType(): Return true when primaryType == RECORD without requiring a non-null RecordType object. This makes bare record() (no fields) a valid record type that matches empty maps.
3. SequenceType.checkType(Item): Handle bare record(): empty maps match, non-empty maps don't.
4. RecordTypeCheck.eval(): Transform maps during coercion instead of just validating. The coercion creates a new map containing only declared fields (dropping undeclared extras), coerces field values to declared types via atomic type conversion, and validates that required fields are present.

Fixes prod-RecordType tests: 002, 003 (map-to-record coercion), 105 (empty record matching), 015 (type conversion), 016-018 (drop excess fields). Test 011 (key ordering) requires ordered map support, which is deferred.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
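A rough Java sketch of the coercion step in item 4, assuming a hypothetical `Field` descriptor (name, optional flag, and a converter standing in for atomic type conversion) and `IllegalArgumentException` in place of the XQuery error. The XPTY0004 code in the message is an assumption, not taken from the commit.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of map-to-record coercion; not eXist's RecordTypeCheck.
final class RecordCoercionSketch {
    /** A declared record field; convert stands in for atomic type conversion. */
    record Field(String name, boolean optional, Function<Object, Object> convert) {}

    private RecordCoercionSketch() {}

    static Map<String, Object> coerce(Map<String, Object> input, List<Field> declared) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field field : declared) {
            if (input.containsKey(field.name())) {
                // Copy the declared field, converting its value to the declared type.
                out.put(field.name(), field.convert().apply(input.get(field.name())));
            } else if (!field.optional()) {
                throw new IllegalArgumentException("XPTY0004: missing required field '" + field.name() + "'");
            }
        }
        // Keys in input that are not declared are never copied, so they are dropped.
        return out;
    }
}
```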

Add DATETIME_RECORD type constant (71) as a subtype of MAP_ITEM, registered with the name "fn:dateTime-record". This allows the XQTS runner's assert-type parser to resolve fn:dateTime-record type assertions instead of crashing with "Type: fn:dateTime-record is not defined".

On branches with RECORD type support (next-v3, v2/xq4-record-types), the parent should be changed from MAP_ITEM to RECORD during cherry-pick.

Note: FnDateTimeRecord.eval() also needs to return values with type DATETIME_RECORD (rather than plain MAP_ITEM) for the assert-type checks to fully pass. That change belongs with FnDateTimeRecord on v2/xq4-core-functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tomic, and placeholders

Three fixes for XQTS prod-Lookup and prod-UnaryLookup failures:

1. Lookup.java: Move type check from sequence level to per-item iteration. The previous check used leftSeq.getItemType(), which returns the common supertype; for a mixed array+map sequence this is ITEM, incorrectly failing the check. Per spec, each item is validated individually. Fixes Lookup-227 ([[1,2,3], map:entry(3,5)]?*?3).
2. ArrayType.get(): Add untypedAtomic-to-integer cast for array subscripts. Per XQ 3.1 spec, when an element node is atomized and used as an array lookup key, the resulting xs:untypedAtomic must be cast to xs:integer. Fixes Lookup-021, Lookup-121, Lookup-230, UnaryLookup-021.
3. FunctionFactory.createFunction(): Skip the string function optimization (contains/starts-with/ends-with/equals rewrite to GeneralComparison) when any argument is a Function$Placeholder from partial function application. The optimization casts all params to PathExpr, which throws ClassCastException for placeholders. Falls through to the normal functionCall() path instead. Fixes Lookup-016, Lookup-017, UnaryLookup-016, UnaryLookup-017.

Total: recovers 9 of 27 non-passing tests (3 skipped for schemaValidation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

XQuery 4.0 changes MapConstructorEntry to make the ": ExprSingle" part optional. When the colon is absent, the expression must evaluate to a map at runtime, whose entries are merged into the result. This enables conditional and computed map entries:

  map { "a": 1, map {"b": 2} }
  map { "a": 1, if ($cond) then map {"b": 2} else map {} }

Parser: added MAP_MERGE imaginary token and syntactic predicate in mapAssignment to differentiate key:value from merge entries.

Tree walker: added MAP_MERGE alternative in mapConstr rule.

MapExpr: refactored to sealed Entry interface with KeyValueEntry and MergeEntry records. Merge entries are type-checked (XPTY0004) and duplicate-key checked (XQDY0137) at evaluation time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
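The merge semantics can be sketched with a small sealed hierarchy in the spirit of the one described for MapExpr. The nested `CtorEntry`, `KeyValueEntry`, and `MergeEntry` types here are hypothetical stand-ins (Java 17), with `java.util.Map` modelling XDM maps and `IllegalArgumentException` carrying the error codes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of XQuery 4.0 map-constructor entries (Java 17); not eXist's MapExpr.
final class MapConstructorSketch {
    sealed interface CtorEntry permits KeyValueEntry, MergeEntry {}
    /** key: value */
    record KeyValueEntry(Object key, Object value) implements CtorEntry {}
    /** An entry without ": value"; it must evaluate to a map at runtime. */
    record MergeEntry(Object value) implements CtorEntry {}

    private MapConstructorSketch() {}

    static Map<Object, Object> build(List<CtorEntry> entries) {
        Map<Object, Object> result = new LinkedHashMap<>();
        for (CtorEntry entry : entries) {
            if (entry instanceof KeyValueEntry kv) {
                put(result, kv.key(), kv.value());
            } else if (entry instanceof MergeEntry merge) {
                // Type check at evaluation time: a merge entry must yield a map.
                if (!(merge.value() instanceof Map<?, ?> map)) {
                    throw new IllegalArgumentException("XPTY0004: map constructor entry is not a map");
                }
                for (Map.Entry<?, ?> e : map.entrySet()) {
                    put(result, e.getKey(), e.getValue());
                }
            }
        }
        return result;
    }

    /** Duplicate keys are an error whether they come from key: value or from a merged map. */
    private static void put(Map<Object, Object> result, Object key, Object value) {
        if (result.containsKey(key)) {
            throw new IllegalArgumentException("XQDY0137: duplicate key " + key);
        }
        result.put(key, value);
    }
}
```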

…okup keys

Three parser/grammar quick fixes:

A. Restore fn:lang function name: the keyword argument session accidentally renamed the QName from "lang" to "language", making fn:lang() unresolvable (44 XQTS failures).
B. Reject extensible records per XQ4 PR2413: record(*) and record(field, *) now raise XPST0003. Removed anyRecordTypeTest grammar rule and STAR alternatives from record type parsing in both ANTLR grammar and tree walker (6 XQTS tests).
D. Version-gate decimal/double literal lookup keys: $map?1.5 is XQ4-only syntax; now raises XPST0003 in XQ 3.1 mode via xq4Enabled semantic predicates (5+ XQTS tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Updates ~75 FunctionParameterSequenceType attribute names across 64 fn: function files to match the XQuery 4.0 Functions and Operators specification. This enables keyword argument resolution (name := value syntax) to match parameter names correctly.

Key patterns:
- $arg -> $value (transformation functions: abs, ceiling, floor, round, etc.)
- $arg -> $input (sequence functions: reverse, head, tail, count, etc.)
- $arg -> $node (node functions: local-name, name, namespace-uri, root, etc.)
- $arg -> $values (aggregate functions: sum, min, max, string-join)
- $collation-uri -> $collation (all collation parameters)
- $source-string -> $value (contains, starts-with, ends-with)
- $string-1/$string-2 -> $value1/$value2 (compare, codepoint-equal)
- $sequence -> $input, $function -> $action/$predicate (HOF functions)
- $date/$time/$date-time/$duration -> $value (date/time extraction functions)
- Various other renames per XQ4 F&O spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…t XQ4 lookaround

Add pre-validation of regex patterns in fn:matches and fn:replace to reject constructs that are not part of the XPath regular expression specification (F&O 3.1, Section 5.6.1) but that Saxon's XP30 mode silently accepts. Rejected constructs include:
- \x, \u hex/unicode escapes (not in XPath regex)
- \A, \Z, \z Java-specific anchors
- \b, \B word boundary assertions
- \a, \e, \f, \v special character escapes
- \Q, \E literal quoting
- \G end-of-previous-match anchor; \k, \g named/numbered back-references
- (?=...) (?!...) (?<=...) (?<!...) Java-style lookaround
- (?>...) atomic groups
- (?i:...) (?m:...) (?s:...) (?-i:...) inline flag groups
- *+ ++ ?+ possessive quantifiers

Also adds support for XPath 4.0 named lookaround syntax by translating (*positive_lookahead:...) etc. to Java regex (?=...) equivalents.

Expected XQTS impact: ~137 of 173 fn-matches.re failures fixed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
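The translation half of this commit can be sketched as a single left-to-right scan. Only `(*positive_lookahead:` appears in the commit message; the three other names below are assumed by analogy with PCRE2's alpha assertions. The sketch copies escapes verbatim, leaves character classes alone, rejects Java-only `(?` groups other than the non-capturing `(?:` (which XPath regex allows), and uses FORX0002 as the error code. It is not the eXist code and does not cover the other rejected constructs listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: XPath 4.0 named lookaround -> Java regex, plus a check that
// rejects Java-only "(?" group syntax outside character classes.
final class NamedLookaroundSketch {
    private static final Map<String, String> NAMED = new LinkedHashMap<>();
    static {
        NAMED.put("(*positive_lookahead:", "(?=");
        // The three names below are assumed by analogy with PCRE2 alpha assertions.
        NAMED.put("(*negative_lookahead:", "(?!");
        NAMED.put("(*positive_lookbehind:", "(?<=");
        NAMED.put("(*negative_lookbehind:", "(?<!");
    }

    private NamedLookaroundSketch() {}

    static String toJava(String re) {
        StringBuilder out = new StringBuilder();
        int classDepth = 0; // > 0 while inside [...] (XPath allows nested subtraction)
        int i = 0;
        outer:
        while (i < re.length()) {
            char c = re.charAt(i);
            if (c == '\\' && i + 1 < re.length()) {
                out.append(c).append(re.charAt(i + 1)); // escapes are copied verbatim
                i += 2;
                continue;
            }
            if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            } else if (c == '(' && classDepth == 0) {
                if (re.startsWith("(?", i) && !re.startsWith("(?:", i)) {
                    throw new IllegalArgumentException("FORX0002: Java-only group syntax at offset " + i);
                }
                for (Map.Entry<String, String> e : NAMED.entrySet()) {
                    if (re.startsWith(e.getKey(), i)) {
                        out.append(e.getValue());
                        i += e.getKey().length();
                        continue outer;
                    }
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```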

Fix multiple error code and validation issues in type casting:
- CastExpression/CastableExpression: Move xs:anySimpleType from XPST0080 to XPST0051 check; add xs:anyType to XPST0051. Per spec, XPST0080 is only for xs:NOTATION and xs:anyAtomicType.
- CastExpression: Add string-intermediary casting path for string-derived types (xs:language, xs:Name, xs:NMTOKEN, etc.) so non-string source types cast via xs:string first, producing correct FORG0001 on invalid values.
- CastExpression: Allow xs:untypedAtomic to cast to xs:QName.
- DynamicTypeCheck: Raise XPTY0117 (new) when xs:untypedAtomic is coerced to namespace-sensitive types (xs:QName, xs:NOTATION) in function arguments.
- ErrorCodes: Add XPTY0117 for namespace-sensitive type coercion errors.
- AbstractDateTimeValue: Reject seconds >= 60, hour 24 with fractional seconds, and leading zeros in year (e.g., "02004" is invalid).
- Date/time value constructors (TimeValue, DateTimeValue, DateValue, GYearValue, GMonthValue, GDayValue, GYearMonthValue, GMonthDayValue): Use ErrorCodes.FORG0001 instead of bare exerr:ERROR.
- DurationValue/DayTimeDurationValue: Validate decimal point has digits on both sides (reject "PT.5S" and "PT30.S").
- DoubleValue: Use BigDecimal for double-to-integer conversion to prevent silent long overflow (fixes xs:integer(99e100)).
- AnyURIValue: Relax URI validation to accept valid URI references that java.net.URI rejects (e.g., ":/", "%gg", "%GF").
- StringValue: Use XMLNames.isName() for xs:Name validation instead of QName.isQName(), so colons are accepted (e.g., "::::").
- BinaryValue: Use ErrorCodes.FORG0006 for effectiveBooleanValue() instead of bare error string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
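The DoubleValue change can be illustrated in a few lines: a Java `(long)` cast saturates at `Long.MAX_VALUE` for out-of-range doubles, whereas `BigDecimal` yields the exact integer part. `DoubleToIntegerSketch` is hypothetical; the FOCA0002 error for NaN and infinities follows the F&O casting rules rather than anything stated in the commit.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of xs:double -> xs:integer without long overflow.
final class DoubleToIntegerSketch {
    private DoubleToIntegerSketch() {}

    static BigInteger toInteger(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            throw new IllegalArgumentException("FOCA0002: cannot cast " + d + " to xs:integer");
        }
        // new BigDecimal(double) is exact; toBigInteger() discards the fraction (truncates toward zero).
        return new BigDecimal(d).toBigInteger();
    }
}
```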

…r keyword arguments

XQuery 4.0 keyword argument resolution matches parameter names from the function signature. Many built-in function parameters used legacy names (e.g., fn:floor($number) instead of fn:floor($value)) that don't match the XQ4 spec, causing keyword argument calls like fn:floor(value := ?) to fail with arity mismatch errors.

Updated parameter names in 33 files across fn:, math:, and array: modules to match the XQ4 spec. This enables keyword argument syntax for ~56 of the 65 failing misc-BuiltInKeywords XQTS tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…lation construction

Add fn:collation($options as map(*)) overload that constructs a UCA collation URI from a map of collation options as defined in the XQ4 spec. The function:
- Accepts UCA option keys (strength, lang, fallback, case-first, etc.)
- Converts hyphenated map keys to camelCase URI parameters
- Converts boolean values to "yes"/"no"
- Validates option values when fallback=false, raising FOCH0002 for errors
- Returns a UCA collation URI usable with fn:compare, fn:sort, etc.

The existing fn:collation() 0-arg and fn:collation(string) 1-arg variants are preserved. The 1-arg signature now accepts item() to dispatch between string (collation URI check) and map (UCA construction) at runtime.

Expected to fix ~71 of 73 failing fn-collation XQTS tests (all error-only and error-or-value tests pass; value-only tests that require actual UCA comparison behavior depend on the collation infrastructure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
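The option-to-URI conversion can be sketched as follows. The base URI `http://www.w3.org/2013/collation/UCA` and the semicolon-separated `key=value` parameter syntax come from the F&O UCA collation URI format; sorting the keys alphabetically and the class name are choices made for this sketch only, and option validation (FOCH0002) is omitted.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical sketch of turning an fn:collation options map into a UCA collation URI.
final class UcaCollationUriSketch {
    static final String UCA_BASE = "http://www.w3.org/2013/collation/UCA";

    private UcaCollationUriSketch() {}

    /** "case-first" -> "caseFirst" */
    static String camelCase(String hyphenated) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : hyphenated.toCharArray()) {
            if (c == '-') {
                upperNext = true;
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    static String build(Map<String, Object> options) {
        StringJoiner params = new StringJoiner(";");
        // TreeMap gives a stable parameter order; that ordering is a choice of this sketch.
        for (Map.Entry<String, Object> e : new TreeMap<>(options).entrySet()) {
            Object v = e.getValue();
            String value = (v instanceof Boolean b) ? (b ? "yes" : "no") : String.valueOf(v);
            params.add(camelCase(e.getKey()) + "=" + value);
        }
        return params.length() == 0 ? UCA_BASE : UCA_BASE + "?" + params;
    }
}
```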

…ement-to-map, regex, deep-equal

- fn:codepoints-to-string: Accept C0 control characters (U+0001-U+001F) per XML 1.1 / XQ 4.0 while still rejecting Unicode noncharacters. Fixes 87+ fn-graphemes tests that use codepoint 1.
- fn:build-dateTime: Add 1-argument map-based overload (XQ 4.0) that constructs xs:dateTime, xs:date, xs:time, or Gregorian types based on which record fields are present. Includes FODT0005/FODT0006 validation. (+45 tests)
- fn:element-to-map: Fix list layout to group same-named siblings under shared key, add grouped content for mixed-name siblings, preserve whitespace-only text nodes, use xml: prefix for XML namespace attributes. (+38 tests)
- Regex \b/\B word boundaries: Pre-process XQ 4.0 word boundary escapes before Saxon translation (which only supports XP 3.0 regex). Falls back to Java regex for patterns containing \b/\B. (+8 tests)
- fn:deep-equal: Add "unordered" option key support (inverse of "ordered").
- ErrorCodes: Add FODT0005 (missing date/time component) and FODT0006 (invalid date/time component value).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
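A sketch of the acceptance rule described for fn:codepoints-to-string, assuming the rule is: U+0001 to U+10FFFF, excluding surrogate code points and Unicode noncharacters (U+FDD0 to U+FDEF and the last two code points of every plane). The class and method names are hypothetical.

```java
// Hypothetical acceptance check: C0 controls U+0001-U+001F allowed;
// U+0000, surrogates, and Unicode noncharacters rejected.
final class CodepointSketch {
    private CodepointSketch() {}

    static boolean isAcceptable(int cp) {
        if (cp <= 0 || cp > 0x10FFFF) {
            return false;                       // U+0000 and out-of-range values
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;                       // surrogate code points are not characters
        }
        if (cp >= 0xFDD0 && cp <= 0xFDEF) {
            return false;                       // contiguous noncharacter block
        }
        return (cp & 0xFFFE) != 0xFFFE;         // U+xFFFE / U+xFFFF in every plane
    }
}
```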

…matching, -or-self axes

JNode.java: Add getFollowing() and getPreceding() methods implementing the full following/preceding axes (siblings + descendants, recursive).

LocationStep:
- Add following, preceding, following-or-self, preceding-or-self, following-sibling-or-self, preceding-sibling-or-self axis handling for JNodes (all 15 XPath axes now supported).
- Fix matchesJNode() wildcard: TypeTest(ELEMENT) from child::* was not matching JNodes. Now treats ELEMENT as wildcard in JNode context.
- Fix NameTest handling: support named key lookup (e.g., $root/name) and proper wildcard detection for JNodes.

JNodeTest: Add xpathChildWildcard test for child::* on JNodes.

+6 AxisStep.J tests (JAxes-001, JAxes-021 histogram tests now pass), zero regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
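The following axis over a simple parent/child tree can be sketched as: for the node and each of its ancestors, take every later sibling together with that sibling's subtree. Walking from the node outward yields document order. `SketchNode` is a hypothetical stand-in for JNode, not its real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JNode: a named node with an ordered child list.
final class SketchNode {
    final String name;
    final SketchNode parent;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, SketchNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    /** following axis: later siblings of self and of each ancestor, each with its whole subtree. */
    List<SketchNode> following() {
        List<SketchNode> out = new ArrayList<>();
        // Innermost level first: that is document order for the following axis.
        for (SketchNode n = this; n.parent != null; n = n.parent) {
            List<SketchNode> siblings = n.parent.children;
            for (int i = siblings.indexOf(n) + 1; i < siblings.size(); i++) {
                addSubtree(siblings.get(i), out);
            }
        }
        return out;
    }

    /** Pre-order walk: the node, then its descendants in document order. */
    private static void addSubtree(SketchNode node, List<SketchNode> out) {
        out.add(node);
        for (SketchNode child : node.children) {
            addSubtree(child, out);
        }
    }
}
```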

…jnode() parsing

Fix four issues preventing JNode path expression and axis step tests from passing:

1. Static type check for JSON nodes: Step.returnsType() returns NODE for all non-self axes, causing function parameter checks to fail when json-node() type is required. Relax Function.checkArgumentType() to allow NODE→JSON_NODE in XQ4 (defer to dynamic check). Add JNode recognition in DynamicTypeCheck for runtime validation. This unblocks all 30 histogram tests in prod-AxisStep.J.
2. fn:get() sequence support: Change cardinality from EXACTLY_ONE to ZERO_OR_MORE and iterate over key sequences. Tests call get((1,3)), get(()), etc., which require sequence arguments.
3. jnode(name, type) parsing: Replace crude ~(RPAREN)* grammar rule with paren-balanced skipper (jnodeTestArgs) that handles nested parens in expressions like jnode(*, record(A as xs:double, B as xs:string)).
4. Add JSON_NODE to Type.isNavigable() so JNode types are recognized as navigable in path expressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ility, lazy module registration

Fix several XQuery module import conformance issues detected by the XQTS prod-ModuleImport test set, improving pass rate from 55/117 to 101/131.

Annotation validation:
- Add XQST0106 error for duplicate/conflicting %public/%private annotations on function declarations
- Add XQST0116 error for duplicate/conflicting %public/%private annotations on variable declarations
- Handle both the XPath Functions namespace and the XQuery annotation namespace (http://www.w3.org/2012/xquery) for visibility annotations

Private visibility enforcement:
- Store %private flag on VariableDeclaration
- Check %private visibility in XQueryContext.resolveVariable() to prevent access to private variables from outside their declaring module

Module resolution:
- Add addModuleLocationHint() to XQueryContext for lazy module location registration without eager loading
- Override getModuleLocation() in ModuleContext to propagate dynamic module locations from parent contexts
- Normalize whitespace in namespace URIs in importModule()
- Add recursion guard in ExternalModuleImpl.resolveVariable() to prevent stack overflow from circular variable dependencies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nformance

Fixes 59 XQTS failures (0% → 100% pass rate for fn-element-to-map suite):
- Implement "default" name-format (spec default, not "eqname"): child elements in same namespace as parent use local name, different namespace uses Q{ns}local, no-namespace child of namespaced parent uses Q{}local. Special case for xml: namespace attributes.
- Fix layout classification: whitespace-only text between elements is ignored for layout detection but preserved as content when the element has only whitespace text and no child elements.
- Implement sequence layout for non-unique child names (was incorrectly using record layout, which collapsed duplicates).
- Fix list/list-plus layout: list drops child element name and returns array directly; list-plus uses child name as map key.
- Add plan support: explicit layout directives (empty, empty-plus, simple, simple-plus, list, list-plus, record, sequence, mixed, xml, error, deep-skip), fallback via "*" key, type coercion (numeric/boolean), FOJS0008 error for layout mismatches.
- Add XML serialization layout for plan-based conversion.
- Add option validation: name-format, attribute-marker type checks with XPTY0004 errors for invalid values.
- Add xsi:type coercion for schema-typed simple content.
- Fix fn:element-to-map-plan corpus analysis: properly merge multiple instances, detect list patterns across empty and non-empty instances, generate type annotations for numeric content.
- Add FOJS0008 error code to ErrorCodes.java.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
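The default name-format rule can be sketched as a pure function of the child's and parent's namespace URIs, assuming the empty string means "no namespace". The xml: namespace special case and attribute naming are omitted, and the class name is hypothetical.

```java
// Hypothetical sketch of the "default" name-format rule for child element keys.
// The empty string stands for "no namespace"; the xml: special case is omitted.
final class ElementToMapNameSketch {
    private ElementToMapNameSketch() {}

    static String childKey(String parentNs, String childNs, String childLocalName) {
        if (childNs.equals(parentNs)) {
            return childLocalName;                       // same namespace as parent
        }
        return "Q{" + childNs + "}" + childLocalName;    // different namespace, or none (Q{}local)
    }
}
```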

…checking

fn:unparsed-text improvements:
- Change $encoding parameter from xs:string to xs:string? to accept empty sequence
- Fix error code mapping: FODC0005 → FOUT1170 for URI syntax errors
- Add URI-only dynamic text resource lookup for encoding-agnostic resolution (fixes UTF-16 and ISO-8859-1 resources when no encoding is specified)
- Add readLines support for dynamic text resources (was missing)
- Add XML character validation (FOUT1190) for non-XML characters
- Fix unparsed-text-available to return false (not empty sequence) for empty href

Function type checking (SequenceType):
- Add functionParamTypes and functionReturnType fields to SequenceType
- Wire up ANTLR tree walker to populate function type info (resolves TODO)
- Add return type covariance checking for function instance-of operations

XQTS fn-unparsed-text: 50 → 32 failures (18 tests fixed, 36% improvement)

Subtyping fixes require next-v3 integration branch for proper testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…, promotions

- Add XPST0003 checks for reserved function names in NamedFunctionReference and FunctionFactory, fixing ~16 prod-NamedFunctionRef and ~7 prod-FunctionCall tests that incorrectly returned XPST0017 or succeeded when they should fail
- Fix context item passing for wrapped internal functions (FunctionFactory.wrap). UserDefinedFunction now preserves the evaluation context for wrapper functions, fixing ~15 tests where context-dependent functions like fn:string#0, fn:node-name#0, fn:id#1, fn:idref#1 lost the focus when called via function references
- Add binary type promotion (xs:base64Binary ↔ xs:hexBinary) in GeneralComparison and DynamicTypeCheck per XQuery 4.0 spec, fixing 4 function-call-promotion tests
- Register 2-arity fn:element-with-id signature (the implementation already handled 2 args but the signature was missing), fixing 2 tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…codes

Two XQTS compliance fixes:

1. OrderedDurationValue: comparing xs:yearMonthDuration with xs:dayTimeDuration now correctly throws XPTY0004 for all operators (eq/ne/lt/gt/le/ge), not just ordering operators. Per XPath spec, these duration subtypes are not comparable. (~32 test failures)
2. FunSerialize: preserve specific serialization error codes (SERE0020, SERE0022, SERE0023, SEPM0017) from exception messages instead of collapsing them all to SENR0001. The JSON serializer already throws the correct codes, but FunSerialize was catching and re-wrapping them. (~18 test failures)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…quenceType methods, FnFormatDates params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds a per-expression optimization hook modelled on BaseX's Expr.optimize(): each Expression subclass may return a replacement expression (constant fold, branch select, etc.) instead of just being wrapped in a pragma by the central Optimizer visitor.

Why: eXist's optimizer is a single visitor pass that can only wrap predicates in pragmas. It cannot replace expressions, cannot compose optimizations, and must understand every expression type centrally. This is recommendation L1 from query-optimizer-audit.md and a precursor to FLWOR loop-invariant hoisting and other per-class optimizations.

Design (joe-vault/Claude/exist/optimizer-expression-optimize-design.md):
- Expression.optimize(CompileContext) is a default method on the interface, returning `this`. Zero source impact on the 370+ existing Expression subclasses (38 direct AbstractExpression, 213 functions, 122 extension functions, plus DebuggableExpression).
- CompileContext carries the XQueryContext, an in-memory rewrite log, and the changed flag. replaceWith(orig, new, reason) is the preferred return path. preEval(expr) folds a no-dependency expression to a literal.
- XQueryContext.analyzeAndOptimizeIfModulesChanged runs the new pass before the legacy visitor: replacements happen first, the visitor then operates on the reduced tree. The CompileContext is stored on the context (getLastCompileContext()) so diagnostics and tests can see what fired. The legacy visitor stays in place for the migration period.
- Recursion enablers: PathExpr.optimize walks `steps`, BindingExpression.optimize walks inputSequence + super(returnExpr), AbstractFLWORClause.optimize walks returnExpr. These let optimize() reach into the most common containers; without them, only the top-level expression's optimize() would fire.

This commit is framework-only; no Expression subclass yet returns a non-trivial replacement. Behaviour is unchanged from the baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…LetExpr

Demonstrates the per-expression optimize() framework with three expression types chosen for breadth of cases:
- ConditionalExpression: constant-condition folding. When testExpr reduces to a LiteralValue with a statically-known EBV, return the chosen branch directly (e.g. `if (1=1) then "a" else "b"` becomes `"a"`). Recurses into testExpr/thenExpr/elseExpr.
- GeneralComparison: pre-evaluates literal-vs-literal comparisons via cc.preEval (e.g. `1 = 1` becomes `true`). Calls super.optimize() to recurse into the left/right operands stored in PathExpr.steps.
- LetExpr: drops the let-binding when the body is a literal value (so the variable is by definition unused) and the input is also a LiteralValue (no side effects to preserve). Conservatively guards against score bindings (XQFT 3.0 §2.3), clause chains (would require previousClause repair), and FLWOR-clause return bodies.

Tests (ExpressionOptimizeTest) parse a query into the internal expression tree, run analyzeAndOptimizeIfModulesChanged, and assert both the structural rewrite and the CompileContext log entry. They follow the pattern from CountExpressionTest, so no embedded server is needed.

XQuery3Tests on this branch shows 7 failures + 1 error, identical to the baseline on next-v3 (verified by stashing changes and rebuilding): the failures are pre-existing and unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
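The shape of the framework from the two commits above, reduced to toy types: a default `optimize` that returns `this`, a context whose `replaceWith` logs the reason and sets a changed flag, and a conditional that folds when its recursively optimized test is a boolean literal. `Expr`, `CompileCtx`, `Literal`, and `IfExpr` are hypothetical stand-ins, not eXist's `Expression`, `CompileContext`, `LiteralValue`, or `ConditionalExpression`, and only `Boolean` literals are treated as having a statically known EBV.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins (hypothetical): Expr ~ Expression, CompileCtx ~ CompileContext,
// Literal ~ LiteralValue, IfExpr ~ ConditionalExpression.
interface Expr {
    /** Default: nothing to rewrite, so existing implementations need no source changes. */
    default Expr optimize(CompileCtx cc) {
        return this;
    }
}

final class CompileCtx {
    final List<String> log = new ArrayList<>();
    boolean changed;

    /** Preferred return path for a rewrite: log the reason and mark the tree as changed. */
    Expr replaceWith(Expr original, Expr replacement, String reason) {
        log.add(reason + ": " + original + " -> " + replacement);
        changed = true;
        return replacement;
    }
}

record Literal(Object value) implements Expr {}

record IfExpr(Expr test, Expr then, Expr otherwise) implements Expr {
    @Override
    public Expr optimize(CompileCtx cc) {
        // Recurse first so a condition that itself folds to a literal can be used.
        Expr t = test.optimize(cc);
        Expr a = then.optimize(cc);
        Expr b = otherwise.optimize(cc);
        if (t instanceof Literal lit && lit.value() instanceof Boolean ebv) {
            return cc.replaceWith(this, ebv ? a : b, "constant condition");
        }
        // Keep identity when nothing below changed.
        return (t == test && a == then && b == otherwise) ? this : new IfExpr(t, a, b);
    }
}
```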

Adds a per-ForExpr.optimize() pass that detects when an inner for-clause's input expression is loop-invariant relative to enclosing FLWOR loops and rewrites it as a reference to a synthesised let inserted upstream. The expression then evaluates once instead of once per outer iteration.

Approach: rewrite-into-let, not eval-time caching. A previous attempt at caching (BindingExpression flag populated on first eval) failed with 63 XQuery3Tests regressions due to per-Expression-instance cache lifetime bugs (function bodies / closures / parameters / unsafe visitor recursion). The rewrite-into-let model sidesteps all five pitfalls because the hoisted let is just a plain FLWOR clause evaluated by normal semantics.

CompileContext additions:
- enterFlworChain / exitFlworChain push and pop scope state on a Deque;
- addVisibleFlworVar routes vars into either letPrefixVars (safe to reference from a hoist) or loopBodyVars (unsafe: block hoisting), partitioned by whether the chain has yet seen a for-clause;
- recordForClause marks the first FOR per scope as the hoist insertion point;
- addPendingHoistToOutermost queues the hoist on the outermost scope that actually has a FOR (a let-only outer chain has no loop to lift over and nothing to insert before);
- applyHoistsAndExitChain wraps the chain head in synthesised lets when the FOR is the head, otherwise splices them mid-chain between the predecessor clause and the FOR.

ForExpr.optimize:
- pushes scope when chain head, recurses input first (so inner FLWORs in the input register hoists targeting this clause), runs the hoist check on our input, records this as the firstForClause, then makes our binding visible and recurses the rest of the chain. The inner walker (RefCollector) handles getSubExpression-gap classes explicitly (BindingExpression / FilteredExpression / LocationStep / WhereClause) and aborts conservatively on unrecognised shapes.

LetExpr.optimize:
- mirrors the chain-head detection so let-headed chains also track scope;
- registers its var visibility AFTER the input is recursed (per XQuery scoping: the let binding is not in scope for its own initializer).

AbstractFLWORClause.optimize:
- registers each clause's tuple-stream variables on the active scope so group-by keys, count vars, etc. are tracked alongside for/let vars (otherwise an inner hoist that referenced one would falsely classify as loop-invariant).

ElementConstructor.optimize:
- recurses into the content PathExpr and the qname expression. Without this the optimize() pass dies at every wrapping element (every XMark query wraps results in <XMark-result-Q*>{ ... }</XMark-result-Q*>).

DebuggableExpression.optimize:
- recurses into the wrapped expression and captures any replacement. The FLWOR parser inserts a DebuggableExpression around every return body; without this, optimize is silently disabled for FLWOR returns.

XMark heavy-tier (Q8/Q9/Q11/Q12), QT4 app-XMark, factor 0.01:
  L1 framework alone: 9.375s
  L1 + this hoist:    6.532s (-30%)

Per-query (this run):
  Q8:  2.099 -> 1.566s
  Q9:  2.131 -> 1.645s
  Q11: 2.579 -> 1.646s
  Q12: 2.566 -> 1.675s

Verification:
- xquery.xquery3.XQuery3Tests: 1227 tests, 7 failures + 1 error, an exact match with the branch baseline; zero new regressions.
- OptimizerTest: 6/6 pass.
- ExpressionOptimizeTest: 12/12 pass (4 new hoisting tests added).
- XMark app-XMark: 21/21 pass (functional correctness preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
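The partitioning that decides whether an inner for-clause input may be hoisted can be sketched as below. Each scope splits visible variables into those bound before its first for-clause and those bound at or after it, and an input is treated as loop-invariant only if it references none of the latter. The types are hypothetical; the real pass also handles the insertion point, group-by/count variables, and conservative aborts in RefCollector.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the let-prefix / loop-body partition used to decide hoisting.
final class HoistDecisionSketch {
    private HoistDecisionSketch() {}

    /** One FLWOR chain's scope. */
    static final class Scope {
        final Set<String> letPrefixVars = new HashSet<>(); // bound before the first for: safe
        final Set<String> loopBodyVars = new HashSet<>();  // bound at/after the first for: unsafe
        boolean seenFor;

        void addVisible(String name) {
            (seenFor ? loopBodyVars : letPrefixVars).add(name);
        }

        void recordForClause(String forVar) {
            seenFor = true;
            addVisible(forVar); // the for variable changes on every iteration
        }
    }

    /** An input is loop-invariant if it references no variable bound inside an enclosing loop. */
    static boolean isLoopInvariant(Set<String> referencedVars, List<Scope> enclosingScopes) {
        for (Scope scope : enclosingScopes) {
            for (String name : referencedVars) {
                if (scope.loopBodyVars.contains(name)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For example, with `let $doc := ... for $p in ... for $c in <input>`, an input that refers only to `$doc` is hoistable, while one that refers to `$p` is not.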

Detects the {for $i in <invariant> where $i/key = $outer/key return ...} pattern in ForExpr.optimize() and rewrites it into a HashJoinForExpr that builds a hash on the (already-hoisted) input once per query and probes per outer iteration, replacing the existing nested-scan WhereClause.preEval fast-track with O(N+M) build/probe.

Cache lifetime is keyed by the input Sequence reference, mirroring BaseX CmpHashG/CmpCache. A fresh function invocation produces a new input Sequence (fresh let-bindings) and rebuilds the hash, sidestepping the cross-call cache lifetime bug that broke the earlier eval-time-cache attempt (per joe-vault/Claude/exist/query-optimizer-overhaul.md).

Detection in ForExpr.tryHashJoinRewrite gates conservatively:
- relation must be Comparison.EQ (no truncation/collation)
- exactly one side references the inner FOR's variable; the probe side must be statically free of inner-var references (verified by RefCheck walker that mirrors RefCollector's class-shape coverage)
- WhereClause.returnExpr must NOT be a FLWORClause (no order-by / group-by / chained for-let, since match-iteration would not preserve their semantics)
- no positional/score variables, no `allowing empty`
- DebuggableExpression / single-step PathExpr wrappers are peeled before the comparison check (the parser emits these for debug fidelity)

HashJoinForExpr extends ForExpr (preserves visitor identity, scope management, and previousClause wiring) and overrides eval() with the hash-build-and-probe path. resetState clears the cache.

Optional gating via -Dexist.optimizer.hashjoin=false (or ForExpr.setHashJoinEnabledForTest for unit tests). Default is true; the flag exists as an emergency switch if a workload regresses.

XMark factor 0.01, app-XMark, ANTLR parser, 4 interleaved runs ON vs OFF:
- Heavy tier (Q8/Q9/Q11/Q12) median: 4.47s ON / 6.42s OFF (~30% reduction)
- Q8 specifically: 0.34s ON / 1.57s OFF (~4.6x)
- Q9: 1.19s ON / 1.68s OFF
- Q11: 1.43s ON / 1.66s OFF
- Q12: 1.48s ON / 1.66s OFF
- app-XMark (21 tests) total: 8.6s default-on / 21/21 pass

Verification:
- xquery.xquery3.XQuery3Tests: 1227 tests, 7 failures + 1 error, an exact match with the branch baseline (50b512b); zero new regressions.
- OptimizerTest: 6/6 pass.
- XPathQueryTest: 150/150 pass (4 skipped, pre-existing).
- ExpressionOptimizeTest: 18/18 pass (6 new hash-join structural tests: fires for `=`, skipped for `<`, skipped when no inner-var ref, skipped when both sides reference inner var, fires at top level, skipped when body is a FLWOR clause).
- XMark app-XMark: 21/21 pass (functional correctness preserved).

Note: 2 RD-parser-only failures (XMark-Q4, XMark-All) on `$pr1 << $pr2` inside a quantified expression are pre-existing and unrelated to this change; the same failures appear with hash-join off and with the parent 50b512b commit alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
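The build-once, probe-per-iteration idea, including the cache keyed by the identity of the inner input, can be sketched generically. `HashJoinSketch` is hypothetical, assumes a single atomic key per inner item (no general-comparison sequence semantics or collations), and uses a `List` reference where the commit uses the input `Sequence` reference.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of an equi-join that builds a hash once and probes per outer item.
final class HashJoinSketch<T, K> {
    private final Function<T, K> innerKey;
    private Object cachedInput;      // identity of the input the hash was built from
    private Map<K, List<T>> hash;
    int builds;                      // how many times the hash was (re)built

    HashJoinSketch(Function<T, K> innerKey) {
        this.innerKey = innerKey;
    }

    /** Returns the inner items whose key equals outerKey, in inner input order. */
    List<T> probe(List<T> innerInput, K outerKey) {
        if (hash == null || cachedInput != innerInput) { // reference comparison on purpose
            hash = new HashMap<>();
            for (T item : innerInput) {
                hash.computeIfAbsent(innerKey.apply(item), k -> new ArrayList<>()).add(item);
            }
            cachedInput = innerInput;
            builds++;
        }
        return hash.getOrDefault(outerKey, List.of());
    }

    /** Drops the cached hash, analogous to resetState clearing the cache. */
    void resetState() {
        cachedInput = null;
        hash = null;
    }
}
```

Because matches are appended in input order, each probe returns the same items, in the same order, that a nested scan with the equality test would produce for that outer key.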

A query of the form let $v := <persistent path> return $v[Optimizable-pred] ran ~167x slower than the equivalent direct form. The legacy Optimizer wraps the FilteredExpression in the (#exist:optimize#) pragma in both cases, but at runtime BasicExpressionVisitor.findFirstStep cannot find a top-level LocationStep when the FE source is a VariableReference, so Optimize.eval records contextStep == null and falls back to innerExpr.eval(result, ...). That invokes FilteredExpression.eval, which calls VariableReference.eval, which reads the bound value off the local variable stack and ignores contextSequence. The pre-selected NodeSet is therefore computed and then thrown away, and the predicate runs once per node in the full input.

This commit adds LetInliner.tryInline(let, cc), called from LetExpr.optimize(cc). When all six soundness gates hold, the predicate is appended to the input's last LocationStep and the LetExpr is replaced by the input path. The gates are:

1. variable name present and not a score binding (XQFT 3.0 §2.3);
2. standalone let, not a chain link;
3. no declared sequence type;
4. node-typed input;
5. input contains a non-wildcard LocationStep;
6. body unwraps to a FilteredExpression whose source is the variable, with exactly one Optimizable predicate, and the variable does not appear anywhere else in the body.

The legacy Optimizer pass then sees the same shape it knows how to wrap from the direct form, attaches the pragma to the LocationStep, and routes through the index pre-select.

Gate 6 mirrors what was noted in a 2016 comment on the issue thread: inlining is only desirable when the inlined form would expose an Optimizable to visitLocationStep.

Closes eXist-db#873

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
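The rewrite itself can be sketched on a toy AST: when the gates pass, the filter's single predicate is appended to the last step of the let's input path and the let disappears. All record types here are hypothetical, not eXist classes. Only gate 3, a simplified gate 5 (the last step must be named), the structural part of gate 6, and a null check for the variable name are modelled; score bindings, chain links, node typing, Optimizable detection, and the "no other references" check are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy AST for illustration only; none of these types are eXist classes.
interface Ast {}
record Step(String axis, String name, List<String> predicates) implements Ast {}
record Path(List<Step> steps) implements Ast {}
record VarRef(String name) implements Ast {}
record Filter(Ast source, List<String> predicates) implements Ast {}
record Let(String variable, String declaredType, Ast input, Ast body) implements Ast {}

final class LetInlineSketch {
    private LetInlineSketch() {}

    /** Returns the inlined path, or the let unchanged when a (simplified) gate fails. */
    static Ast tryInline(Let let) {
        // Gate 1 (name present, simplified) and gate 3 (no declared sequence type).
        if (let.variable() == null || let.declaredType() != null) {
            return let;
        }
        // Gate 5 (simplified): the input path must end in a named, non-wildcard step.
        if (!(let.input() instanceof Path path) || path.steps().isEmpty()) {
            return let;
        }
        Step last = path.steps().get(path.steps().size() - 1);
        if ("*".equals(last.name())) {
            return let;
        }
        // Gate 6 (structural part): the body is $variable[single predicate].
        if (!(let.body() instanceof Filter filter)
                || !(filter.source() instanceof VarRef ref)
                || !ref.name().equals(let.variable())
                || filter.predicates().size() != 1) {
            return let;
        }
        // Rewrite: append the predicate to the last step and drop the let.
        List<String> predicates = new ArrayList<>(last.predicates());
        predicates.addAll(filter.predicates());
        List<Step> steps = new ArrayList<>(path.steps());
        steps.set(steps.size() - 1, new Step(last.axis(), last.name(), List.copyOf(predicates)));
        return new Path(List.copyOf(steps));
    }
}
```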

Six tests covering the LetInliner rewrite:

- issue873_indirectQueryReturnsSameNodes -- direct and indirect forms must return the same nodes (correctness check).
- issue873_inlineRewriteLogged -- the optimizer rewrite log must contain an "inline let $a" entry for the indirect query.
- inline_doesNotFireWhen_letReferencedTwice -- gate 6 (reference count != 1).
- inline_doesNotFireWhen_letBoundToCount -- gate 6 (body is not a FilteredExpression).
- inline_doesNotFireWhen_letIsTyped -- gate 3 (sequenceType present).
- issue873_indirectQueryUnderLoosePerfBound -- a 20x ceiling + 500ms slack, loose enough to avoid CI flakiness while still catching the original 167x regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
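One plausible reading of the loose performance bound is "indirect time <= 20 x direct time + 500 ms". The helper below is a hypothetical sketch of that check, not the test's actual code; `LoosePerfBound` and `withinBound` are illustrative names. The additive slack keeps very fast direct runs from turning timer noise into failures, at the cost that a tiny baseline can hide a large ratio.

```java
public class LoosePerfBound {

    static final long RATIO_CEILING = 20; // indirect may take up to 20x the direct time...
    static final long SLACK_MS = 500;     // ...plus a fixed 500 ms allowance for CI noise

    /** True when the indirect timing stays within the loose bound relative to the direct timing. */
    static boolean withinBound(long directMs, long indirectMs) {
        return indirectMs <= RATIO_CEILING * directMs + SLACK_MS;
    }

    public static void main(String[] args) {
        System.out.println(withinBound(100, 1_500));  // 15x on a 100 ms baseline: true
        System.out.println(withinBound(100, 16_700)); // 167x on a 100 ms baseline: false
        System.out.println(withinBound(2, 334));      // 167x on a 2 ms baseline: true (slack dominates)
    }
}
```

Under this reading, a 167x slowdown fails the check whenever the direct run takes more than about 3.4 ms, since 167d > 20d + 500 exactly when d > 500/147.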

joewiz · 2026-05-07T01:59:56Z

Closing -- the base branch (optimizer/expression-optimize-method) does not exist on eXist-db/exist, so the diff incorrectly includes the 256 framework commits that this fix is stacked on. Will reopen once the foundation branch is upstream, or open a within-fork PR for review in the meantime.

joewiz · 2026-05-07T02:03:29Z

Work is now in joewiz#13

joewiz and others added 30 commits April 13, 2026 09:25

joewiz and others added 26 commits April 26, 2026 02:07

44a5796 [bugfix] Fix merge artifacts: BooleanValue switch, FnElementToMap, SequenceType methods, FnFormatDates params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

joewiz requested a review from a team as a code owner May 7, 2026 01:55

joewiz closed this May 7, 2026


[feature] Inline let-bound persistent paths for index pre-select #6306

joewiz wants to merge 258 commits into eXist-db:develop from joewiz:fix/873-let-inlining-for-index-preselect




The six gates (`LetInliner.tryInline`)

Why option (a) inline at AST level, vs option (b) special-case `Optimize.eval`