Implement XQuery 4.0 parser with version gating and feature flag #6216
joewiz wants to merge 20 commits into
72216ba to 2a6fd06
Adds XQ4 syntax support to the ANTLR 2 parser and tree walker:
- Pipeline operator (->): chainable expression transformation
- Focus functions (fn { }): anonymous functions with implicit context
- Keyword arguments: named parameter passing at call sites
- String templates (`` `{expr}` ``): interpolated string literals
- Otherwise operator: fallback for empty sequences
- Braced if: if (cond) { expr } without else clause
- Try/finally: cleanup expressions that always execute
- For member: iterate over array members in FLWOR
- While clause: conditional FLWOR iteration
- Default parameter values in function declarations
- Mapping arrow (=!>) and method call (=?>)
- Ternary conditional (if..then..else as expression)
- QName literals (#name): symbolic name references
- Hex/binary integer literals (0xNN, 0bNN)
- Numeric underscore separators (1_000_000)
- Choice/enum cast types
- Version gating: XQuery 4.0 features require version declaration
Grammar sections added in labeled blocks per feature area within
the XQuery 4.0 Parser Extensions section.
Spec: QT4 XQuery 4.0 §3 (Expressions), §4 (Modules and Prologs)
XQTS: QT4 parser-dependent test sets (1898/2163, 87.7%)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New expression classes that implement the evaluation logic for XQ4 syntax
features parsed by the grammar:
- PipelineExpression: evaluate left expr, bind to context for right
- FocusFunction: anonymous function with implicit context item (.)
- KeywordArgumentExpression: wraps named args for function dispatch
- MappingArrowOperator: =!> operator (function application)
- MethodCallOperator: =?> operator (method-style dispatch)
- OtherwiseExpression: return left if non-empty, else right
- FilterExprAM: ?[predicate], array/map member filter
- ForMemberExpr: for member $x in $array, iterate array members
- ForKeyValueExpr: for key/value pair iteration
- WhileClause: while (cond) in FLWOR, conditional iteration
- LetDestructureExpr: let destructuring bindings
- StringConstructor: XQ4 string template interpolation
- ChoiceCastExpression/ChoiceCastableExpression: union/choice type casts
- EnumCastExpression: enumeration type casts
Each class extends Expression/AbstractExpression and implements eval()
with proper context handling and dependency tracking.
Spec: QT4 XQuery 4.0 §3 (Expressions)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ions
Extends the existing type system and expression classes for XQ4:
Type system:
- Type.CHOICE: union/choice type constant
- Type.ENUM: enumeration type constant
- SequenceType: XQ4 record type support, named function parameters
- FunctionParameterSequenceType: default value support for params
- Constants: XQ4 axis constants (ancestor-or-self::*, etc.)
Expression modifications:
- Function: support default parameter values (XQ4 §4.15)
- FunctionFactory: keyword argument dispatch
- UserDefinedFunction: default parameter evaluation
- ForExpr/LetExpr: for-member and while clause integration
- FLWORClause: while clause chaining
- TryCatchExpression: finally clause support
- SwitchExpression: XQ4 fall-through semantics
- StringConstructor: XQ4 string template evaluation
- XQueryContext: XQ4 version detection, xquery version "4.0"
- LocationStep: combined axis support
Spec: QT4 XQuery 4.0 §2.5 (Types), §4 (Modules)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standardize error codes across casting and type checking:
- Use XPTY0004 consistently for type errors (was mixed with FORG0001)
- Use FORG0001 for invalid cast values (not type mismatches)
- Add XPST0080 for xs:anyType in cast/castable (XQ4 spec)
- Add XQ4-specific error codes for new expression types
- Fix DynamicCardinalityCheck, DynamicTypeCheck, TreatAsExpression
to use correct W3C error codes
- Align all value type convertTo() methods with spec error codes
This fixes ~30 XQTS test failures caused by wrong error codes.
Spec: W3C XQuery 3.1 §B.1 (Error Codes),
QT4 XQuery 4.0 Appendix B (Error Codes)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive XQSuite test module for XQ4 syntax features:
- Pipeline operator: basic chaining, nested pipelines, with functions
- Focus functions: fn { . + 1 }, context item binding
- Keyword arguments: named parameter passing, mixed positional/named
- String templates: interpolation, nested expressions, escaping
- Otherwise operator: empty fallback, non-empty passthrough
- Braced if: if (cond) { expr } without else
- Try/finally: cleanup execution, error propagation
- For member: array member iteration
- While clause: conditional FLWOR iteration
- Default parameter values: function declarations with defaults
- QName literals: #name symbolic references
- Hex/binary integer literals: 0xFF, 0b1010
- Numeric underscore separators: 1_000_000
- Version gating: features require xquery version "4.0"
XQTS: QT4 parser-dependent test sets (1898/2163, 87.7%)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tree walker version checks for all XQ4-only constructs: when
staticContext.getXQueryVersion() < 40, throw XPST0003 with a descriptive
message. This ensures modules declaring xquery version "3.1" cannot use
XQ4 syntax even if the parser somehow accepts it.
Gated constructs: otherwise, pipeline (->), mapping arrow (=!>), ternary
conditional (?? !!), keyword arguments, focus functions, string templates,
while clause, default parameters, for-member, method call (=?>).
Also add system property exist.xquery4.enabled (default true) to allow
disabling XQ4 support entirely. When disabled, xquery version "4.0"
declarations throw XPST0003.
Addresses reviewer feedback from line-o on PR eXist-db#6139.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
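The version gate and feature flag described in this commit can be pictured with a small sketch. The class names `StaticErrorSketch` and `Xq4GateSketch` and both method names are hypothetical and are not taken from the PR; only the property name `exist.xquery4.enabled`, the threshold 40, and the error code XPST0003 come from the commit message. How the real parser treats unrecognized version strings is not stated, so the sketch simply rejects them.

```java
/** Hypothetical stand-in for a static error such as XPST0003. */
final class StaticErrorSketch extends RuntimeException {
    final String code;

    StaticErrorSketch(String code, String message) {
        super(code + ": " + message);
        this.code = code;
    }
}

final class Xq4GateSketch {
    /** Reads the feature flag; an unset property counts as enabled. */
    static boolean xq4Enabled() {
        return Boolean.parseBoolean(System.getProperty("exist.xquery4.enabled", "true"));
    }

    /** Turns a version declaration into its numeric form, honoring the flag. */
    static int declareVersion(String version) {
        if ("4.0".equals(version)) {
            if (!xq4Enabled()) {
                throw new StaticErrorSketch("XPST0003", "XQuery 4.0 support is disabled");
            }
            return 40;
        }
        if ("3.1".equals(version)) return 31;
        if ("3.0".equals(version)) return 30;
        if ("1.0".equals(version)) return 10;
        // Assumption: anything else is rejected outright in this sketch.
        throw new IllegalArgumentException("unsupported version: " + version);
    }

    /** Rejects an XQ4-only construct in a module that declared an older version. */
    static void requireXq4(int declaredVersion, String construct) {
        if (declaredVersion < 40) {
            throw new StaticErrorSketch("XPST0003",
                    construct + " requires xquery version \"4.0\"");
        }
    }
}
```

Under these assumptions, `Xq4GateSketch.requireXq4(31, "otherwise")` throws, while the same call with `40` returns normally.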
2a6fd06 to 4c342d1
[This response was co-authored with Claude Code. -Joe]
CI state: 6/9 checks pass. The 3 remaining failures (ubuntu, windows, macOS integration) are pre-existing test hangs unrelated to this PR.
Dependencies: Wave 3. Should merge after
For full context on all 7.0 PRs and the merge order, see the Reviewer Guide.
Updates ~75 FunctionParameterSequenceType attribute names across 64 fn:
function files to match the XQuery 4.0 Functions and Operators
specification. This enables keyword argument resolution (name := value
syntax) to match parameter names correctly.
Key patterns:
- $arg -> $value (transformation functions: abs, ceiling, floor, round, etc.)
- $arg -> $input (sequence functions: reverse, head, tail, count, etc.)
- $arg -> $node (node functions: local-name, name, namespace-uri, root, etc.)
- $arg -> $values (aggregate functions: sum, min, max, string-join)
- $collation-uri -> $collation (all collation parameters)
- $source-string -> $value (contains, starts-with, ends-with)
- $string-1/$string-2 -> $value1/$value2 (compare, codepoint-equal)
- $sequence -> $input, $function -> $action/$predicate (HOF functions)
- $date/$time/$date-time/$duration -> $value (date/time extraction functions)
- Various other renames per XQ4 F&O spec
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
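To see why the parameter names matter, here is a minimal sketch of name-based argument binding. `KeywordArgsSketch` and `resolve` are hypothetical names, arguments are modeled as plain strings, and default values are left out; it is not the PR's FunctionFactory code. A call such as abs(value := -1) can only bind if the declared parameter is actually called `value`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class KeywordArgsSketch {
    /**
     * Binds positional arguments first, then keyword arguments by parameter name.
     * Returns null when a keyword names no parameter, targets a slot that is
     * already filled, or when some parameter is left unbound.
     */
    static List<String> resolve(List<String> paramNames,
                                List<String> positional,
                                Map<String, String> keywords) {
        if (positional.size() > paramNames.size()) {
            return null;
        }
        String[] slots = new String[paramNames.size()];
        for (int i = 0; i < positional.size(); i++) {
            slots[i] = positional.get(i);
        }
        for (Map.Entry<String, String> kw : keywords.entrySet()) {
            int idx = paramNames.indexOf(kw.getKey());
            if (idx < 0 || slots[idx] != null) {
                return null; // unknown name, or clash with an earlier argument
            }
            slots[idx] = kw.getValue();
        }
        for (String slot : slots) {
            if (slot == null) {
                return null; // a parameter received no argument
            }
        }
        return Arrays.asList(slots);
    }
}
```

With the old name, `resolve(List.of("arg"), List.of(), Map.of("value", "-1"))` returns null; after the rename to `value` the same call binds the argument.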
af377f8 to 47a4cdf
XQuery 4.0 (PR1071) allows map constructors without the 'map' keyword:
{ "a": 1, "b": 2 }
and content expressions that merge maps into the constructor:
{ {"a": 1}, {"b": 2}, "c": 3 }
Changes:
- Version-gate bare map syntax with xq4Enabled in primaryExpr and
arrowFunctionSpecifier so { } only parses as a map in XQ4 mode
- Add MAP_CONTENT imaginary token for content expressions
- Replace mapAssignment with mapContentExpr rule supporting both
key:value entries and content expressions (wrapped in MAP_CONTENT)
- Update tree walker to handle MAP_CONTENT AST nodes
- Extend MapExpr with Entry interface, Mapping and ContentEntry types
to evaluate content expressions (must be maps, merged at runtime)
- Add XQSuite tests for empty content, merge, and XPTY0004 errors
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The .codacy/ directory specifies PMD 7.11.0 with rule references incompatible with Codacy's cloud engine. Removing it lets Codacy use its default configuration, matching all other branches. Also remove unused private method TryCatchExpression.getStackTrace() and its now-unused imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Override antlr.CharScanner#testLiteralsTable in the XQueryLexer body
section to add two fast paths to the per-NCNAME keyword lookup that
fires for every identifier in the source:
1. Shape filter. Every XQuery / XQUF / XQFT keyword is composed
entirely of lowercase ASCII letters (optionally separated by
ASCII hyphens), 2..25 characters long. Any NCNAME containing
an uppercase letter, digit, underscore, or any other character
cannot appear in the table, so the lookup is skipped outright.
2. HashMap mirror. The default implementation allocates an
ANTLRHashString wrapper on every call and looks it up in a
synchronized java.util.Hashtable. We mirror the keyword table
into an unsynchronized HashMap<String,Integer> on first use and
resolve hits via a single map.get(text) call.
Both paths preserve the existing semantics: a successful match
returns the same token type the inherited code would have returned;
a miss returns the supplied default ttype unchanged. If the cache
cannot be built (e.g. the security manager forbids reflective
access to ANTLRHashString), the lexer transparently falls back to
the inherited Hashtable lookup.
A -Dexist.xquery.lexer.legacyLiterals=true escape hatch reverts to
the inherited path for A/B comparisons and as a safety valve.
Also adds exist-core/src/test/java/org/exist/xquery/parser/ParserBenchmark.java,
a parse-vs-tree-walk microbenchmark over nine representative
queries (simple path, FLWOR with grouping, user functions,
typeswitch, module imports, element constructors, camelCase
application code) that can be run via:
mvn -pl exist-core test -Dtest=ParserBenchmark#runBenchmark \
    -Dexist.parserbench.run=true
Regression tests pass: XPathQueryTest (148), LexerTest (1),
CountExpressionTest (1), WindowClauseTest (12),
ReservedNamesConflictTest (1), ParserBenchmark#smoke (1),
xquery.xquery3.XQuery3Tests (978) -- 1142 tests, 0 failures,
0 errors.
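The two fast paths from the lexer commit above can be sketched roughly as follows. This is a simplified illustration, not the PR's testLiteralsTable override: `KeywordLookupSketch`, `couldBeKeyword` and `testLiteral` are hypothetical names, the keyword table is passed in as an ordinary map, and the reflective mirroring of ANTLR's Hashtable and the legacy fallback are omitted.

```java
import java.util.HashMap;
import java.util.Map;

final class KeywordLookupSketch {
    // Unsynchronized copy of the keyword table (keyword text -> token type).
    private final Map<String, Integer> keywords = new HashMap<>();

    KeywordLookupSketch(Map<String, Integer> table) {
        keywords.putAll(table);
    }

    /** Shape filter: only lowercase ASCII letters and hyphens, 2..25 characters. */
    static boolean couldBeKeyword(String text) {
        int n = text.length();
        if (n < 2 || n > 25) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            if ((c < 'a' || c > 'z') && c != '-') {
                return false;
            }
        }
        return true;
    }

    /** Returns the keyword's token type on a hit, otherwise the supplied default. */
    int testLiteral(String text, int defaultType) {
        if (!couldBeKeyword(text)) {
            return defaultType; // cannot be in the table, skip the lookup
        }
        Integer hit = keywords.get(text);
        return hit != null ? hit : defaultType;
    }
}
```

An identifier such as `myVar` never reaches the map because of the uppercase letter, which is the case the shape filter is meant to make cheap.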
    }
}

private static final Sample[] SAMPLES = {
move to top of class, see codacy
        "//book[@id = '123']/title/text()"),

new Sample("xpath-predicates",
        "/library/section[@type='fiction']" +
I find text blocks cleaner here for anything longer than 2 lines
can we have this as a separate PR and Issue please?
[This response was co-authored with Claude Code. -Joe] @duncdrum Thanks for the review.
Per duncdrum's review on PR eXist-db#6216, the Codacy config removal will be handled in a separate PR with its own issue, rather than mixed in with the XQuery 4.0 parser work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per duncdrum's review on PR eXist-db#6216:
- Move the SAMPLES static-final array up so all class-level constants
  appear before the inner Sample type declaration (Codacy
  field-declarations-at-start-of-class).
- Convert multi-line query strings to Java 15+ text blocks for
  readability (any sample longer than two lines).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
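For readers unfamiliar with the text-block conversion mentioned here, a small illustration follows. The query string and the class name `TextBlockSketch` are made up for the example and do not come from ParserBenchmark; the point is only that both constants hold the same string.

```java
final class TextBlockSketch {
    // Before: concatenation with explicit newlines.
    static final String CONCATENATED =
            "for $b in //book\n" +
            "where $b/@year > 2000\n" +
            "return $b/title";

    // After: a Java 15+ text block. Common leading indentation is stripped,
    // and closing the block on the last content line avoids a trailing newline.
    static final String TEXT_BLOCK = """
            for $b in //book
            where $b/@year > 2000
            return $b/title""";
}
```

Placing the closing delimiter on its own line instead would append a final newline to the string, which matters if tests compare queries verbatim.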
Two grammar/static-analysis improvements for prod-Annotation conformance:
1. AnnotatedFunctionTest (XQ3.1+): the spec allows annotations to prefix
FunctionTest in sequence-type positions, e.g.
() instance of %eg:x function(*)
() instance of %eg:x %eg:y(1) function(xs:integer) as xs:string
Previously the grammar only accepted annotations on function/variable
declarations and inline function expressions, so any annotated
FunctionTest produced an XPST0003 parse error. Adds a `MOD =>`
alternative to itemType, a new annotatedFunctionTest rule, and
ANNOTATED_FUNCTION_TEST imaginary token. The tree walker validates
each annotation for reserved-namespace use (XQST0045), then processes
the inner FunctionTest identically to the non-annotated form.
2. Reserved namespaces for XQST0045: per the XQ3.1/XQ4 spec the
annotation namespace list also covers the map and array function
namespaces and the XQuery 2012 namespace
(http://www.w3.org/2012/xquery), used for %public/%private and the
`xq` prefix. Adds the corresponding constants to Namespaces.java and
wires them through annotationValid().
Verified with the existing AnnotationsTest plus 7 new cases covering
annotations on AnyFunctionTest, TypedFunctionTest, multiple annotations,
braced-URI literals, and the three newly-reserved namespaces. XQuery3
suite (978 tests) passed with no regressions.
Projected XQTS prod-Annotation impact (QT4 catalog): the existing 22
assertion-style failures with `%anno function(*)` patterns and the four
declaration-side failures using map/array/xq namespaces flip from FAIL
to PASS, lifting prod-Annotation from ~55% to ~85%+ pass rate, well
above the 60% Phase 1 gate.
Use text blocks for better readability
PMD flagged eval() in three new XQuery 4.0 expression classes (ForKeyValueExpr, ForMemberExpr, MethodCallOperator) above the 200 NPath threshold. Each method dispatches over input/binding shapes per the XQ4 spec and mirrors the structure of the existing FLWOR ForExpr.eval(). Reorganizing the branches obscures the spec-to-code mapping; suppress with a rationale comment instead. No behavior change. Other NPath violations on this branch are pre-existing in files only lightly touched (ForExpr, LetExpr, CastExpression, FunDeepEqual, etc.) and out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per XQuery 3.0+ A.1.1 ReservedFunctionNames, the unprefixed names
attribute, comment, document-node, element, function, if, item,
namespace-node, node, processing-instruction, schema-attribute,
schema-element, switch, text, and typeswitch may not be used as the name
of a function declaration. The ANTLR 2 parser previously accepted these
names because eqName / ncnameOrKeyword recognises them as keywords usable
in NCName positions. Reject them in functionDecl with XPST0003
immediately after parsing the name.
empty-sequence, array, and map are intentionally excluded: per QT4 test
function-decl-reserved-function-names-010a (XQ40+), empty-sequence is no
longer reserved as a function name in XQuery 4.0; array and map were
unreserved on the same path.
Fixes 15 XQTS prod-FunctionDecl conformance failures
(function-decl-reserved-function-names-002, -004, -006, -008, -010, -012,
-014, -016, -018, -020, -024, -026, -028, -030, -032).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace string concatenation with Java 15+ text blocks for all XQuery query strings in AnnotationsTest. Also inline the TEST_VALUE_CONSTANT where it was only used once. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gating
Round 2 of Phase 2 Task 3. Brings prod-FunctionDecl from 173/225 (76.8%) to
209/225 (92.8%) — failures 52→16. Phase 2 gate (≤30 failures, ≥80% pass) met.
Parser (XQuery.g):
* Track parsed XQuery version (10/30/31/40) from versionDecl. The reserved
function-name rejection introduced in 9424f11 was too aggressive: it
fired in XQuery 1.0 mode, where attribute/comment/etc. were not yet
reserved. Gate the check on parsedVersion >= 30, restoring the 14 XQ10
reserved-function-names tests that round 1 broke.
* Recognize prefixed keyword arguments (`prefix:local := value`) as either
a single QNAME token or a BRACED_URI_LITERAL+NCNAME sequence. The lexer
collapses `p:x` into QNAME, which the original predicate
(`ncnameOrKeyword COLON …`) never matched, so XQ4 keyword calls like
`local:f(q:x := 3)` and `local:f(Q{ns}x := 3)` failed at parse time.
Tree walker (XQueryTree.g):
* Enforce XQST0148 — a required parameter cannot follow a parameter with
a default value.
* NamedFunctionReference: when the unprefixed function name resolves to
fn:, fall back to a no-namespace user-declared function (PR2200).
Function call resolution (FunctionFactory.java):
* Match keyword names in Clark notation so `p:x`, `q:x`, and `Q{ns}x`
all bind to the same parameter when their prefixes resolve to the
same namespace.
* Search every parameter position when matching a keyword (not just
those at/after the first keyword) so positional + keyword conflicts
are caught and raised as XPST0017.
* For user-defined functions, surface a null return from
resolveKeywordArguments as XPST0017 instead of silently falling back
to raw params (which evaluated kw args as positional).
* Stop filling unmatched required parameters with empty sequences.
A no-default param is required; if neither positional nor keyword
supplied it, return null and let the caller raise XPST0017.
* Forward references to unprefixed XQ4 functions: when the fn: namespace
has no matching built-in, use the no-namespace QName so a later user
declaration resolves through the forward-reference path.
declareFunction (XQueryContext.java):
* XQST0034: detect arity-range overlap between declarations with default
parameters. A function with k defaults is callable at arity
requiredCount..declaredArity, and any overlap with another overload
is ambiguous.
Error codes (ErrorCodes.java):
* Add XQST0148 (required-after-optional).
Test impact (prod-FunctionDecl, with companion runner version-prepend
fix in exist-xqts-runner):
Before: 173/225 (76.8%), 52 failures
After: 209/225 (92.8%), 16 failures (improvement: -36)
Remaining 16 failures are pre-existing static-analysis bugs (out-of-scope
variable detection in K-FunctionProlog-37/38), XQ4 downcast feature gaps
(K2-FunctionProlog-5a/6a), and PR2200 element-constructor cases that need
deeper namespace-resolution work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
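Two of the static checks in the commit above, the arity-range overlap (XQST0034) and the required-after-optional rule (XQST0148), reduce to short predicates. The sketch below is an illustration under assumptions: `DefaultParamChecksSketch`, `Signature` and the method names are hypothetical, and a declaration is modeled only by its parameter count and its number of trailing defaults.

```java
import java.util.List;

final class DefaultParamChecksSketch {
    /** A declaration with `declared` parameters, the last `defaults` of which have defaults. */
    record Signature(int declared, int defaults) {
        int minArity() { return declared - defaults; } // requiredCount
        int maxArity() { return declared; }            // declaredArity
    }

    /** XQST0034-style check: overloads clash when their callable arity ranges intersect. */
    static boolean overlaps(Signature a, Signature b) {
        return a.minArity() <= b.maxArity() && b.minArity() <= a.maxArity();
    }

    /** XQST0148-style check: true when a required parameter follows one with a default. */
    static boolean requiredAfterOptional(List<Boolean> hasDefault) {
        boolean seenDefault = false;
        for (boolean d : hasDefault) {
            if (d) {
                seenDefault = true;
            } else if (seenDefault) {
                return true;
            }
        }
        return false;
    }
}
```

For example, a three-parameter function with two defaults is callable at arities 1..3, so it overlaps with a two-parameter overload of the same name.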
if ("4.0".equals(ver)) {
    xq4Enabled = true;
    parsedXQueryVersion = 40;
} else if ("3.1".equals(ver)) {
    parsedXQueryVersion = 31;
} else if ("3.0".equals(ver)) {
    parsedXQueryVersion = 30;
} else if ("1.0".equals(ver)) {
    parsedXQueryVersion = 10;
}

 * with empty sequence expressions for optional parameters. Returns null if
 * resolution fails.
 */
private static @Nullable List<Expression> resolveKeywordArguments(
Address NPath complexity and unnecessary use of fully qualified name 'org.exist.xquery.value.FunctionParameterSequenceType' due to existing same package import 'org.exist.xquery.*' as noted by Codacy
- XQuery.g versionDecl: convert if/else-if chain to switch expression
  with arrow syntax per reinhapa's review
- FunctionFactory.java: add explicit import for
  FunctionParameterSequenceType, replace 4 fully qualified name usages
  with simple class name
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
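A switch expression with arrow syntax for the version mapping might look like the sketch below. `VersionDeclSketch` and `parsedVersion` are hypothetical names. The original if/else chain leaves the field untouched for unrecognized strings; returning 0 in the `default` branch is an assumption made here so the method always yields a value.

```java
final class VersionDeclSketch {
    /** Maps a version string to its numeric form; 0 means "not recognized" in this sketch. */
    static int parsedVersion(String ver) {
        // Note: switching on a null String throws NullPointerException.
        return switch (ver) {
            case "4.0" -> 40;
            case "3.1" -> 31;
            case "3.0" -> 30;
            case "1.0" -> 10;
            default -> 0;
        };
    }
}
```

The XQ4 flag from the original chain could then be derived as `parsedVersion(ver) == 40` rather than set in a separate branch.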
[This response was co-authored with Claude Code. -Joe] @reinhapa Both items addressed in e7d51af:
164 tests pass, 0 regressions.
Per the project convention, do not add @SuppressWarnings("PMD.NPathComplexity") annotations proactively. Let the reviewer decide whether to suppress or refactor. Removes the three annotations added in ee462fe (ForKeyValueExpr, ForMemberExpr, MethodCallOperator). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
if (name == null || name.indexOf(':') >= 0 || name.indexOf('{') >= 0) {
    return false;
}
switch (name) {

    implicitCalendar.setTimezone(TimeUtils.getInstance().getLocalTimezoneOffsetMinutes());
}
// fill in fields from default reference; don't have to worry about weird combinations of fields being set, since we control that on creation
switch (getType()) {

}

private String clauseLabel() {
    switch (clauseType) {

private boolean callPostEval() {
    FLWORClause prev = getPreviousClause();
    while (prev != null) {
        switch (prev.getType()) {

@Override
public void dump(final ExpressionDumper dumper) {
    dumper.display("let ");
    switch (mode) {

    if (i > 0) dumper.display(", ");
    dumper.display("$").display(varNames.get(i).getLocalPart());
}
switch (mode) {

@Override
public String toString() {
    final StringBuilder sb = new StringBuilder("let ");
    switch (mode) {

    if (i > 0) sb.append(", ");
    sb.append("$").append(varNames.get(i).getLocalPart());
}
switch (mode) {
reinhapa requested converting 13 traditional switches to Java 21 switch
expressions in PR eXist-db#6216. Also fix a Codacy parameter-reassign
warning in FunctionFactory.
Files:
- XQuery.g: isReservedFunctionName -> switch expression with
  comma-separated case labels.
- AbstractDateTimeValue.java, StringValue.java: convert fall-through
  statement switches to arrow syntax.
- ForKeyValueExpr.java, ForMemberExpr.java: convert clauseLabel() and
  prev.getType() loop switches to switch expressions.
- LocationStep.java: convert axis dispatch switch to switch expression
  assigning to result.
- LetDestructureExpr.java: convert five switches on mode to switch
  expressions (getType, eval, dump, toString).
- FunctionFactory.java: introduce effectiveParams local variable so the
  params parameter can stay final (Codacy reassign fix).
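As an illustration of the comma-separated case-label form, here is one way the reserved-name check could read. The guard on prefixed and braced-URI names is taken from the diff snippet shown earlier, the name list from the reserved-function-names commit, and the version threshold from the round-2 commit; the class `ReservedNamesSketch` and the helper `rejectAsFunctionName` are hypothetical.

```java
final class ReservedNamesSketch {
    /** True for unprefixed names that may not name a function declaration. */
    static boolean isReservedFunctionName(String name) {
        // Prefixed (p:x) and braced-URI (Q{ns}x) names are never reserved.
        if (name == null || name.indexOf(':') >= 0 || name.indexOf('{') >= 0) {
            return false;
        }
        return switch (name) {
            case "attribute", "comment", "document-node", "element", "function",
                 "if", "item", "namespace-node", "node", "processing-instruction",
                 "schema-attribute", "schema-element", "switch", "text",
                 "typeswitch" -> true;
            default -> false;
        };
    }

    /** The rejection only applies from XQuery 3.0 onward. */
    static boolean rejectAsFunctionName(String name, int parsedVersion) {
        return parsedVersion >= 30 && isReservedFunctionName(name);
    }
}
```

Because every label maps to the same result, a single arrow case replaces what would otherwise be a column of fall-through `case` lines.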
[This response was co-authored with Claude Code. -Joe] @reinhapa thanks for the review, all 13 items addressed in 0f93f1d:
Targeted tests pass:
Summary
Adds XQuery 4.0 syntax support to the ANTLR 2 grammar with version gating: XQ4 syntax requires an xquery version "4.0" declaration. Includes a feature flag (exist.xquery4.enabled, default true) per community meeting decision.
Addresses review feedback from @line-o on PR #6139.
What Changed
- …pipeline (->), mapping arrow (=!>), otherwise, ternary (?? !!), focus functions, keyword args, string templates, braced if, while clause, default params, for-member, method calls in XQuery.g and XQueryTree.g
- …xquery version "3.1" is declared
- exist.xquery4.enabled system property (default true); when false, xquery version "4.0" throws XPST0003
Spec References
XQTS
Tests
Supersedes
Test plan
- …xquery version "3.1" modules (12 constructs gated)
- …xquery version "4.0" modules
🤖 Generated with Claude Code