[bugfix] Align fn: function signatures with W3C XQuery 4.0 PR197#6287
Closed
joewiz wants to merge 369 commits into
Add parse="text" handling per XInclude spec section 4.2. When an xi:include element has parse="text", the referenced resource is read as plain text and included as character data (with XML special characters preserved for escaping during serialization).
Four resource types are handled:
- Binary documents in database: read raw bytes with specified encoding
- XML documents in database: serialize to XML text (omit declaration)
- In-memory documents: serialize to XML text
- External URIs: read from URL connection
Also reads the encoding attribute for charset selection (defaults to UTF-8), with graceful fallback for unsupported charsets.
Architectural note: BaseX delegates XInclude entirely to Java's built-in SAXParserFactory.setXIncludeAware(true) at parse time, with zero custom code. eXist's serialization-time approach is more powerful (works on stored documents) but requires explicit feature implementation. A complementary parse-time option could be added as a future enhancement.
Test results: 47/149 pass (31.5%), up from 42/149 (28.2%). +5 from parse="text" support. Existing XIncludeSerializerTest: 7/7 pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
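The charset fallback mentioned in this commit message could look roughly like the sketch below. `IncludeCharset` and `resolve` are hypothetical names chosen for illustration; they are not taken from eXist's XInclude code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class IncludeCharset {
    // Hypothetical helper: maps the xi:include encoding attribute to a Charset,
    // falling back to UTF-8 when the attribute is absent or unusable.
    static Charset resolve(String encodingAttr) {
        if (encodingAttr == null || encodingAttr.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(encodingAttr.trim());
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-1"));
        System.out.println(resolve("no-such-charset"));
    }
}
```

Catching `IllegalArgumentException` handles both malformed and unsupported charset names in one place, which is one way to get the "graceful fallback" described above.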
Add 4 XInclude tests from the XProc 3.0 test suite covering scenarios not in W3C suites: parse="text" on XML documents, fixup-xml-lang parameter. Add parse attribute validation to XIncludeFilter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolved grammar conflict: keep both XQUF and XQFT keyword blocks in the reservedKeywords rule (removed duplicate 'least' from XQFT).
Resolved 42 conflicts across 15 files:
- Grammars (XQuery.g, XQueryTree.g): used next-v2 versions which have all features correctly merged (XQUF + XQFT + XQ4 + expression chain)
- CastExpression, CastableExpression, Type, SequenceType, UserDefinedFunction, DynamicTypeCheck: took theirs (XQ4 type system)
- Constants: took theirs (complete XQ4 axis numbering)
- DoubleValue, AbstractDateTimeValue: took ours (compliance error codes)
- LocationStep, FilterExprAM: took ours (already has v2/xq4-axes/filter-expr-am)
- StaticXQueryException: took ours (additional constructors from compliance fixes)
- FunctionFactory: removed duplicate variable declaration
Grammar conflicts resolved: kept ours (already has decimal-format from next-v2 grammar base).
Resolved 16 conflicts:
- Grammar: took ours (already has all features from next-v2)
- Saxon 12 regex files (FunAnalyzeString, FunMatches, FunReplace, RegexUtil): took ours (Saxon 12 API)
- XQ4 function files: took theirs (new XQ4 function implementations)
- pom.xml: took theirs (includes markup-blitz dependency)
…espaces, namespace-prefixes (attrs), timezones, map-order
XQ4 PR320/PR1855 introduced an options-map third argument to
fn:deep-equal. The DeepEqualOptions parser already recognized the
option keys, but several were stored as flags without ever being
consulted by the comparison engine. This commit wires them up:
- base-uri: compare element base URIs (xml:base / inherited)
- in-scope-namespaces: compare in-scope NS bindings as sets,
walking ancestor xmlns declarations
- namespace-prefixes: extend prefix check to attributes (was only
applied to elements). Use a fallback that parses the prefix from
nodeName when getPrefix() returns null/empty
- timezones: when set, two date/time atomics with different
explicit timezones (or one missing) are not deep-equal even if
they represent the same instant
- map-order: iterate both maps in their recorded insertion order
(PR1703) and compare keys position-wise
Two collateral fixes:
- items-equal: drop the eager arity check at parse time. The spec
permits any function-typed value (the test deep-equal-40-items-equal-004
passes true#0 because length-mismatched sequences must return false
before invoking the callback). Arity is now validated lazily.
- Unsupported collation in the 3-arg variant now raises FOCH0002
(the runtime error) rather than letting XQST0076 leak through
from getCollator. Spec accepts FOCH0002 here.
Also fix deep-equal-options-test.xq: a library module cannot
declare local:helper (XQST0048) — renamed to det:helper.
XQTS QT4 fn-deep-equal: 33 -> 26 F+E (9 tests fixed: base-uri-003,
in-scope-namespaces-003, timezones-003, items-equal-{004,005,006,008},
normalize-unicode-004, map-order-003; 2 regressions in
whitespace-{009,031} that need follow-up).
JUnit DeepEqualTest: 63/63 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
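The timezones option from the fn:deep-equal commit above can be illustrated with `java.time`. This is only a sketch: `TimezoneSensitiveEquality` is a hypothetical name, and the "one timezone missing" case is not modelled because `OffsetDateTime` always carries an offset.

```java
import java.time.OffsetDateTime;

final class TimezoneSensitiveEquality {
    // With timezonesOption=false only the instant matters; with true the
    // explicit offsets must match as well.
    static boolean deepEqual(OffsetDateTime a, OffsetDateTime b, boolean timezonesOption) {
        if (!a.isEqual(b)) {
            return false; // different instants are never equal
        }
        return !timezonesOption || a.getOffset().equals(b.getOffset());
    }

    public static void main(String[] args) {
        OffsetDateTime utc = OffsetDateTime.parse("2024-01-01T12:00:00Z");
        OffsetDateTime cet = OffsetDateTime.parse("2024-01-01T13:00:00+01:00");
        System.out.println(deepEqual(utc, cet, false));
        System.out.println(deepEqual(utc, cet, true));
    }
}
```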
…odes
* TryCatchExpression: bind XQuery 4.0 $err:map (PR493) and $err:stack-trace
(PR1470/PR1599) inside catch clauses. err:map is a map(xs:string, item()*)
containing all standard error properties (code, description, value, module,
line-number, column-number, additional, stack-trace).
* UntypedValueCheck: raise XPTY0117 when implicitly coercing xs:untypedAtomic
to a namespace-sensitive target type (xs:QName, xs:NOTATION) during
function-call argument coercion (XPath F&O 3.1 §19.1).
* CastExpression: allow xs:untypedAtomic as a source for cast as xs:QName
(XQ30+ relaxed the rule — lexical errors raise FORG0001 via QNameValue).
* IntegerValue: revert XQ4 hex/binary/underscore parsing for runtime
string-to-integer casts. Those literal extensions belong only to the parser
token path (XQueryTree.g uses the BigInteger constructor) — applying them
to xs:integer("0x0") accepted invalid lexical forms.
Reduces QT4 prod-TryCatchExpr failures from 34 to ~28 (covers $err:map and
$err:stack-trace tests). Net improvement on prod-CastExpr from XPTY0117
alignment and untypedAtomic→QName cast.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
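The IntegerValue change above separates two paths: runtime string-to-integer casts and parser literals. A simplified sketch of that split follows; the class and method names are hypothetical, and the literal grammar is reduced to a `0x` / `0b` prefix plus underscore separators.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

final class IntegerLexicalForms {
    private static final Pattern XS_INTEGER = Pattern.compile("[+-]?[0-9]+");

    // Runtime cast path: only the plain decimal lexical form is accepted.
    static BigInteger castFromString(String s) {
        String t = s.trim();
        if (!XS_INTEGER.matcher(t).matches()) {
            throw new NumberFormatException("invalid xs:integer lexical form: " + s);
        }
        return new BigInteger(t);
    }

    // Parser token path (simplified): hex, binary and underscores are allowed.
    static BigInteger parseLiteral(String token) {
        String digits = token.replace("_", "");
        if (digits.startsWith("0x")) return new BigInteger(digits.substring(2), 16);
        if (digits.startsWith("0b")) return new BigInteger(digits.substring(2), 2);
        return new BigInteger(digits);
    }

    public static void main(String[] args) {
        System.out.println(parseLiteral("0x1F"));
        System.out.println(castFromString("42"));
    }
}
```

Keeping the extended forms out of `castFromString` is what makes a runtime cast such as xs:integer("0x0") fail again, as the commit intends.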
Move DOCTYPE emission rules into XHTMLWriter so both XHTML4 and HTML4 share the same logic; consolidate the previously diverging XHTML5Writer override.
Per W3C XSLT and XQuery Serialization 3.1 sections 7.1 and 7.2:
- doctype-system set: emit DOCTYPE PUBLIC/SYSTEM
- doctype-system absent, html method, doctype-public set: emit DOCTYPE PUBLIC
- doctype-system absent, html-version >= 5: emit <!DOCTYPE html>
- otherwise: no DOCTYPE
Previously XHTMLWriter inherited XMLWriter's writeDoctype which emitted a DOCTYPE whenever either id was set, causing xhtml-25 to emit a stray DOCTYPE PUBLIC. XHTML5Writer's override suppressed <!DOCTYPE html> when doctype-public was set without doctype-system, which broke xhtml-27.
isHtmlMethod and isHtml5Version are now protected (not private), and isHtml5Version reads html-version first, falling back to version per the W3C spec note for html method.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
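The four DOCTYPE rules in this commit message can be written as a single decision function. The sketch below applies them in the order listed; `DoctypeRules` is a hypothetical name, the root element name is assumed to be `html`, and the exact output strings are illustrative rather than copied from XHTMLWriter.

```java
final class DoctypeRules {
    // Returns the DOCTYPE to emit, or "" for none.
    static String doctype(boolean htmlMethod, double htmlVersion,
                          String doctypePublic, String doctypeSystem) {
        if (doctypeSystem != null) {
            return doctypePublic != null
                ? "<!DOCTYPE html PUBLIC \"" + doctypePublic + "\" \"" + doctypeSystem + "\">"
                : "<!DOCTYPE html SYSTEM \"" + doctypeSystem + "\">";
        }
        if (htmlMethod && doctypePublic != null) {
            return "<!DOCTYPE html PUBLIC \"" + doctypePublic + "\">";
        }
        if (htmlVersion >= 5.0) {
            return "<!DOCTYPE html>";
        }
        return "";
    }

    public static void main(String[] args) {
        // xhtml method, version 5, doctype-public only
        System.out.println(doctype(false, 5.0, "-//W3C//DTD XHTML 1.0 Strict//EN", null));
    }
}
```

With the xhtml method, a doctype-public value and no doctype-system, this returns nothing below version 5 and `<!DOCTYPE html>` at version 5, which is the behaviour the message describes for xhtml-25 and xhtml-27.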
Address Phase 2 conformance gaps in two related date/time function sets.
fn:parse-ietf-date (76.2% -> 100%, 105/105):
- Default missing timezone to UTC (per spec note: no timezone == GMT).
- Time parser no longer greedily consumes year digits as a timezone in
asctime form (e.g. "Aug 20 19:36 2014").
- Timezone offset parser handles 1, 2, 3, 4-digit forms ("-5", "-05",
"-500", "-0500"), trailing colons ("-05:"), and ":mm" with no minutes
("Feb-02 02:02-02: 02"); no longer eats trailing whitespace+content.
- Recognise lowercase TZ names ("gmt", "utc"); recognition of lowercase day
and month names already worked.
- Optional "(TZNAME)" comment after offsets is parsed and validated;
empty parens / unknown TZ in parens raise FORG0010.
- Day-name without trailing whitespace ("Wed,20 Aug ...") now errors.
- dsep requires at least whitespace or hyphen; "20Aug" / "Aug2014" now
error as expected.
- Handle 24:00 without seconds as midnight at end of day.
- "." with no fractional digits now errors (errs27).
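The timezone offset forms listed above ("-5", "-05", "-500", "-0500", "-05:") can be covered by one regular expression. The sketch below is an assumption about how such a parser might look, not the eXist implementation; `IetfOffset` is a hypothetical name and no range validation is performed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IetfOffset {
    // Sign, one or two hour digits, then either two minute digits (with an
    // optional colon in front) or a bare trailing colon.
    private static final Pattern OFFSET =
        Pattern.compile("([+-])(\\d{1,2})(?::?(\\d{2})|:)?");

    // Returns the offset in minutes east of UTC.
    static int toMinutes(String text) {
        Matcher m = OFFSET.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid timezone offset: " + text);
        }
        int hours = Integer.parseInt(m.group(2));
        int minutes = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        int total = hours * 60 + minutes;
        return m.group(1).equals("-") ? -total : total;
    }

    public static void main(String[] args) {
        System.out.println(toMinutes("-500"));
        System.out.println(toMinutes("+05:30"));
    }
}
```

Because `matches()` requires the whole input to be consumed, the regex engine backtracks on "-500" from hour "50" to hour "5" with minutes "00".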
fn:build-dateTime (64.8% -> 95.8%, 68/71):
- Strict field combination validation: empty record, time without all of
hours/minutes/seconds, year+day without month, time fields with
incomplete date components -- all raise FODT0005 with clear messages.
- Numeric coercion: integer fields accept xs:integer or xs:decimal that
is exactly an integer; xs:double, fractional decimals, NaN/Infinity
raise XPTY0004. Untyped/node values are parsed as integers (test
date-without-timezone-from-nodes).
- Seconds field accepts xs:integer / xs:decimal / finite xs:double or
xs:float; rejects NaN/Infinity (XPTY0004).
- Timezone accepts xs:duration too (rejecting year/month parts);
validates +-14:00 range and whole-minute offsets (FODT0006).
- Calendar-day validity (28/29/30/31, leap years) checked up-front and
reported as FODT0006 instead of bubbling up FORG0001 from the lexical
parser. Seconds range 0..<60 also FODT0006.
- Year formatting handles year 0 ("0000") and negative years ("-0001")
per XSD 1.1 representation.
- When a timezone is supplied with a full dateTime, return
xs:dateTimeStamp instead of xs:dateTime.
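The up-front calendar-day check described in the list above can be sketched with `java.time.YearMonth`. `CalendarDayCheck` is a hypothetical name; in the commit a failed check is reported as FODT0006, which the sketch reduces to a boolean.

```java
import java.time.YearMonth;

final class CalendarDayCheck {
    // True when the day exists in the given month of the given (proleptic) year.
    static boolean isValidDay(int year, int month, int day) {
        if (month < 1 || month > 12 || day < 1) {
            return false;
        }
        return day <= YearMonth.of(year, month).lengthOfMonth();
    }

    public static void main(String[] args) {
        System.out.println(isValidDay(2024, 2, 29));
        System.out.println(isValidDay(2023, 2, 29));
    }
}
```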
XQTS QT4 deltas (antlr parser):
| Set | Before | After |
| ------------------ | -------------------- | -------------------- |
| fn-parse-ietf-date | 80/105 (76.2%) | 105/105 (100%) |
| fn-build-dateTime | 46/71 (64.8%) | 68/71 (95.8%) |
Remaining build-dateTime failures: year-zero formatting (XSD 1.0
javax.xml.datatype rejection) and one XPST0017 case that depends on
removing the xq31 2-arg overload.
XQueryContext.checkOptions resolved namespace prefixes only from
inScopeNamespaces, which contains element-constructor scoped namespaces
but NOT prologue declarations like `declare namespace p = "..."`.
Prologue namespaces are stored in staticNamespaces; getURIForPrefix is
the canonical accessor that consults inScope, inherited, and static
maps in turn.
This caused prefixed names in serialization options (e.g.
`declare option output:cdata-section-elements "p:b"`) to resolve their
prefix to a null URI, producing the QName "{null}b" which never matched
real elements during serialization.
Fixes XQTS QT4: method-xhtml -18, -19a, -19b, -19c (cdata-section-elements
on prefixed elements) and method-xml K2-Serialization-30. method-xhtml
now at 81.1% (43/53), method-xml at 80.9% (38/47), both above Phase 2 80%
gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
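The lookup order described in the checkOptions commit above (in-scope, then inherited, then static namespaces) can be sketched as follows. `PrefixLookup` and its fields are hypothetical stand-ins for the corresponding XQueryContext maps.

```java
import java.util.HashMap;
import java.util.Map;

final class PrefixLookup {
    final Map<String, String> inScope = new HashMap<>();
    final Map<String, String> inherited = new HashMap<>();
    final Map<String, String> staticNs = new HashMap<>();

    // Consults the three maps in turn; returns null when the prefix is unbound.
    String uriForPrefix(String prefix) {
        String uri = inScope.get(prefix);
        if (uri == null) uri = inherited.get(prefix);
        if (uri == null) uri = staticNs.get(prefix);
        return uri;
    }

    public static void main(String[] args) {
        PrefixLookup ctx = new PrefixLookup();
        ctx.staticNs.put("p", "http://example.com/p"); // like a prologue declaration
        System.out.println(ctx.inScope.get("p"));      // the old, too-narrow lookup
        System.out.println(ctx.uriForPrefix("p"));
    }
}
```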
Two related JSON serialization fixes that bring eXist into line with W3C XSLT and XQuery Serialization 3.1 § 10:
1. Single-element/document sequences with method=json no longer route through the legacy XML-to-JSON conversion writer (which renders `<e/>` as the JSON literal `null`). They now go through the W3C-compliant JSONSerializer, which serializes the node as XML and wraps the result in a JSON string. Callers that depended on the legacy XML-to-JSON object graph can opt back in via the new `exist:legacy-json-conversion` parameter (default "no").
2. Default escape-solidus is now "yes". The XQ 3.1 spec did not define the parameter, but mainstream serializers (and the XQTS tests authored against XQ 3.1) assume `\/` escaping. PR534 adds the parameter for XQ 4.0 with default "no"; until we plumb XQuery version awareness, defaulting to "yes" preserves the legacy XQTS tests and lets XQ 4.0 callers explicitly opt out via `output:escape-solidus "no"`. Tests json-75 and json-76 already set the value explicitly.
Fixes XQTS QT4: method-json -27, -29, -51, -56, -61, -64..-67, -69..-72 (13 new passes). method-json now at 84.4% (12/77 fail), passing the Phase 2 80%/30-fail gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
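The effect of escape-solidus on a serialized node can be shown with a small JSON string escaper. This is a sketch under stated assumptions: `JsonStringEscaper` is a hypothetical name, and only the common escapes are handled.

```java
final class JsonStringEscaper {
    // Wraps s in double quotes, escaping characters that JSON requires and,
    // optionally, the solidus.
    static String quote(String s, boolean escapeSolidus) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                case '/':  out.append(escapeSolidus ? "\\/" : "/"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04X", (int) c));
                    else out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // An element serialized as XML text, then wrapped in a JSON string.
        System.out.println(quote("<e/>", true));
        System.out.println(quote("<e/>", false));
    }
}
```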
Earlier commit 5c407e5 routed every single-element/document method=json sequence through the W3C JSONSerializer. That was correct for QT4 XQTS serialization tests but broke eXist tests that exercise the legacy XML-to-JSON conversion (json:literal / json:array / json:value / json:name elements and attributes).
Refine the routing:
* If the input root carries the `json:` prefix on any element or attribute (the legacy convention's marker), route through the legacy XML-to-JSON writer.
* Otherwise route through the W3C-compliant JSONSerializer.
* `legacy-json-conversion="yes"|"no"` (the eXist-specific parameter, registered with W3CParameterConvention as a boolean string so it is settable from `serialize` parameter maps and XML serialization-parameters elements alike) explicitly overrides the heuristic.
Update the legacy-only tests in `exist-core/src/test/xquery/json.xml` and `EvalTest#evalAndSerializeJson` to opt in via the new parameter: those tests build vanilla XML without the json: prefix but explicitly expect the legacy XML-to-JSON object graph.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two PI serialization rules from W3C XSLT and XQuery Serialization:
* HTML method (pre-HTML5, version < 5.0): processing instructions are serialized as `<?target data>` with no closing `?>`, per § 7.1.5 of the XSLT and XQuery Serialization 3.1 spec. Previously XHTMLWriter inherited XML's `<?target data?>` form regardless of method.
* HTML5 method (version 5.0): per QT4 PR2372, since HTML5 has no PI syntax, the serializer renders processing instructions as comments of the form `<!--?target data?-->`, matching the HTML5 parser's coercion of `<?...?>` content. Previously HTML5Writer emitted the pre-HTML5 form.
Fixes XQTS QT4: method-html -48, -58, -59a (3 new passes). The XQ 3.0/3.1-only -59 case now regresses because the XQTS runner prepends `xquery version "4.0"` to every test and the new HTML5 PI form is the XQ 4.0 normative output; the older `<?pi data>` form survives only under XQ 3.x. Net for method-html: 24 → 22 fails (65.2% → 68.1%).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
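The three processing-instruction forms named in this commit message can be put side by side in a short sketch. `PiSerializationForms` and its `Mode` enum are hypothetical; the handling of an empty data string is an assumption.

```java
final class PiSerializationForms {
    enum Mode { XML, HTML_PRE5, HTML5 }

    static String serialize(Mode mode, String target, String data) {
        String body = data.isEmpty() ? target : target + " " + data;
        switch (mode) {
            case HTML_PRE5: return "<?" + body + ">";       // no closing "?>"
            case HTML5:     return "<!--?" + body + "?-->"; // rendered as a comment
            default:        return "<?" + body + "?>";      // XML form
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + serialize(mode, "pi", "data"));
        }
    }
}
```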
Rename local: function declarations to use the module's target namespace prefix, as required by XQuery 3.1 §4.18 (XQST0048). The XQST0048 check in ExternalModuleImpl.declareFunction() correctly rejects these.
- xquery4/fnXQuery40.xql: local:add → t:add, local:greet → t:greet, local:choice-param → t:choice-param
- xquery3/fnXQuery40.xql: local:annotated-fn → t:annotated-fn
- unzip-tests.xql: local:entry-data → uz:entry-data, local:entry-filter → uz:entry-filter
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rder
The XQuery 4.0 set operators (union, except, intersect) rejected operands of type json-node() because their type checks used Type.subTypeOf(NODE) rather than Type.isNodeType(). JSON_NODE is its own subtree under ITEM in eXist's type lattice, not a NODE subtype. Switch the checks to Type.isNodeType so JNode operands are accepted.
JNodes also lacked equals/hashCode and a document-order comparator, so $in/a union $in/a returned two nodes (distinct Java instances) instead of deduplicating to one, and ($in/d, $in/a) union ($in/c, $in/d) preserved source order rather than document order. Add path-based equals/hashCode (root identity for roots; parent + key + position for descendants) and a static compareDocumentOrder method that walks to the lowest common ancestor. Track member iteration order in the position field so document-order sort is meaningful for object members; FnJNode.evalJposition still returns 1 for object members per the W3C XQ4 fn:jposition spec (member identity within an object is the key, not position).
ValueSequence.removeDuplicateNodes recognized only NODE subtypes; widen to also dedup JSON_NODE items via a HashMap (since ItemComparator's TreeMap path can't order JNodes). Union and Except invoke an explicit JNode-aware sort after dedup when the result contains JSON nodes, because the existing InMemoryNodeComparator (FastQSort) handles only XML memtree nodes.
QT4 XQTS impact (antlr parser):
op-union 19 fails -> 1 fail (97.8% -> 98.9%)
op-except 17 fails -> 1 fail (97.6% -> 98.8%)
op-intersect 17 fails -> 2 fails
Closes Phase 2.5 set-ops portion of the post-90 fixes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
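The path-based identity and document-order comparison described above can be sketched with a minimal node class. `SketchJNode` is a hypothetical stand-in for eXist's JNode; it assumes both nodes being compared belong to the same tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class SketchJNode {
    final SketchJNode parent; // null for a root
    final String key;         // member key, or null for array members
    final int position;       // iteration order within the parent

    SketchJNode(SketchJNode parent, String key, int position) {
        this.parent = parent;
        this.key = key;
        this.position = position;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchJNode)) return false;
        SketchJNode other = (SketchJNode) o;
        if (parent == null || other.parent == null) return false; // roots: identity only
        return position == other.position
            && Objects.equals(key, other.key)
            && parent.equals(other.parent);
    }

    @Override public int hashCode() {
        return parent == null ? System.identityHashCode(this)
                              : Objects.hash(parent, key, position);
    }

    // Nodes from the root down to this node.
    private List<SketchJNode> path() {
        List<SketchJNode> p = new ArrayList<>();
        for (SketchJNode n = this; n != null; n = n.parent) p.add(n);
        Collections.reverse(p);
        return p;
    }

    // Walks both paths to the lowest common ancestor, then compares positions.
    static int compareDocumentOrder(SketchJNode a, SketchJNode b) {
        List<SketchJNode> pa = a.path(), pb = b.path();
        int i = 0;
        while (i < pa.size() && i < pb.size() && pa.get(i).equals(pb.get(i))) i++;
        if (i == pa.size() || i == pb.size()) {
            return Integer.compare(pa.size(), pb.size()); // an ancestor sorts first
        }
        return Integer.compare(pa.get(i).position, pb.get(i).position);
    }

    public static void main(String[] args) {
        SketchJNode root = new SketchJNode(null, null, 1);
        SketchJNode a = new SketchJNode(root, "a", 1);
        SketchJNode d = new SketchJNode(root, "d", 4);
        System.out.println(compareDocumentOrder(d, a) > 0);
    }
}
```

With value-based `equals`/`hashCode`, two separately created wrappers for the same member collapse to one entry in a hash-based dedup, which is the property the union fix relies on.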
…JS0001 trailing tokens
Several QT4 fn:json-to-xml conformance gaps:
1. The duplicates default for json-to-xml was use-first; per the W3C
spec (and QT4 test json-to-xml-018) it must be retain — XML output
can represent duplicates as repeated child elements where a map
could not. parse-json/json-doc continue to default to use-first.
2. The validation for the duplicates option rejected use-last and
retain for json-to-xml. In XQuery 4.0 all four values (reject,
use-first, use-last, retain) are valid for json-to-xml.
3. Jackson silently stops at end-of-first-value, so json-to-xml('{}extra')
succeeded instead of raising FOJS0001. Check parser.nextToken()
after the top-level value and raise FOJS0001 on extra content.
4. The number-parser option allows any item that atomizes to xs:numeric
or xs:string. Previously a function item (like fn { fn() { $n } })
was treated as an empty result; now it raises FOTY0013. Also accept
nodes (atomized via getStringValue) and 0-arity functions (called
with no arguments).
QT4 XQTS impact:
fn-json-to-xml 15 fails -> 2 fails (98.2% pass)
Closes Phase 2.5 fn-json-to-xml portion of the post-90 fixes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
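The four duplicates policies mentioned in the json-to-xml commit above (reject, use-first, use-last, retain) can be sketched over a flat list of object members. `DuplicateKeys` is a hypothetical helper and does not reflect how eXist's Jackson-based parser is structured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateKeys {
    private static final List<String> POLICIES =
        Arrays.asList("reject", "use-first", "use-last", "retain");

    // Applies a duplicates policy to the members of one JSON object.
    static List<Map.Entry<String, String>> apply(
            List<Map.Entry<String, String>> members, String policy) {
        if (!POLICIES.contains(policy)) {
            throw new IllegalArgumentException("unknown duplicates policy: " + policy);
        }
        if (policy.equals("retain")) {
            return new ArrayList<>(members); // repeated keys become repeated children
        }
        Map<String, String> seen = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : members) {
            boolean dup = seen.containsKey(m.getKey());
            if (dup && policy.equals("reject")) {
                throw new IllegalStateException("duplicate key: " + m.getKey());
            }
            if (!dup || policy.equals("use-last")) {
                seen.put(m.getKey(), m.getValue());
            }
        }
        return new ArrayList<>(seen.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> members = new ArrayList<>();
        members.add(new java.util.AbstractMap.SimpleEntry<String, String>("a", "1"));
        members.add(new java.util.AbstractMap.SimpleEntry<String, String>("a", "2"));
        System.out.println(apply(members, "retain").size());
    }
}
```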
When XQueryTreeParser created its staticContext via the XQueryContext copy constructor, only staticNamespaces were copied. Host-supplied namespaces declared by an embedding caller via declareInScopeNamespace (e.g., the XQTS runner forwarding a test-set environment's <namespace prefix="..."/> declarations) were lost, causing XPST0081 at parse time for tests that referenced those prefixes (e.g. atomic:root, j:string).
Carry forward inScopeNamespaces in the copy constructor so they remain visible to static analysis on the copied context.
QT4 XQTS impact (with the runner-side fix that propagates testCase env namespaces to assertion queries):
op-union (atomic: prefix tests now resolve)
op-except (atomic: prefix tests now resolve)
fn-json-to-xml (j: prefix in instance-of assertions now resolves)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 7758007 accidentally removed the compareAttributes(a, b) call from compareElements when adding the new base-uri and in-scope-namespaces option blocks. As a result, two elements with different attribute values were treated as deep-equal whenever their names, child contents, and (default-off) base-uri/in-scope options matched.
Restore the call after the option checks. With this fix:
- xquery3.deep-equal-options-test: whitespace-strip-attr and whitespace-normalize-attr-different (assertFalse on differing attribute values) pass again
- XQTS QT4 fn-deep-equal: 33 -> 26 F+E, no regressions vs baseline, with namespace-prefixes-004 now also passing (the attribute prefix check from the parent commit was unreachable until now)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When include-content-type=yes (the default), the serializer auto-emits a Content-Type / charset meta tag as the first child of <head>. If the input also contains an explicit `<meta charset>` or `<meta http-equiv="Content-Type">`, we ended up writing two metas in the output, which fails the XQTS regex checks of the form `not(meta.*meta)` and breaks W3C HTML/XHTML serialization compliance (PR2372).
The fix diverts each candidate meta inside <head> to a scratch buffer at startElement time. attribute() inspects the captured attributes; if any of them is `charset` or `http-equiv="Content-Type"` (case-insensitive), the buffered meta is dropped at endElement time so the auto-emitted meta stands as the single Content-Type / charset element. Otherwise the buffer is flushed verbatim, preserving regular meta elements like `<meta name="description">`.
HTML5Writer uses its own attribute() and short-circuits endElement() for void elements, so the dedup hooks (`noteMetaAttribute`, `endMetaBuffer`) are exposed as protected and called from HTML5Writer to keep the HTML5 output method on the same code path as HTML4/XHTML.
Fixes XQTS QT4: method-html -34, -37a, -60 (3 new passes); method-xhtml -34, -37, -37a, -68 (4 new passes). method-xhtml now at 88.7% (6/53 fail). method-html at 72.5% (19/69 fail).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
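The attribute test that decides whether a buffered meta element is dropped can be sketched as a predicate. `MetaDedup` and `duplicatesAutoMeta` are hypothetical names; the buffering and flushing around it are not shown.

```java
import java.util.HashMap;
import java.util.Map;

final class MetaDedup {
    // attrs maps attribute names to values for one <meta> element inside <head>.
    // Returns true when the element would duplicate the auto-emitted meta.
    static boolean duplicatesAutoMeta(Map<String, String> attrs) {
        for (Map.Entry<String, String> a : attrs.entrySet()) {
            String name = a.getKey();
            if (name.equalsIgnoreCase("charset")) {
                return true;
            }
            if (name.equalsIgnoreCase("http-equiv")
                    && "Content-Type".equalsIgnoreCase(a.getValue())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("name", "description");
        attrs.put("content", "demo page");
        System.out.println(duplicatesAutoMeta(attrs));
    }
}
```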
Brings method-html from 22F (68.1%) past the Phase 2 gate (≥80% AND ≤30F) to 10F (81.2%) on the QT4 serialization test set, with no regressions in method-json/xhtml/xml or in unit tests touching the HTML serializers.
Five W3C XSLT/XQuery Serialization 3.1 § 7 conformance fixes:
- HTML5Writer.attribute(): case-insensitive boolean attribute minimization per § 7.2.2: `<option selected="SELECTED">` now serializes as `<option selected>` (Serialization-html-13). The matcher accepts empty values too.
- XHTMLWriter / HTML5Writer attribute(): apply escape-uri-attributes (default `yes`) per § 7.2.5 to URI-valued attributes (a/@href, img/@src, link/@href, etc.). Only non-ASCII codepoints are %-encoded to UTF-8; ASCII (incl. literal space) passes through to avoid double-encoding existing escape sequences. (Serialization-html-43, -44)
- XHTMLWriter.shouldUseCdataSections(): for the html method, cdata-section-elements is ignored for HTML-namespaced elements but DOES apply to foreign content (§ 7.2.7). Foreign-namespaced elements bypass the xdm-serialization gate. (Serialization-html-18)
- HTML5Writer.closeStartTag(): foreign content embedded in HTML5 is self-closed with `/>` instead of the `></tag>` expanded form, so consumers can re-parse the foreign block as XML. (Serialization-html-6)
- HTML5Writer.namespace(): XHTML namespace declarations are still suppressed (HTML5 parser puts elements in the HTML namespace implicitly), but foreign-content namespace declarations are now emitted so SVG/MathML/custom-XML round-trip. (Serialization-html-19a-c)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
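The escape-uri-attributes behaviour described above (non-ASCII code points percent-encoded as UTF-8, ASCII left alone) can be sketched as follows. `UriAttributeEscaper` is a hypothetical name, and any normalization the spec may require beforehand is left out.

```java
import java.nio.charset.StandardCharsets;

final class UriAttributeEscaper {
    // Percent-encodes only non-ASCII code points, so existing %XX sequences and
    // literal spaces pass through unchanged.
    static String escapeNonAscii(String value) {
        StringBuilder out = new StringBuilder();
        value.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("caf\u00e9 menu.html"));
    }
}
```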
…nCall
Bring QT4 prod-DynamicFunctionCall from 49.3% to 81.6% (40 -> 14 F+E) by implementing function-coercion semantics for record-typed parameters (XQ4 PR1132/PR1501) and adding W3C Schema list-type cast support.
CastExpression: handle xs:NMTOKENS, xs:IDREFS, xs:ENTITIES by splitting the source string on whitespace and producing a sequence of items typed as the corresponding atomic item type. Previously a cast to any list type fell through to StringValue.convertTo's default branch and threw XPTY0004.
DynamicTypeCheck: factor the per-item function-coercion logic into a public static helper coerceAtomicItem so other code paths can reuse it without going through an Expression wrapper.
UserDefinedFunction: before validating a record-typed parameter, walk the map's declared fields and apply function coercion to each value: atomize node/array values, cast untypedAtomic to the declared type, apply numeric promotion and XQ4 implicit casting/relabeling, and try each alternative in choice (union) field types in declaration order. Nested record types recurse. The coerced map is then bound to the parameter so the function body sees the typed values.
SequenceType.checkType(Sequence): also iterate items for record types and structurally-typed maps/arrays. The previous primary-type subtype shortcut was unsound for these (a value of type map(*) -- a parent of RECORD in the type hierarchy -- would erroneously satisfy a record type).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
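The list-type cast described above comes down to splitting the source string on XML whitespace. A minimal sketch follows, with `ListTypeCast` as a hypothetical name and plain strings standing in for the typed items (xs:NMTOKEN, xs:IDREF, xs:ENTITY).

```java
import java.util.ArrayList;
import java.util.List;

final class ListTypeCast {
    // Splits a list-typed lexical value into its whitespace-separated tokens.
    static List<String> splitListValue(String source) {
        List<String> items = new ArrayList<>();
        for (String token : source.trim().split("[ \\t\\r\\n]+")) {
            if (!token.isEmpty()) {
                items.add(token);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(splitListValue("  a b\tc\n"));
    }
}
```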
Per XQuery 3.1/4.0 spec, all xmlns declarations on a direct element
constructor are statically in scope for the entire element, including
attribute value expressions. The tree walker previously declared
namespaces only as it encountered each xmlns attribute, so attribute
value expressions referencing prefixes declared LATER on the same
element would fail with XPST0081.
Add a pre-pass that scans all attribute children of an ELEMENT node
for literal xmlns/xmlns:prefix declarations and declares them on the
static context before walking attribute value expressions.
Lifts prod-DirElemContent.namespace from 75.1% to 83.5%
(33 -> 22 failures) on QT4 XQTS. Fixes static resolution of QNames
like p:integer in <e a="{1 instance of p:integer}" xmlns:p="..."/>
where xmlns appears after the attribute that references it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
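The pre-pass from the direct-element-constructor commit above can be sketched as a scan over the attribute list that runs before any attribute value is analysed. `XmlnsPrePass` is a hypothetical name, and attributes are modelled as name/value string pairs rather than AST nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class XmlnsPrePass {
    // attrs: attribute name/value pairs in source order (String[2] each).
    // Returns prefix -> URI for every xmlns declaration, wherever it appears.
    static Map<String, String> collectDeclarations(List<String[]> attrs) {
        Map<String, String> declared = new LinkedHashMap<>();
        for (String[] attr : attrs) {
            if (attr[0].equals("xmlns")) {
                declared.put("", attr[1]); // default namespace
            } else if (attr[0].startsWith("xmlns:")) {
                declared.put(attr[0].substring("xmlns:".length()), attr[1]);
            }
        }
        return declared;
    }

    public static void main(String[] args) {
        List<String[]> attrs = new ArrayList<>();
        attrs.add(new String[] {"a", "{1 instance of p:integer}"});
        attrs.add(new String[] {"xmlns:p", "http://www.w3.org/2001/XMLSchema"});
        System.out.println(collectDeclarations(attrs).keySet());
    }
}
```

Because the scan finishes before the attribute values are walked, the prefix `p` is already bound when the first attribute's expression is resolved, even though its declaration comes later in source order.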
Reduce QT4 fn-format-number failures from 37 to 11 by addressing edge
cases in picture analysis, map-form options, and exponent formatting:
* Irregular grouping: per XPath 4.0, when integer-part separator
positions are irregular, place separators only at the explicit
positions instead of extrapolating the rightmost group size. Track
the integer-part digit count to bound the regularity check at
digit-count - 1, which makes pictures like '####,##' irregular while
keeping '##,##,##' regular.
* Picture analysis after the exponent: passive characters (letters,
punctuation) are now allowed as suffix material, while active
characters (% and per-mille) raise FODF1310. A second occurrence of
the exponent-separator that would itself be active (preceded and
followed by digits) is rejected. Pictures like '9.9999e99end' work;
'9.9999e999%' errors.
* Adjusted variables: when minimum-fractional-part-size is forced to 1
by the integer-zero/maximum-frac-zero/exponent rule, mark the
sub-picture as having a decimal separator so step 12 does not strip
it from output ('#e0' formats 0.2 as '0.2e0').
* Exponent rendering: convert ASCII exponent digits to the
decimal-digit family using zero-digit; support multi-character
minus-sign on negative exponents.
* Multi-character minus-sign: change DecimalFormat.minusSign from int
to String so renditions like 'minus ' work in negative-prefix and
exponent-sign output. Update both ANTLR and recursive-descent parser
to pass the value through unchanged for declare decimal-format.
* Map-form options:
- Accept xs:QName (or string) for format-name, including via the
bare-string overload, so namespaced decimal formats resolve via
the QName item directly.
- Atomize property values (sequences and singleton arrays) before
applying them.
- Reject multi-codepoint values for single-character properties
(FODF1290), unless the value uses the X:rendition pattern.
- After applying overrides, validate distinctness of all eight
picture-string properties, that zero-digit is a Unicode digit
with numeric value zero, and that digit is not a member of the
decimal digit family. Each violation raises FODF1290.
* New error code: ErrorCodes.FODF1290 (Invalid decimal format
property value).
Closes the fn-format-number portion of post-90-fixes phase 2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
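The grouping regularity check from the fn:format-number commit above can be sketched for the default decimal format (`,` as grouping separator, `#` and `0`-`9` as digit characters). `GroupingRegularity` is a hypothetical name; the bound of digit-count - 1 follows the description in the commit message.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupingRegularity {
    // integerPart: the integer part of a sub-picture, e.g. "##,##,##".
    static boolean hasRegularGrouping(String integerPart) {
        List<Integer> positions = new ArrayList<>();
        int digits = 0;
        for (int i = integerPart.length() - 1; i >= 0; i--) {
            if (integerPart.charAt(i) == ',') {
                positions.add(digits); // digits to the right of this separator
            } else {
                digits++;
            }
        }
        if (positions.isEmpty()) return false;
        int g = positions.get(0);      // rightmost separator gives the group size
        if (g == 0) return false;
        for (int p : positions) {
            if (p % g != 0) return false;
        }
        // Every multiple of g up to digit-count - 1 must carry a separator.
        for (int m = g; m <= digits - 1; m += g) {
            if (!positions.contains(m)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasRegularGrouping("####,##"));
        System.out.println(hasRegularGrouping("##,##,##"));
    }
}
```

For '####,##' the only separator sits at position 2, but position 4 is also a multiple of 2 within the six-digit integer part and is unoccupied, so the picture is irregular; '##,##,##' fills both positions and is regular.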
Reduce QT4 prod-WindowClause failures from 35 to 11 and prod-LetClause from 37 to 27 by completing the parser/runtime support that the tree walker side already had:
* Grammar: make WindowStartCondition and WindowEndCondition individually optional in windowClause, and make the "when ExprSingle" guard optional inside each (XQ4 PR483). Sliding windows still require an end clause; that constraint is enforced in the tree walker.
* WindowCondition / WindowExpr: tolerate a null whenExpression on either the start or the end condition. A missing "when" defaults to true() during analyze() and eval(), and toString() omits the "when ..." fragment entirely so dump output stays readable.
* LetExpr: when the variable has an explicit atomic SequenceType and XQuery version >= 4.0, run a function-conversion pass over the bound value before the body executes (XQ4 PR1131). New coerceAtomicSequence() casts each item to the declared type via atomize().convertTo(), promoting xs:integer/decimal/float to xs:double, casting xs:untypedAtomic and xs:anyURI to the target atomic type, and falling back to the existing XPTY0004 path if any item cannot be converted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses misc-Subtyping XQTS gaps. misc-Subtyping QT4 fail count drops from 45 to 26 (153 tests, 41 skipped, 112 active; pass rate 76.8%).
Parser changes (XQuery.g, XQueryTree.g):
- recordFieldDecl: suppress the optional QUESTION token from the AST so the tree walker no longer sees a stray '?' node after a record field with no type clause.
- Tree walker: allow xs:error in the atomic-type position. xs:error is defined as a builtin under ANY_SIMPLE_TYPE; per XQuery 4.0 it is a legitimate sequence type (its value space is empty, so xs:error* matches only the empty sequence).
- documentTest: accept document-node(*) as XQuery 4.0 short form for document-node(element(*)).
SequenceType subtype rules (SequenceType.java#isSubtypeOf):
- Element/attribute kind tests now compare nodeName: when sup names a specific element/attribute, sub must name the same one. Previously element(*) was reported as a subtype of element(a).
- Records now subtype-check structurally on declared fields per XQuery 4.0 Records: required fields of sup must exist (and be required) in sub with sub's field type subtype-of sup's, and sub may not declare extra fields unless sup is extensible.
- map(K, V) subtype check now allows records (RECORD <= MAP_ITEM) to flow through, treating an untyped record as map(xs:string, item()*).
- function-shape conversion for maps and arrays now uses the declared K/V types: map(K, V) acts as function(xs:anyAtomicType) as V? (atomic-coerced key per XQ4 PR1501; lookup miss widens cardinality); array(T) acts as function(xs:integer) as T. Records flow through the map branch via Type.subTypeOf(sub.primaryType, MAP_ITEM).
XQTS misc-Subtyping (QT4):
- Before: 45 fails, 41 skipped, 67 passes (59.8%)
- After: 26 fails, 41 skipped, 86 passes (76.8%)
Remaining failures are XQuery 4.0 features outside this change set:
- element/attribute multi-name tests (element(a|b)) and namespace wildcards (element(p1:*), element(*:a)) -- 14 tests
- element-type tracking (element(a, xs:integer) covariance) -- 4 tests
- gnode() as supertype of node()|jnode() -- 3 tests
- document-node(QName) bare short form, document-node(*) vs () -- 2
- union-type widening (xs:long|xs:int subtype of xs:integer) -- 3
… 4.0 spec
W3C XQuery 4.0 PR197 specifies the keyword-argument names for built-in
functions. eXist's signatures used the legacy 3.1-era names:
fn:json-doc: eXist used $href -> spec says source
fn:parse-json: eXist used $json-text -> spec says value
fn:json-to-xml: eXist used $json-text -> spec says value
These names are visible to callers via XQ4 keyword-argument syntax
(name := value), so they must match the spec for keyword calls to
resolve. Positional-call behavior is unchanged.
Confirmed via QT4 misc-BuiltInKeywords XQTS:
Keywords-fn-json-doc-1 pass (was XPST0017)
Keywords-fn-parse-json-1 pass (was XPST0017)
Keywords-fn-json-to-xml-1 still fails on a separate parser-level
issue in its instance-of clause
(document-node(fn:*) wildcard).
Net misc-BuiltInKeywords: 83 -> 79 F+E (72.0% -> 72.8%). The remaining
79 failures need a per-function W3C-4.0 signature audit and new XQ4
type-syntax parser features (record types, local union types, document-
node element wildcards), tracked under separate phase2-* taskings.
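Why the parameter names matter for keyword calls can be shown with a toy binder: a keyword argument resolves only if its name matches a declared parameter. `KeywordBinding` is a hypothetical sketch, not eXist's function-resolution code, and the second parameter name `options` is an assumption for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeywordBinding {
    // Returns the arguments in declared order; unknown keyword names are rejected,
    // mirroring the XPST0017 outcome described above.
    static Object[] bind(List<String> paramNames, Map<String, Object> keywordArgs) {
        Object[] bound = new Object[paramNames.size()];
        for (Map.Entry<String, Object> arg : keywordArgs.entrySet()) {
            int index = paramNames.indexOf(arg.getKey());
            if (index < 0) {
                throw new IllegalArgumentException("XPST0017: no parameter named " + arg.getKey());
            }
            bound[index] = arg.getValue();
        }
        return bound;
    }

    public static void main(String[] args) {
        Map<String, Object> call = new HashMap<>();
        call.put("source", "data.json"); // json-doc(source := "data.json")
        System.out.println(Arrays.toString(bind(Arrays.asList("source", "options"), call)));
    }
}
```

With the legacy name `href` in the signature, the same call would find no matching parameter, while positional calls are unaffected because they never consult the names.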
…R197
Audit and fix FunctionSignature definitions against the QT4CG XQuery 4.0 spec so partial-application instance-of tests succeed. The QT4 keyword test catalog (misc-BuiltInKeywords) checks that fn:foo(arg := ?) instance of function(...) as ... matches the signature declared by W3C; before this commit eXist's signatures diverged in parameter names, cardinalities, and return types.
Common patterns fixed:
- collation parameter widened from xs:string to xs:string? (fn:contains, fn:ends-with, fn:starts-with, fn:compare, fn:distinct-values, fn:index-of, fn:substring-before, fn:substring-after, fn:collation-key, fn:string-join's separator, array:index-of)
- options map parameter widened to map(*)? (fn:doc, fn:doc-available, fn:csv-to-arrays, fn:parse-csv, fn:csv-to-xml, fn:csv-doc, fn:parse-xml, fn:parse-xml-fragment, fn:path)
- length-style optional trailing parameter widened to ? (fn:substring's length, fn:subsequence's length, array:subarray's length, array:build's action)
- start type widened to xs:numeric for fn:subsequence
- base param of fn:resolve-uri is xs:string?
- fn:seconds value is xs:decimal?
- fn:unix-dateTime value is xs:nonNegativeInteger?, returns dateTimeStamp
- typed function/array/record/map returns:
  fn:op returns fn(item()*, item()*) as item()*
  fn:invisible-xml returns fn(xs:string) as item()
  fn:analyze-string returns element(fn:analyze-string-result)
  fn:element-to-map returns map(xs:string, item()?)?
  fn:function-annotations returns map(xs:QName, xs:anyAtomicType*)*
  fn:csv-to-arrays returns array(xs:string)*
  fn:divide-decimals returns record(quotient/remainder)
  fn:in-scope-namespaces returns map(xs:NCName, xs:anyURI)
  fn:transitive-closure returns node()*
  array:members returns record(value as item()*)*
  fn:transform returns map(*)
  fn:collation accepts map(*) and returns xs:string
Type-checker support added:
- SequenceType.isSubtypeOf now handles a choice (union) type on the sub side: every alternative must be a subtype of the supertype. This unblocks date/time accessors (fn:year-from-dateTime etc.) and fn:char where the spec types are union types but eXist uses a single broader primary type.
- Bare map(*) is treated as map(xs:anyAtomicType, item()*) in subtype checks (was xs:string, which contradicts the W3C spec); records flow through with an xs:string key fallback because record keys are always strings.
- Bare array(*) is now treated as array(item()*); array(*) is a subtype of array(item()*).
Implementation tweaks accompanying the signature changes:
- CollatingFunction.getCollator handles an empty xs:string? collation argument by returning the default collator.
- FunSubstring, FunSubSequence, FunAnalyzeString, ArrayFunction handle an empty optional-length argument by behaving as the no-length form.
- ArrayBuild handles an empty action argument by returning the input unchanged (identity-like).
- FunResolveURI handles an empty base-URI argument by falling back to the static base URI.
XQTS misc-BuiltInKeywords pass rate: 72.9% -> 89.5% (76 -> 31 fails). Remaining failures are net-new XQ4 functions (j-tree, j-key, j-value, j-position, system-properties, build-dateTime, etc.), unrecognised record types (fn:dateTime-record, fn:parsed-csv-structure-record, etc.), parser support for record/document-node wildcards in instance-of expressions, and a fn:matches partial-application internal error; all of these are out of scope for this signature audit.
Other affected XQTS sets (no regressions, only improvements): fn-collation-key 10 -> 7, fn-string-join 2 -> 0, fn-resolve-uri 3 -> 1, fn-substring 1 -> 0, fn-substring-before 3 -> 1, fn-substring-after 2 -> 0, misc-Subtyping 26 -> 25.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
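The empty-argument fallbacks listed in this commit (an empty optional length behaves like the no-length form, an empty collation selects the default) can be illustrated with a minimal Java sketch. This is not eXist's code: the class OptionalArgSketch, its method names, and the use of java.util.Optional to stand in for an XQuery empty sequence are all assumptions made for illustration, and the substring helper works on integer positions and UTF-16 units only.

```java
import java.util.Optional;

public class OptionalArgSketch {

    // Assumption for illustration: the Unicode codepoint collation stands in for "the default".
    static final String DEFAULT_COLLATION =
            "http://www.w3.org/2005/xpath-functions/collation/codepoint";

    // An empty collation argument is treated as a request for the default collation.
    static String effectiveCollation(Optional<String> collation) {
        return collation.orElse(DEFAULT_COLLATION);
    }

    // Simplified stand-in for fn:substring with 1-based integer positions.
    // An empty length behaves exactly like the two-argument (no-length) form.
    static String substring(String value, int start, Optional<Integer> length) {
        long from = Math.max(start, 1);
        long to = value.length() + 1L;            // no-length form: run to the end
        if (length.isPresent()) {
            to = Math.min((long) start + length.get(), to);
        }
        if (from >= to) {
            return "";
        }
        return value.substring((int) (from - 1), (int) (to - 1));
    }

    public static void main(String[] args) {
        System.out.println(substring("motor car", 6, Optional.empty()));  // " car"
        System.out.println(substring("metadata", 4, Optional.of(3)));     // "ada"
        System.out.println(effectiveCollation(Optional.empty()));
    }
}
```

Modelling the optional argument explicitly keeps the fallback in one place, so the widened signature (for example a length typed with ?) and the runtime behaviour cannot drift apart.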
Opened in error: this PR's branch was forked from the next-v3 integration branch (360+ commits ahead of develop), making it appear as 389 commits. The one substantive commit (
Summary
Audit and fix FunctionSignature definitions across org.exist.xquery.functions.fn and .array against the W3C XQuery 4.0 spec (PR197). The QT4 keyword test set (misc-BuiltInKeywords) checks that fn:foo(arg := ?) instance of function(...) as ... matches the function-item signature W3C declares; eXist's signatures previously diverged from the spec in parameter cardinalities, parameter types, and return-type details, causing 49 partial-application instance-of checks to fail.
XQTS impact
| Test set | Fails before | Fails after |
| misc-BuiltInKeywords | 76 | 31 |
| misc-Subtyping | 26 | 25 |
| fn-collation-key | 10 | 7 |
| fn-string-join | 2 | 0 |
| fn-resolve-uri | 3 | 1 |
| fn-substring | 1 | 0 |
| fn-substring-before | 3 | 1 |
| fn-substring-after | 2 | 0 |
misc-BuiltInKeywords pass rate: 72.9% → 89.5% (no regressions across audited sets).
What changed
Signature changes (per W3C PR197)
- collation parameter widened from xs:string (exactly-one) to xs:string?: fn:contains, fn:ends-with, fn:starts-with, fn:compare, fn:distinct-values, fn:index-of, fn:substring-before, fn:substring-after, fn:collation-key, fn:string-join's separator, array:index-of.
- options map parameter widened to map(*)?: fn:doc, fn:doc-available, fn:csv-to-arrays, fn:parse-csv, fn:csv-to-xml, fn:csv-doc, fn:parse-xml, fn:parse-xml-fragment, fn:path.
- Optional trailing parameter widened to ?: fn:substring length, fn:subsequence length (also start/length type now xs:numeric), array:subarray length, array:build action.
- fn:resolve-uri base param is xs:string?.
- fn:seconds value is xs:decimal?.
- fn:unix-dateTime value is xs:nonNegativeInteger?, return type is xs:dateTimeStamp.
- Typed function/array/record/map returns (setFunctionParamTypes/setRecordType):
  - fn:op → fn(item()*, item()*) as item()*
  - fn:invisible-xml → fn(xs:string) as item()
  - fn:analyze-string → element(fn:analyze-string-result)
  - fn:element-to-map → map(xs:string, item()?)?
  - fn:function-annotations → map(xs:QName, xs:anyAtomicType*)*
  - fn:csv-to-arrays → array(xs:string)*
  - fn:divide-decimals → record(quotient as xs:decimal, remainder as xs:decimal)
  - fn:in-scope-namespaces → map(xs:NCName, xs:anyURI)
  - fn:transitive-closure → node()*
  - array:members → record(value as item()*)*
  - fn:transform → map(*)
  - fn:collation accepts map(*) and returns xs:string
Type-checker support (SequenceType.isSubtypeOf)
- A choice (union) type on the sub side is now handled: every alternative must be a subtype of sup. Unblocks date/time accessors (fn:year-from-dateTime etc.) and fn:char, whose spec parameter types are unions where eXist uses a single broader primary type.
- Bare map(*) is treated as map(xs:anyAtomicType, item()*) in subtype checks (was map(xs:string, item()*), which contradicts XQuery 4.0). Records flow through with an xs:string key fallback because record keys are always strings; this preserves the existing subtyping-110 (record(a) ⊑ map(xs:string, item()*)) test.
- Bare array(*) is now treated as array(item()*); an untyped array is a subtype of array(item()*).
Implementation adjustments accompanying the signature changes
- CollatingFunction.getCollator treats an empty xs:string? collation argument as a request for the default collator.
- FunSubstring, FunSubSequence, FunAnalyzeString, ArrayFunction.subArray handle an empty optional-length argument as the no-length form.
- ArrayBuild treats an empty action argument as identity (each input item becomes one array member).
- FunResolveURI falls back to the static base URI when the explicit base argument is empty.
Spec references
- qt4tests/misc/BuiltInKeywords.xml
Out of scope (remaining misc-BuiltInKeywords failures)
The remaining 31 failures fall into categories that need substantive feature work and are tracked separately:
- XPST0017 arity mismatches: net-new XQ4 functions (fn:jtree, fn:jvalue, fn:jposition, fn:jkey, fn:system-properties, fn:build-dateTime, fn:xsd-validator, the 3-arg fn:round, fn:unparsed-text* overloads, fn:element-with-id, map:build).
- XPST0051 unknown record types: XQ4 record-typed return values (fn:dateTime-record, fn:parsed-csv-structure-record, fn:element-conversion-plan-record, etc.) need parser/registry support.
- XPST0003 parse errors: XQ4 syntax (document-node(prefix:*), record(...) in instance of) not yet supported in the grammar.
- fn:matches internal error: pre-existing partial-application bug in placeholder expansion.
Test plan
- misc-BuiltInKeywords XQTS: 76 → 31 fails
- misc-Subtyping XQTS: 26 → 25 fails
- fn-collation-key, fn-string-join, fn-resolve-uri, fn-substring, fn-substring-before, fn-substring-after: all improved, no regressions
- XPathQueryTest, XQueryFunctionsTest unit tests: no new failures (currentDateTime failure is pre-existing on develop/next-v3 baseline, unrelated to this change)
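The choice-type rule described under type-checker support (a union on the sub side is a subtype only if every alternative is) can be sketched in a few lines of Java. This is a toy model rather than eXist's SequenceType.isSubtypeOf: the class ChoiceSubtypeSketch, its methods, and the reduced atomic hierarchy are hypothetical and exist only to show the rule.

```java
import java.util.List;
import java.util.Map;

public class ChoiceSubtypeSketch {

    // Hypothetical, heavily reduced atomic type hierarchy: child -> parent.
    private static final Map<String, String> PARENT = Map.of(
            "xs:dateTimeStamp", "xs:dateTime",
            "xs:dateTime", "xs:anyAtomicType",
            "xs:date", "xs:anyAtomicType",
            "xs:integer", "xs:decimal",
            "xs:decimal", "xs:anyAtomicType",
            "xs:string", "xs:anyAtomicType");

    // True if sub equals sup or reaches it by walking up the hierarchy.
    static boolean isAtomicSubtype(String sub, String sup) {
        for (String t = sub; t != null; t = PARENT.get(t)) {
            if (t.equals(sup)) {
                return true;
            }
        }
        return false;
    }

    // A choice type on the sub side is a subtype of sup only if every alternative is.
    static boolean isChoiceSubtype(List<String> alternatives, String sup) {
        return alternatives.stream().allMatch(alt -> isAtomicSubtype(alt, sup));
    }

    public static void main(String[] args) {
        // Both alternatives derive from xs:anyAtomicType.
        System.out.println(isChoiceSubtype(List.of("xs:dateTime", "xs:date"), "xs:anyAtomicType")); // true
        // xs:string is not a subtype of xs:dateTime, so the whole choice fails.
        System.out.println(isChoiceSubtype(List.of("xs:dateTime", "xs:string"), "xs:dateTime"));    // false
    }
}
```

Requiring all alternatives to match is what lets a function declared with one broader parameter type still satisfy an instance-of test against a spec signature written with a union.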