
[bugfix] Align fn: function signatures with W3C XQuery 4.0 PR197#6287

Closed
joewiz wants to merge 369 commits into eXist-db:develop from joewiz:phase2/fn-signature-audit


joewiz commented May 1, 2026

Summary

Audit and fix FunctionSignature definitions across org.exist.xquery.functions.fn and .array against the W3C XQuery 4.0 spec (PR197). The QT4 keyword test set (misc-BuiltInKeywords) checks that fn:foo(arg := ?) instance of function(...) as ... matches the function-item signature W3C declares; eXist's signatures previously diverged from the spec in parameter cardinalities, parameter types, and return-type details, causing 49 partial-application instance-of checks to fail.

XQTS impact

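| Test set             | Before (fails) | After (fails) | Δ   |
| -------------------- | -------------- | ------------- | --- |
| misc-BuiltInKeywords | 76             | 31            | −45 |
| misc-Subtyping       | 26             | 25            | −1  |
| fn-collation-key     | 10             | 7             | −3  |
| fn-string-join       | 2              | 0             | −2  |
| fn-resolve-uri       | 3              | 1             | −2  |
| fn-substring         | 1              | 0             | −1  |
| fn-substring-before  | 3              | 1             | −2  |
| fn-substring-after   | 2              | 0             | −2  |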

misc-BuiltInKeywords pass rate: 72.9% → 89.5% (no regressions across audited sets).

What changed

Signature changes (per W3C PR197)

  • collation parameters widened from xs:string (exactly-one) to xs:string?: fn:contains, fn:ends-with, fn:starts-with, fn:compare, fn:distinct-values, fn:index-of, fn:substring-before, fn:substring-after, fn:collation-key, array:index-of; fn:string-join's separator parameter is widened the same way.
  • options map parameters widened to map(*)?: fn:doc, fn:doc-available, fn:csv-to-arrays, fn:parse-csv, fn:csv-to-xml, fn:csv-doc, fn:parse-xml, fn:parse-xml-fragment, fn:path.
  • trailing optional length / action widened to ?: fn:substring length, fn:subsequence length (also start/length type now xs:numeric), array:subarray length, array:build action.
  • fn:resolve-uri base param is xs:string?.
  • fn:seconds value is xs:decimal?.
  • fn:unix-dateTime value is xs:nonNegativeInteger?, return type is xs:dateTimeStamp.
  • typed function/array/record/map returns declared via setFunctionParamTypes / setRecordType:
    • fn:op → fn(item()*, item()*) as item()*
    • fn:invisible-xml → fn(xs:string) as item()
    • fn:analyze-string → element(fn:analyze-string-result)
    • fn:element-to-map → map(xs:string, item()?)?
    • fn:function-annotations → map(xs:QName, xs:anyAtomicType*)*
    • fn:csv-to-arrays → array(xs:string)*
    • fn:divide-decimals → record(quotient as xs:decimal, remainder as xs:decimal)
    • fn:in-scope-namespaces → map(xs:NCName, xs:anyURI)
    • fn:transitive-closure → node()*
    • array:members → record(value as item()*)*
    • fn:transform → map(*)
    • fn:collation accepts map(*) and returns xs:string

Type-checker support (SequenceType.isSubtypeOf)

  • Choice (union) types on the subtype side: a choice type is a subtype of T only if every one of its alternatives is a subtype of T. This unblocks the date/time accessors (fn:year-from-dateTime etc.) and fn:char, whose spec parameter types are unions while eXist declares a broader primary type.
  • Bare map(*) is treated as map(xs:anyAtomicType, item()*) in subtype checks (was map(xs:string, item()*), which contradicts XQuery 4.0). Records flow through with an xs:string key fallback because record keys are always strings — this preserves the existing subtyping-110 (record(a) ⊑ map(xs:string, item()*)) test.
  • Bare array(*) is now treated as array(item()*); an untyped array is a subtype of array(item()*).
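A toy model of the two subtype rules above, as a minimal Java sketch: a choice on the subtype side must have every alternative fit the supertype, and a bare map(*) is normalized to map(xs:anyAtomicType, item()*) before comparing. All class names and the tiny atomic hierarchy are hypothetical (this is not eXist's SequenceType API), and occurrence indicators, records and arrays are left out.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified type model; not eXist's SequenceType. */
public final class SubtypeSketch {
    public interface XType {}
    public record Atomic(String name) implements XType {}
    public record Choice(List<XType> alternatives) implements XType {}
    /** key == null and value == null model a bare map(*). */
    public record MapType(XType key, XType value) implements XType {}
    public record AnyItem() implements XType {}

    public static final XType ITEM = new AnyItem();
    public static final XType ANY_ATOMIC = new Atomic("xs:anyAtomicType");
    public static final XType BARE_MAP = new MapType(null, null);

    // Assumed fragment of the atomic hierarchy (child -> parent), for illustration only.
    private static final Map<String, String> PARENT = Map.of(
            "xs:integer", "xs:decimal",
            "xs:decimal", "xs:anyAtomicType",
            "xs:string", "xs:anyAtomicType",
            "xs:date", "xs:anyAtomicType",
            "xs:dateTime", "xs:anyAtomicType");

    private static boolean atomicSubtype(String sub, String sup) {
        for (String t = sub; t != null; t = PARENT.get(t)) {
            if (t.equals(sup)) return true;
        }
        return false;
    }

    /** Bare map(*) is read as map(xs:anyAtomicType, item()*), not map(xs:string, item()*). */
    private static XType normalize(XType t) {
        if (t instanceof MapType m && m.key() == null) return new MapType(ANY_ATOMIC, ITEM);
        return t;
    }

    public static boolean isSubtype(XType sub, XType sup) {
        final XType s = normalize(sub);
        final XType t = normalize(sup);
        if (s instanceof Choice sc) {
            // Choice on the sub side: every alternative must be a subtype of sup.
            return sc.alternatives().stream().allMatch(alt -> isSubtype(alt, t));
        }
        if (t instanceof Choice tc) {
            // Choice on the sup side: fitting any one alternative is enough.
            return tc.alternatives().stream().anyMatch(alt -> isSubtype(s, alt));
        }
        if (t instanceof AnyItem) return true;
        if (s instanceof Atomic sa && t instanceof Atomic ta) {
            return atomicSubtype(sa.name(), ta.name());
        }
        if (s instanceof MapType sm && t instanceof MapType tm) {
            // Maps are covariant in both key and value type.
            return isSubtype(sm.key(), tm.key()) && isSubtype(sm.value(), tm.value());
        }
        return false;
    }
}
```

With this normalization map(*) is no longer a subtype of map(xs:string, item()*), while map(xs:string, item()*) still fits map(*).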

Implementation adjustments accompanying the signature changes

  • CollatingFunction.getCollator treats an empty xs:string? collation argument as a request for the default collator.
  • FunSubstring, FunSubSequence, FunAnalyzeString, ArrayFunction.subArray handle an empty optional-length argument as the no-length form.
  • ArrayBuild treats an empty action argument as identity (each input item becomes one array member).
  • FunResolveURI falls back to the static base URI when the explicit base argument is empty.
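A sketch of the "empty optional length means the no-length form" rule, using fn:substring as the example. It assumes a null Double stands for the empty xs:double? argument; SubstringSketch is a hypothetical name, not eXist's FunSubstring. Rounding follows fn:round (half towards positive infinity), and positions count codepoints rather than UTF-16 units.

```java
/** Hypothetical sketch of fn:substring with an optional (nullable) length. */
public final class SubstringSketch {
    /** fn:round semantics: half rounds towards positive infinity; NaN and infinities pass through. */
    static double xpathRound(double x) {
        double r = Math.floor(x);
        return (x - r >= 0.5) ? r + 1 : r;
    }

    public static String substring(String value, double start, Double length) {
        double first = xpathRound(start);
        // An empty length (null) behaves exactly like the two-argument form: run to the end.
        double end = (length == null) ? Double.POSITIVE_INFINITY : first + xpathRound(length);
        StringBuilder out = new StringBuilder();
        int position = 1; // XPath positions are 1-based and count codepoints
        for (int cp : value.codePoints().toArray()) {
            // NaN bounds make both comparisons false, yielding "" as the spec requires.
            if (position >= first && position < end) {
                out.appendCodePoint(cp);
            }
            position++;
        }
        return out.toString();
    }
}
```

For example, substring("motor car", 6, null) yields " car", the same as the two-argument call.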

Spec references

Out of scope (remaining misc-BuiltInKeywords failures)

The remaining 31 failures fall into categories that need substantive feature work and are tracked separately:

  • 13 XPST0017 arity mismatches — net-new XQ4 functions (fn:jtree, fn:jvalue, fn:jposition, fn:jkey, fn:system-properties, fn:build-dateTime, fn:xsd-validator, the 3-arg fn:round, fn:unparsed-text* overloads, fn:element-with-id, map:build).
  • 11 XPST0051 unknown record types — XQ4 record-typed return values (fn:dateTime-record, fn:parsed-csv-structure-record, fn:element-conversion-plan-record, etc.) need parser/registry support.
  • 4 XPST0003 parse errors — XQ4 syntax (document-node(prefix:*), record(...) in instance of) not yet supported in the grammar.
  • 1 fn:matches internal error — pre-existing partial-application bug in placeholder expansion.

Test plan

  • misc-BuiltInKeywords XQTS — 76 → 31 fails
  • misc-Subtyping XQTS — 26 → 25 fails
  • fn-collation-key, fn-string-join, fn-resolve-uri, fn-substring, fn-substring-before, fn-substring-after — all improved, no regressions
  • XPathQueryTest, XQueryFunctionsTest unit tests — no new failures (currentDateTime failure is pre-existing on develop/next-v3 baseline, unrelated to this change)
  • Codacy PMD — only pre-existing or stylistic warnings (field declaration order); no NPath increases on touched methods

joewiz and others added 30 commits April 15, 2026 09:19
Add parse="text" handling per XInclude spec section 4.2. When an
xi:include element has parse="text", the referenced resource is read
as plain text and included as character data (with XML special
characters preserved for escaping during serialization).

Four resource types are handled:
- Binary documents in database: read raw bytes with specified encoding
- XML documents in database: serialize to XML text (omit declaration)
- In-memory documents: serialize to XML text
- External URIs: read from URL connection

Also reads the encoding attribute for charset selection (defaults to
UTF-8), with graceful fallback for unsupported charsets.
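The encoding selection with fallback might look like this minimal sketch; IncludeTextCharset is a hypothetical name and only the java.nio.charset calls are real. It treats both illegal charset names and legal-but-unsupported ones as "fall back to UTF-8", which is one reading of "graceful fallback".

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for xi:include parse="text" charset selection. */
public final class IncludeTextCharset {
    /** Resolves the xi:include encoding attribute, defaulting to UTF-8. */
    public static Charset resolve(String encodingAttr) {
        if (encodingAttr == null || encodingAttr.isBlank()) {
            return StandardCharsets.UTF_8;
        }
        try {
            // isSupported returns false for unknown (but syntactically legal) names.
            return Charset.isSupported(encodingAttr) ? Charset.forName(encodingAttr) : StandardCharsets.UTF_8;
        } catch (IllegalCharsetNameException e) {
            // Syntactically illegal names also fall back instead of failing the include.
            return StandardCharsets.UTF_8;
        }
    }

    /** Decodes raw resource bytes as the text to include as character data. */
    public static String readAsText(byte[] raw, String encodingAttr) {
        return new String(raw, resolve(encodingAttr));
    }
}
```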

Architectural note: BaseX delegates XInclude entirely to Java's built-in
SAXParserFactory.setXIncludeAware(true) at parse time — zero custom code.
eXist's serialization-time approach is more powerful (works on stored
documents) but requires explicit feature implementation. A complementary
parse-time option could be added as a future enhancement.

Test results: 47/149 pass (31.5%), up from 42/149 (28.2%). +5 from
parse="text" support. Existing XIncludeSerializerTest: 7/7 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 4 XInclude tests from the XProc 3.0 test suite covering scenarios not in W3C suites: parse="text" on XML documents, fixup-xml-lang parameter. Add parse attribute validation to XIncludeFilter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolved grammar conflict: keep both XQUF and XQFT keyword blocks
in the reservedKeywords rule (removed duplicate 'least' from XQFT).
Resolved 42 conflicts across 15 files:
- Grammars (XQuery.g, XQueryTree.g): used next-v2 versions which have
  all features correctly merged (XQUF + XQFT + XQ4 + expression chain)
- CastExpression, CastableExpression, Type, SequenceType,
  UserDefinedFunction, DynamicTypeCheck: took theirs (XQ4 type system)
- Constants: took theirs (complete XQ4 axis numbering)
- DoubleValue, AbstractDateTimeValue: took ours (compliance error codes)
- LocationStep, FilterExprAM: took ours (already has v2/xq4-axes/filter-expr-am)
- StaticXQueryException: took ours (additional constructors from compliance fixes)
- FunctionFactory: removed duplicate variable declaration
Grammar conflicts resolved: kept ours (already has decimal-format from
next-v2 grammar base).
Resolved 16 conflicts:
- Grammar: took ours (already has all features from next-v2)
- Saxon 12 regex files (FunAnalyzeString, FunMatches, FunReplace, RegexUtil):
  took ours (Saxon 12 API)
- XQ4 function files: took theirs (new XQ4 function implementations)
- pom.xml: took theirs (includes markup-blitz dependency)
joewiz and others added 22 commits April 29, 2026 23:34
…espaces, namespace-prefixes (attrs), timezones, map-order

XQ4 PR320/PR1855 introduced an options-map third argument to
fn:deep-equal. The DeepEqualOptions parser already recognized the
option keys, but several were stored as flags without ever being
consulted by the comparison engine. This commit wires them up:

- base-uri: compare element base URIs (xml:base / inherited)
- in-scope-namespaces: compare in-scope NS bindings as sets,
  walking ancestor xmlns declarations
- namespace-prefixes: extend prefix check to attributes (was only
  applied to elements). Use a fallback that parses the prefix from
  nodeName when getPrefix() returns null/empty
- timezones: when set, two date/time atomics with different
  explicit timezones (or one missing) are not deep-equal even if
  they represent the same instant
- map-order: iterate both maps in their recorded insertion order
  (PR1703) and compare keys position-wise
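To make the timezones option concrete, here is a minimal sketch that models an xs:dateTime as a LocalDateTime plus an optional ZoneOffset (a hypothetical representation, not eXist's AbstractDateTimeValue). Without the option, values compare as instants, with a missing timezone filled in from the implicit timezone; with it, the explicit timezones must also match.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Objects;

/** Hypothetical sketch of deep-equal's timezones option for date/time atomics. */
public final class TimezonesOption {
    /** An xs:dateTime value; tz is null when the lexical form had no timezone. */
    public record XDateTime(LocalDateTime local, ZoneOffset tz) {}

    public static boolean deepEqual(XDateTime a, XDateTime b, boolean timezones, ZoneOffset implicitTz) {
        // timezones=true: different explicit timezones, or one missing, means "not deep-equal".
        if (timezones && !Objects.equals(a.tz(), b.tz())) {
            return false;
        }
        // Otherwise compare the instants the two values denote.
        return toInstant(a, implicitTz).equals(toInstant(b, implicitTz));
    }

    private static Instant toInstant(XDateTime v, ZoneOffset implicitTz) {
        return v.local().toInstant(v.tz() != null ? v.tz() : implicitTz);
    }
}
```

So 12:00Z and 13:00+01:00 are deep-equal by default but not when timezones is set.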

Two collateral fixes:

- items-equal: drop the eager arity check at parse time. The spec
  permits any function-typed value (the test deep-equal-40-items-equal-004
  passes true#0 because length-mismatched sequences must return false
  before invoking the callback). Arity is now validated lazily.
- Unsupported collation in the 3-arg variant now raises FOCH0002
  (the runtime error) rather than letting XQST0076 leak through
  from getCollator. Spec accepts FOCH0002 here.

Also fix deep-equal-options-test.xq: a library module cannot
declare local:helper (XQST0048) — renamed to det:helper.

XQTS QT4 fn-deep-equal: 33 -> 26 F+E (9 tests fixed: base-uri-003,
in-scope-namespaces-003, timezones-003, items-equal-{004,005,006,008},
normalize-unicode-004, map-order-003; 2 regressions in
whitespace-{009,031} that need follow-up).

JUnit DeepEqualTest: 63/63 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…odes

* TryCatchExpression: bind XQuery 4.0 $err:map (PR493) and $err:stack-trace
  (PR1470/PR1599) inside catch clauses. err:map is a map(xs:string, item()*)
  containing all standard error properties (code, description, value, module,
  line-number, column-number, additional, stack-trace).
* UntypedValueCheck: raise XPTY0117 when implicitly coercing xs:untypedAtomic
  to a namespace-sensitive target type (xs:QName, xs:NOTATION) during
  function-call argument coercion (XPath F&O 3.1 §19.1).
* CastExpression: allow xs:untypedAtomic as a source for cast as xs:QName
  (XQ30+ relaxed the rule — lexical errors raise FORG0001 via QNameValue).
* IntegerValue: revert XQ4 hex/binary/underscore parsing for runtime
  string-to-integer casts. Those literal extensions belong only to the parser
  token path (XQueryTree.g uses the BigInteger constructor) — applying them
  to xs:integer("0x0") accepted invalid lexical forms.
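A minimal sketch of the runtime rule after the revert: a string cast to xs:integer accepts only an optional sign followed by ASCII digits, after trimming XML whitespace, so hex, binary and underscore forms stay parser-only. IntegerCast is a hypothetical name, and using IllegalArgumentException to stand in for FORG0001 is an assumption of the sketch.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

/** Hypothetical sketch of string-to-xs:integer casting at runtime. */
public final class IntegerCast {
    // xs:integer lexical space: optional sign, then one or more decimal digits.
    private static final Pattern LEXICAL = Pattern.compile("[+-]?[0-9]+");

    public static BigInteger cast(String value) {
        // whitespace="collapse": strip leading/trailing XML whitespace (space, tab, CR, LF) only.
        String s = value.replaceAll("^[ \\t\\r\\n]+|[ \\t\\r\\n]+$", "");
        if (!LEXICAL.matcher(s).matches()) {
            // "0x0", "0b1" and "1_000" are literal syntax, not valid xs:integer lexical forms.
            throw new IllegalArgumentException("FORG0001: invalid xs:integer lexical form: " + value);
        }
        return new BigInteger(s); // BigInteger accepts a leading '+' or '-'
    }
}
```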

Reduces QT4 prod-TryCatchExpr failures from 34 to ~28 (covers $err:map and
$err:stack-trace tests). Net improvement on prod-CastExpr from XPTY0117
alignment and untypedAtomic→QName cast.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move DOCTYPE emission rules into XHTMLWriter so both XHTML4 and HTML4
share the same logic; consolidate the previously diverging XHTML5Writer
override.

Per W3C XSLT and XQuery Serialization 3.1 sections 7.1 and 7.2:
- doctype-system set: emit DOCTYPE PUBLIC/SYSTEM
- doctype-system absent, html method, doctype-public set: emit DOCTYPE PUBLIC
- doctype-system absent, html-version >= 5: emit <!DOCTYPE html>
- otherwise: no DOCTYPE
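The four DOCTYPE rules above, transcribed into one decision function. This is a sketch: the class, method name and parameters are hypothetical, and null stands for an absent doctype-system or doctype-public parameter.

```java
/** Hypothetical transcription of the shared XHTML/HTML DOCTYPE rules. */
public final class DoctypeRules {
    /** Returns the DOCTYPE declaration to emit, or null for none. */
    public static String doctype(String rootName, boolean htmlMethod, String doctypeSystem,
                                 String doctypePublic, double htmlVersion) {
        if (doctypeSystem != null) {
            // Rule 1: a system id always produces a DOCTYPE, with PUBLIC if a public id is also set.
            return doctypePublic != null
                    ? "<!DOCTYPE " + rootName + " PUBLIC \"" + doctypePublic + "\" \"" + doctypeSystem + "\">"
                    : "<!DOCTYPE " + rootName + " SYSTEM \"" + doctypeSystem + "\">";
        }
        if (htmlMethod && doctypePublic != null) {
            // Rule 2: html method only, public id without system id.
            return "<!DOCTYPE " + rootName + " PUBLIC \"" + doctypePublic + "\">";
        }
        if (htmlVersion >= 5.0) {
            // Rule 3: HTML5 gets the short form.
            return "<!DOCTYPE html>";
        }
        return null; // Rule 4: no DOCTYPE
    }
}
```

For example, an xhtml-method document with only doctype-public set and html-version below 5 gets no DOCTYPE, which is the xhtml-25 situation described above.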

Previously XHTMLWriter inherited XMLWriter's writeDoctype which emitted
a DOCTYPE whenever either id was set, causing xhtml-25 to emit a stray
DOCTYPE PUBLIC. XHTML5Writer's override suppressed <!DOCTYPE html> when
doctype-public was set without doctype-system, which broke xhtml-27.

isHtmlMethod and isHtml5Version are now protected (not private), and
isHtml5Version reads html-version first, falling back to version per
the W3C spec note for html method.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Phase 2 conformance gaps in two related date/time function sets.

fn:parse-ietf-date (76.2% -> 100%, 105/105):
- Default missing timezone to UTC (per spec note: no timezone == GMT).
- Time parser no longer greedily consumes year digits as a timezone in
  asctime form (e.g. "Aug 20 19:36 2014").
- Timezone offset parser handles 1, 2, 3, 4-digit forms ("-5", "-05",
  "-500", "-0500"), trailing colons ("-05:"), and ":mm" with no minutes
  ("Feb-02 02:02-02: 02"); no longer eats trailing whitespace+content.
- Recognise lowercase TZ names ("gmt", "utc"); lowercase day and month
  names were already recognised.
- Optional "(TZNAME)" comment after offsets is parsed and validated;
  empty parens / unknown TZ in parens raise FORG0010.
- Day-name without trailing whitespace ("Wed,20 Aug ...") now errors.
- dsep requires at least whitespace or hyphen; "20Aug" / "Aug2014" now
  error as expected.
- Handle 24:00 without seconds as midnight at end of day.
- "." with no fractional digits now errors (errs27).
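The 1-to-4-digit offset forms can be parsed by treating the last two digits as minutes whenever there are more than two digits and no colon. A minimal sketch under that assumption; the class and method names are hypothetical, and it neither range-checks the offset nor handles the "(TZNAME)" comment.

```java
import java.util.regex.Pattern;

/** Hypothetical parser for parse-ietf-date style numeric timezone offsets. */
public final class IetfTzOffset {
    private static final Pattern HOURS = Pattern.compile("[0-9]{1,2}");
    private static final Pattern MINUTES = Pattern.compile("([0-9]{2})?");

    /** Parses "+h", "-hh", "-hmm", "-hhmm", "-hh:" or "-hh:mm" into signed minutes. */
    public static int offsetMinutes(String tz) {
        if (tz.length() < 2 || (tz.charAt(0) != '+' && tz.charAt(0) != '-')) {
            throw new IllegalArgumentException("FORG0010: bad timezone offset " + tz);
        }
        String body = tz.substring(1);
        String hh;
        String mm;
        int colon = body.indexOf(':');
        if (colon >= 0) {                 // "-05:" or "-05:30"
            hh = body.substring(0, colon);
            mm = body.substring(colon + 1);
        } else if (body.length() <= 2) {  // "-5", "-05"
            hh = body;
            mm = "";
        } else {                          // "-500", "-0500": last two digits are minutes
            hh = body.substring(0, body.length() - 2);
            mm = body.substring(body.length() - 2);
        }
        if (!HOURS.matcher(hh).matches() || !MINUTES.matcher(mm).matches()) {
            throw new IllegalArgumentException("FORG0010: bad timezone offset " + tz);
        }
        int minutes = Integer.parseInt(hh) * 60 + (mm.isEmpty() ? 0 : Integer.parseInt(mm));
        return tz.charAt(0) == '-' ? -minutes : minutes;
    }
}
```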

fn:build-dateTime (64.8% -> 95.8%, 68/71):
- Strict field combination validation: empty record, time without all of
  hours/minutes/seconds, year+day without month, time fields with
  incomplete date components -- all raise FODT0005 with clear messages.
- Numeric coercion: integer fields accept xs:integer or xs:decimal that
  is exactly an integer; xs:double, fractional decimals, NaN/Infinity
  raise XPTY0004. Untyped/node values are parsed as integers (test
  date-without-timezone-from-nodes).
- Seconds field accepts xs:integer / xs:decimal / finite xs:double or
  xs:float; rejects NaN/Infinity (XPTY0004).
- Timezone accepts xs:duration too (rejecting year/month parts);
  validates +-14:00 range and whole-minute offsets (FODT0006).
- Calendar-day validity (28/29/30/31, leap years) checked up-front and
  reported as FODT0006 instead of bubbling up FORG0001 from the lexical
  parser. Seconds range 0..<60 also FODT0006.
- Year formatting handles year 0 ("0000") and negative years ("-0001")
  per XSD 1.1 representation.
- When a timezone is supplied with a full dateTime, return
  xs:dateTimeStamp instead of xs:dateTime.
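Two of the checks above, sketched in Java: XSD 1.1-style year formatting and the up-front calendar-day check. The class is hypothetical; java.time's proleptic ISO calendar treats year 0 as a leap year, which lines up with XSD 1.1's year 0000 (1 BCE).

```java
import java.time.YearMonth;

/** Hypothetical helpers for fn:build-dateTime field handling. */
public final class BuildDateTimeFields {
    /** At least four digits, a leading '-' for negative years, and year 0 allowed ("0000"). */
    public static String formatYear(int year) {
        // Widen to long first so Math.abs cannot overflow on Integer.MIN_VALUE.
        String digits = String.format("%04d", Math.abs((long) year));
        return year < 0 ? "-" + digits : digits;
    }

    /** Calendar-day validity (28/29/30/31, leap years) checked before any lexical parsing. */
    public static boolean isValidDay(int year, int month, int day) {
        if (month < 1 || month > 12 || day < 1) {
            return false;
        }
        // Proleptic ISO calendar: year 0 is divisible by 400, hence a leap year.
        return day <= YearMonth.of(year, month).lengthOfMonth();
    }
}
```

In the commit's terms, a false result from isValidDay is what gets reported as FODT0006.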

XQTS QT4 deltas (antlr parser):
| Set                | Before               | After                |
| ------------------ | -------------------- | -------------------- |
| fn-parse-ietf-date | 80/105 (76.2%)       | 105/105 (100%)       |
| fn-build-dateTime  | 46/71  (64.8%)       | 68/71  (95.8%)       |

Remaining build-dateTime failures: year-zero formatting (XSD 1.0
javax.xml.datatype rejection) and one XPST0017 case that depends on
removing the xq31 2-arg overload.
XQueryContext.checkOptions resolved namespace prefixes only from
inScopeNamespaces, which contains element-constructor scoped namespaces
but NOT prologue declarations like `declare namespace p = "..."`.
Prologue namespaces are stored in staticNamespaces; getURIForPrefix is
the canonical accessor that consults inScope, inherited, and static
maps in turn.
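A minimal sketch of that lookup order, with hypothetical field and method names modelled on the ones in the commit message (this is not the real XQueryContext). The inScopeOnly method reproduces the bug: a prologue-declared prefix is invisible to it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the three namespace maps and the canonical lookup. */
public final class PrefixResolution {
    final Map<String, String> inScopeNamespaces = new HashMap<>();   // element-constructor scope
    final Map<String, String> inheritedNamespaces = new HashMap<>();
    final Map<String, String> staticNamespaces = new HashMap<>();    // prologue: declare namespace p = "..."

    /** Consults in-scope, then inherited, then static bindings; null if the prefix is unbound. */
    public String getURIForPrefix(String prefix) {
        String uri = inScopeNamespaces.get(prefix);
        if (uri == null) uri = inheritedNamespaces.get(prefix);
        if (uri == null) uri = staticNamespaces.get(prefix);
        return uri;
    }

    /** The old behaviour: only in-scope bindings, so prologue prefixes resolve to null. */
    public String inScopeOnly(String prefix) {
        return inScopeNamespaces.get(prefix);
    }
}
```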

This caused prefixed names in serialization options (e.g.
`declare option output:cdata-section-elements "p:b"`) to resolve their
prefix to a null URI, producing the QName "{null}b" which never matched
real elements during serialization.

Fixes XQTS QT4: method-xhtml -18, -19a, -19b, -19c (cdata-section-elements
on prefixed elements) and method-xml K2-Serialization-30. method-xhtml
now at 81.1% (43/53), method-xml at 80.9% (38/47), both above Phase 2 80%
gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related JSON serialization fixes that bring eXist into line with
W3C XSLT and XQuery Serialization 3.1 § 10:

1. Single-element/document sequences with method=json no longer route
   through the legacy XML-to-JSON conversion writer (which renders
   `<e/>` as the JSON literal `null`). They now go through the
   W3C-compliant JSONSerializer, which serializes the node as XML and
   wraps the result in a JSON string. Callers that depended on the
   legacy XML-to-JSON object graph can opt back in via the new
   `exist:legacy-json-conversion` parameter (default "no").

2. Default escape-solidus is now "yes". The XQ 3.1 spec did not define
   the parameter, but mainstream serializers (and the XQTS tests
   authored against XQ 3.1) assume `\/` escaping. PR534 adds the
   parameter for XQ 4.0 with default "no"; until we plumb XQuery
   version awareness, defaulting to "yes" preserves the legacy XQTS
   tests and lets XQ 4.0 callers explicitly opt out via
   `output:escape-solidus "no"`. Tests json-75 and json-76 already
   set the value explicitly.
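How a node ends up in the output after this change, sketched: the node is first serialized as XML text, then quoted as a JSON string, with the solidus escaped only when escape-solidus is yes. JsonStringEscaper is hypothetical, and the exact set of other escapes shown here is illustrative rather than taken from eXist.

```java
/** Hypothetical JSON string quoting with an escape-solidus switch. */
public final class JsonStringEscaper {
    public static String quote(String s, boolean escapeSolidus) {
        StringBuilder out = new StringBuilder(s.length() + 2).append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '/' -> out.append(escapeSolidus ? "\\/" : "/"); // the parameter under discussion
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                case '\b' -> out.append("\\b");
                case '\f' -> out.append("\\f");
                default -> {
                    // Remaining control characters must not appear raw inside a JSON string.
                    if (c < 0x20) {
                        out.append(String.format("\\u%04X", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.append('"').toString();
    }
}
```

For example, the serialized node `<e/>` becomes `"<e\/>"` with escape-solidus=yes and `"<e/>"` with no.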

Fixes XQTS QT4: method-json -27, -29, -51, -56, -61, -64..-67, -69..-72
(13 new passes). method-json now at 84.4% (12/77 fail), passing the
Phase 2 80%/30-fail gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier commit 5c407e5 routed every single-element/document method=json
sequence through the W3C JSONSerializer. That was correct for QT4 XQTS
serialization tests but broke eXist tests that exercise the legacy
XML-to-JSON conversion (json:literal / json:array / json:value / json:name
elements and attributes).

Refine the routing:
* If the input root carries the `json:` prefix on any element or
  attribute (the legacy convention's marker), route through the legacy
  XML-to-JSON writer.
* Otherwise route through the W3C-compliant JSONSerializer.
* `legacy-json-conversion="yes"|"no"` (the eXist-specific parameter,
  registered with W3CParameterConvention as a boolean string so it is
  settable from `serialize` parameter maps and XML
  serialization-parameters elements alike) explicitly overrides the
  heuristic.
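The routing rule above as a minimal sketch over a W3C DOM tree (eXist serializes its own node types, so the DOM here is a stand-in, and JsonRouting is a hypothetical name). It assumes null means the legacy-json-conversion parameter was not supplied, so only an explicit yes or no overrides the prefix heuristic.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch of choosing between the legacy and W3C JSON writers. */
public final class JsonRouting {
    /** True if any element or attribute at or below n uses the "json" prefix. */
    static boolean usesJsonPrefix(Node n) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            if ("json".equals(n.getPrefix())) return true;
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                // xmlns:json declarations have prefix "xmlns", so they do not count.
                if ("json".equals(attrs.item(i).getPrefix())) return true;
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (usesJsonPrefix(c)) return true;
        }
        return false;
    }

    /** legacyParam is "yes", "no", or null when the parameter was not supplied. */
    public static boolean useLegacyWriter(Node root, String legacyParam) {
        if ("yes".equals(legacyParam)) return true;
        if ("no".equals(legacyParam)) return false;
        return usesJsonPrefix(root);
    }

    /** Helper: namespace-aware parse so that getPrefix() is populated. */
    public static Node parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```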

Update the legacy-only tests in `exist-core/src/test/xquery/json.xml`
and `EvalTest#evalAndSerializeJson` to opt in via the new parameter:
those tests build vanilla XML without the json: prefix but explicitly
expect the legacy XML-to-JSON object graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two PI serialization rules from W3C XSLT and XQuery Serialization:

* HTML method (pre-HTML5, version < 5.0): processing instructions are
  serialized as `<?target data>` with no closing `?>` — § 7.1.5 of the
  XSLT and XQuery Serialization 3.1 spec. Previously XHTMLWriter
  inherited XML's `<?target data?>` form regardless of method.
* HTML5 method (version 5.0): per QT4 PR2372, since HTML5 has no PI
  syntax, the serializer renders processing instructions as comments
  of the form `<!--?target data?-->`, matching the HTML5 parser's
  coercion of `<?...?>` content. Previously HTML5Writer emitted the
  pre-HTML5 form.
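The three PI forms side by side, as a hypothetical helper keyed on output method and html-version (XHTML output keeps the XML form). The sketch does not guard against data containing "--", which would need extra handling inside a comment.

```java
/** Hypothetical helper showing the processing-instruction forms per output method. */
public final class PiSerialization {
    public enum Method { XML, HTML }

    public static String processingInstruction(Method method, double htmlVersion, String target, String data) {
        String body = data.isEmpty() ? target : target + " " + data;
        if (method == Method.HTML && htmlVersion >= 5.0) {
            return "<!--?" + body + "?-->"; // HTML5 has no PI syntax: emit as a comment
        }
        if (method == Method.HTML) {
            return "<?" + body + ">";       // pre-HTML5 html method: no closing '?'
        }
        return "<?" + body + "?>";          // XML (and XHTML) form
    }
}
```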

Fixes XQTS QT4: method-html -48, -58, -59a (3 new passes). The XQ
3.0/3.1-only -59 case now regresses because the XQTS runner prepends
`xquery version "4.0"` to every test and the new HTML5 PI form is
the XQ 4.0 normative output; the older `<?pi data>` form survives only
under XQ 3.x. Net for method-html: 24 → 22 fails (65.2% → 68.1%).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename local: function declarations to use the module's target namespace
prefix, as required by XQuery 3.1 §4.18 (XQST0048). The XQST0048 check
in ExternalModuleImpl.declareFunction() correctly rejects these.

- xquery4/fnXQuery40.xql: local:add → t:add, local:greet → t:greet,
  local:choice-param → t:choice-param
- xquery3/fnXQuery40.xql: local:annotated-fn → t:annotated-fn
- unzip-tests.xql: local:entry-data → uz:entry-data,
  local:entry-filter → uz:entry-filter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rder

The XQuery 4.0 set operators (union, except, intersect) rejected operands
of type json-node() because their type checks used Type.subTypeOf(NODE)
rather than Type.isNodeType(). JSON_NODE is its own subtree under ITEM
in eXist's type lattice, not a NODE subtype. Switch the checks to
Type.isNodeType so JNode operands are accepted.

JNodes also lacked equals/hashCode and a document-order comparator, so
$in/a union $in/a deduplicated to two nodes (distinct Java instances)
and ($in/d, $in/a) union ($in/c, $in/d) preserved source order rather
than document order. Add path-based equals/hashCode (root identity for
roots; parent + key + position for descendants) and a static
compareDocumentOrder method that walks to the lowest common ancestor.

Track member iteration order in the position field so document-order
sort is meaningful for object members; FnJNode.evalJposition still
returns 1 for object members per the W3C XQ4 fn:jposition spec
(member identity within an object is the key, not position).
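A minimal sketch of path-based identity and document order for JSON nodes. JNode here is a hypothetical stand-alone class, not eXist's implementation: roots compare by identity of the wrapped value, descendants by parent, key and position, and ordering walks both paths from the root down to the lowest common ancestor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical JSON-node model with path-based equality and document order. */
public final class JNode {
    private final Object rootValue; // the wrapped parsed value; only set on roots
    private final JNode parent;     // null for roots
    private final Object key;       // member key (object) or index (array); null for roots
    private final int position;     // 1-based iteration order among siblings; 0 for roots

    private JNode(Object rootValue, JNode parent, Object key, int position) {
        this.rootValue = rootValue;
        this.parent = parent;
        this.key = key;
        this.position = position;
    }

    public static JNode root(Object value) { return new JNode(value, null, null, 0); }

    public JNode child(Object key, int position) { return new JNode(null, this, key, position); }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof JNode)) return false;
        JNode other = (JNode) o;
        if (parent == null || other.parent == null) {
            // Roots are equal only if they wrap the very same value (identity, not equals()).
            return parent == null && other.parent == null && rootValue == other.rootValue;
        }
        return position == other.position && Objects.equals(key, other.key) && parent.equals(other.parent);
    }

    @Override
    public int hashCode() {
        return parent == null
                ? System.identityHashCode(rootValue)
                : Objects.hash(parent.hashCode(), key, position);
    }

    private List<JNode> pathFromRoot() {
        List<JNode> path = new ArrayList<>();
        for (JNode n = this; n != null; n = n.parent) path.add(0, n);
        return path;
    }

    /** Negative if a precedes b, 0 for the same node, positive otherwise. */
    public static int compareDocumentOrder(JNode a, JNode b) {
        List<JNode> pa = a.pathFromRoot();
        List<JNode> pb = b.pathFromRoot();
        if (!pa.get(0).equals(pb.get(0))) {
            throw new IllegalArgumentException("nodes are in different trees");
        }
        int depth = 1;
        while (depth < pa.size() && depth < pb.size() && pa.get(depth).equals(pb.get(depth))) {
            depth++;
        }
        // pa.get(depth - 1) is now the lowest common ancestor.
        if (depth == pa.size() || depth == pb.size()) {
            return Integer.compare(pa.size(), pb.size()); // an ancestor precedes its descendants
        }
        return Integer.compare(pa.get(depth).position, pb.get(depth).position);
    }
}
```

Because equals and hashCode agree, a HashSet or HashMap removes duplicate JNodes, which matches the HashMap-based dedup the commit describes for ValueSequence.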

ValueSequence.removeDuplicateNodes recognized only NODE subtypes; widen
to also dedup JSON_NODE items via a HashMap (since ItemComparator's
TreeMap path can't order JNodes). Union and Except invoke an explicit
JNode-aware sort after dedup when the result contains JSON nodes,
because the existing InMemoryNodeComparator (FastQSort) handles only
XML memtree nodes.

QT4 XQTS impact (antlr parser):
  op-union     19 fails -> 1 fail (97.8% -> 98.9%)
  op-except    17 fails -> 1 fail (97.6% -> 98.8%)
  op-intersect 17 fails -> 2 fails

Closes Phase 2.5 set-ops portion of the post-90 fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…JS0001 trailing tokens

Several QT4 fn:json-to-xml conformance gaps:

1. The duplicates default for json-to-xml was use-first; per the W3C
   spec (and QT4 test json-to-xml-018) it must be retain — XML output
   can represent duplicates as repeated child elements where a map
   could not. parse-json/json-doc continue to default to use-first.

2. The validation for the duplicates option rejected use-last and
   retain for json-to-xml. In XQuery 4.0 all four values (reject,
   use-first, use-last, retain) are valid for json-to-xml.

3. Jackson silently stops at end-of-first-value, so json-to-xml('{}extra')
   succeeded instead of raising FOJS0001. Check parser.nextToken()
   after the top-level value and raise FOJS0001 on extra content.

4. The number-parser option allows any item that atomizes to xs:numeric
   or xs:string. Previously a function item (like fn { fn() { $n } })
   was treated as an empty result; now it raises FOTY0013. Also accept
   nodes (atomized via getStringValue) and 0-arity functions (called
   with no arguments).

QT4 XQTS impact:
  fn-json-to-xml  15 fails -> 2 fails (98.2% pass)

Closes Phase 2.5 fn-json-to-xml portion of the post-90 fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When XQueryTreeParser created its staticContext via the XQueryContext
copy constructor, only staticNamespaces were copied. Host-supplied
namespaces declared by an embedding caller via declareInScopeNamespace
(e.g., the XQTS runner forwarding a test-set environment's <namespace
prefix="..."/> declarations) were lost, causing XPST0081 at parse time
for tests that referenced those prefixes (e.g. atomic:root, j:string).

Carry forward inScopeNamespaces in the copy constructor so they remain
visible to static analysis on the copied context.

QT4 XQTS impact (with the runner-side fix that propagates testCase env
namespaces to assertion queries):
  op-union          (atomic: prefix tests now resolve)
  op-except         (atomic: prefix tests now resolve)
  fn-json-to-xml    (j: prefix in instance-of assertions now resolves)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 7758007 accidentally removed the compareAttributes(a, b)
call from compareElements when adding the new base-uri and
in-scope-namespaces option blocks. As a result, two elements with
different attribute values were treated as deep-equal whenever
their names, child contents, and (default-off) base-uri/in-scope
options matched.

Restore the call after the option checks. With this fix:
- xquery3.deep-equal-options-test: whitespace-strip-attr and
  whitespace-normalize-attr-different (assertFalse on differing
  attribute values) pass again
- XQTS QT4 fn-deep-equal: 33 -> 26 F+E, no regressions vs baseline,
  with namespace-prefixes-004 now also passing (the attribute
  prefix check from the parent commit was unreachable until now)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When include-content-type=yes (the default), the serializer auto-emits a
Content-Type / charset meta tag as the first child of <head>. If the
input also contains an explicit `<meta charset>` or `<meta http-equiv="Content-Type">`,
we ended up writing two metas in the output, which fails the XQTS regex
checks of the form `not(meta.*meta)` and breaks W3C HTML/XHTML
serialization compliance (PR2372).

The fix diverts each candidate meta inside <head> to a scratch buffer at
startElement time. attribute() inspects the captured attributes; if any
of them is `charset` or `http-equiv="Content-Type"` (case-insensitive),
the buffered meta is dropped at endElement time so the auto-emitted meta
stands as the single Content-Type / charset element. Otherwise the
buffer is flushed verbatim, preserving regular meta elements like
`<meta name="description">`.
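The buffering approach can be sketched as a small state machine; MetaDedup and its methods are hypothetical names, and attribute values are assumed to arrive already escaped. Matching is case-insensitive on both the attribute name and the http-equiv value.

```java
/** Hypothetical sketch of buffering a <meta> inside <head> until its attributes are known. */
public final class MetaDedup {
    private StringBuilder buffer; // non-null while a <meta> is being captured
    private boolean drop;

    public void startMeta() {
        buffer = new StringBuilder("<meta");
        drop = false;
    }

    public void attribute(String name, String value) {
        if (buffer == null) throw new IllegalStateException("attribute() outside startMeta()");
        // Values are assumed to be escaped for attribute context already.
        buffer.append(' ').append(name).append("=\"").append(value).append('"');
        if (name.equalsIgnoreCase("charset")
                || (name.equalsIgnoreCase("http-equiv") && value.equalsIgnoreCase("Content-Type"))) {
            drop = true; // the auto-emitted Content-Type/charset meta already covers this one
        }
    }

    /** Returns the markup to flush (HTML-style, no self-closing slash), or "" if the meta is dropped. */
    public String endMeta() {
        if (buffer == null) throw new IllegalStateException("endMeta() without startMeta()");
        String out = drop ? "" : buffer.append('>').toString();
        buffer = null;
        return out;
    }
}
```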

HTML5Writer uses its own attribute() and short-circuits endElement() for
void elements, so the dedup hooks (`noteMetaAttribute`, `endMetaBuffer`)
are exposed as protected and called from HTML5Writer to keep the HTML5
output method on the same code path as HTML4/XHTML.

Fixes XQTS QT4: method-html -34, -37a, -60 (3 new passes); method-xhtml
-34, -37, -37a, -68 (4 new passes). method-xhtml now at 88.7%
(6/53 fail). method-html at 72.5% (19/69 fail).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings method-html from 22F (68.1%) past the Phase 2 gate (≥80% AND ≤30F)
to 10F (81.2%) on the QT4 serialization test set, with no regressions in
method-json/xhtml/xml or in unit tests touching the HTML serializers.

Five W3C XSLT/XQuery Serialization 3.1 § 7 conformance fixes:

- HTML5Writer.attribute(): case-insensitive boolean attribute minimization
  per § 7.2.2 — `<option selected="SELECTED">` now serializes as
  `<option selected>` (Serialization-html-13). The matcher accepts
  empty values too.

- XHTMLWriter / HTML5Writer attribute(): apply escape-uri-attributes
  (default `yes`) per § 7.2.5 to URI-valued attributes (a/@href,
  img/@src, link/@href, etc.). Only non-ASCII codepoints are %-encoded
  to UTF-8 — ASCII (incl. literal space) passes through to avoid
  double-encoding existing escape sequences. (Serialization-html-43, -44)

- XHTMLWriter.shouldUseCdataSections(): for the html method,
  cdata-section-elements is ignored for HTML-namespaced elements but DOES apply to
  foreign content (§ 7.2.7). Foreign-namespaced elements bypass the
  xdm-serialization gate. (Serialization-html-18)

- HTML5Writer.closeStartTag(): foreign content embedded in HTML5 is
  self-closed with `/>` instead of the `></tag>` expanded form, so
  consumers can re-parse the foreign block as XML.
  (Serialization-html-6)

- HTML5Writer.namespace(): XHTML namespace declarations are still
  suppressed (HTML5 parser puts elements in the HTML namespace
  implicitly), but foreign-content namespace declarations are now
  emitted so SVG/MathML/custom-XML round-trip. (Serialization-html-19a-c)
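Two of these rules sketched in isolation: boolean-attribute minimization and escape-uri-attributes limited to non-ASCII codepoints. The class is hypothetical and the boolean-attribute set is an illustrative subset, not the full list the serializer uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helpers for two HTML attribute serialization rules. */
public final class HtmlAttributeRules {
    // Illustrative subset only; not the complete set of HTML boolean attributes.
    private static final Set<String> BOOLEAN_ATTRS =
            Set.of("checked", "disabled", "selected", "multiple", "readonly");

    /** selected="SELECTED" or selected="" can be written as just: selected */
    public static boolean minimize(String name, String value) {
        String lower = name.toLowerCase(Locale.ROOT);
        return BOOLEAN_ATTRS.contains(lower) && (value.isEmpty() || value.equalsIgnoreCase(name));
    }

    /** %-encodes only non-ASCII codepoints as UTF-8; ASCII (space and '%' included) passes through. */
    public static String escapeUriAttribute(String value) {
        StringBuilder out = new StringBuilder();
        value.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp); // leaves existing %XX sequences intact: no double-encoding
            } else {
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }
}
```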

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nCall

Bring QT4 prod-DynamicFunctionCall from 49.3% to 81.6% (40 -> 14 F+E)
by implementing function-coercion semantics for record-typed parameters
(XQ4 PR1132/PR1501) and adding W3C Schema list-type cast support.

CastExpression: handle xs:NMTOKENS, xs:IDREFS, xs:ENTITIES by splitting
the source string on whitespace and producing a sequence of items typed
as the corresponding atomic item type. Previously a cast to any list
type fell through to StringValue.convertTo's default branch and threw
XPTY0004.
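The list-type cast amounts to whitespace splitting plus a non-empty check. A minimal sketch with a hypothetical class name; it skips per-token validation (for example that each token is a valid NMTOKEN), and raising FORG0001 for an empty list is this sketch's assumption.

```java
import java.util.List;

/** Hypothetical sketch of casting a string to a list type such as xs:NMTOKENS. */
public final class ListTypeCast {
    /** Splits on XML whitespace; each token becomes one item of the item type. */
    public static List<String> splitListValue(String lexical) {
        String trimmed = lexical.replaceAll("^[ \\t\\r\\n]+|[ \\t\\r\\n]+$", "");
        if (trimmed.isEmpty()) {
            // List types such as xs:NMTOKENS require at least one item.
            throw new IllegalArgumentException("FORG0001: list types require at least one item");
        }
        return List.of(trimmed.split("[ \\t\\r\\n]+"));
    }
}
```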

DynamicTypeCheck: factor the per-item function-coercion logic into a
public static helper coerceAtomicItem so other code paths can reuse it
without going through an Expression wrapper.

UserDefinedFunction: before validating a record-typed parameter, walk
the map's declared fields and apply function coercion to each value:
atomize node/array values, cast untypedAtomic to the declared type,
apply numeric promotion and XQ4 implicit casting/relabeling, and try
each alternative in choice (union) field types in declaration order.
Nested record types recurse. The coerced map is then bound to the
parameter so the function body sees the typed values.

SequenceType.checkType(Sequence): also iterate items for record types
and structurally-typed maps/arrays. The previous primary-type subtype
shortcut was unsound for these (a value of type map(*) -- a parent of
RECORD in the type hierarchy -- would erroneously satisfy a record
type).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per XQuery 3.1/4.0 spec, all xmlns declarations on a direct element
constructor are statically in scope for the entire element, including
attribute value expressions. The tree walker previously declared
namespaces only as it encountered each xmlns attribute, so attribute
value expressions referencing prefixes declared LATER on the same
element would fail with XPST0081.

Add a pre-pass that scans all attribute children of an ELEMENT node
for literal xmlns/xmlns:prefix declarations and declares them on the
static context before walking attribute value expressions.
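The pre-pass can be sketched as a single scan that collects every literal namespace declaration before any attribute value is analysed. XmlnsPrePass and Attr are hypothetical names; the resulting map is what would be declared on the static context.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the xmlns pre-pass over a direct element constructor's attributes. */
public final class XmlnsPrePass {
    public record Attr(String name, String value) {}

    /** Collects bindings from all literal xmlns attributes, regardless of their position. */
    public static Map<String, String> declaredNamespaces(List<Attr> attributes) {
        Map<String, String> bindings = new LinkedHashMap<>();
        for (Attr a : attributes) {
            if (a.name().equals("xmlns")) {
                bindings.put("", a.value());                  // default element namespace
            } else if (a.name().startsWith("xmlns:")) {
                bindings.put(a.name().substring(6), a.value()); // prefixed declaration
            }
        }
        return bindings;
    }
}
```

In `<e a="{1 instance of p:integer}" xmlns:p="..."/>` the binding for p is therefore known before the value of a is analysed.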

Lifts prod-DirElemContent.namespace from 75.1% to 83.5%
(33 -> 22 failures) on QT4 XQTS. Fixes static resolution of QNames
like p:integer in <e a="{1 instance of p:integer}" xmlns:p="..."/>
where xmlns appears after the attribute that references it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduce QT4 fn-format-number failures from 37 to 11 by addressing edge
cases in picture analysis, map-form options, and exponent formatting:

* Irregular grouping: per XPath 4.0, when integer-part separator
  positions are irregular, place separators only at the explicit
  positions instead of extrapolating the rightmost group size. Track
  the integer-part digit count to bound the regularity check at
  digit-count - 1, which makes pictures like '####,##' irregular while
  keeping '##,##,##' regular.

* Picture analysis after the exponent: passive characters (letters,
  punctuation) are now allowed as suffix material, while active
  characters (% and per-mille) raise FODF1310. A second occurrence of
  the exponent-separator that would itself be active (preceded and
  followed by digits) is rejected. Pictures like '9.9999e99end' work;
  '9.9999e999%' errors.

* Adjusted variables: when minimum-fractional-part-size is forced to 1
  by the integer-zero/maximum-frac-zero/exponent rule, mark the
  sub-picture as having a decimal separator so step 12 does not strip
  it from output ('#e0' formats 0.2 as '0.2e0').

* Exponent rendering: convert ASCII exponent digits to the
  decimal-digit family using zero-digit; support multi-character
  minus-sign on negative exponents.

* Multi-character minus-sign: change DecimalFormat.minusSign from int
  to String so renditions like 'minus ' work in negative-prefix and
  exponent-sign output. Update both the ANTLR and the recursive-descent
  parsers to pass the value through unchanged for declare decimal-format.

* Map-form options:
  - Accept xs:QName (or string) for format-name, including via the
    bare-string overload, so namespaced decimal formats resolve via
    the QName item directly.
  - Atomize property values (sequences and singleton arrays) before
    applying them.
  - Reject multi-codepoint values for single-character properties
    (FODF1290), unless the value uses the X:rendition pattern.
  - After applying overrides, validate distinctness of all eight
    picture-string properties, that zero-digit is a Unicode digit
    with numeric value zero, and that digit is not a member of the
    decimal digit family. Each violation raises FODF1290.

* New error code: ErrorCodes.FODF1290 (Invalid decimal format
  property value).
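
A minimal Java sketch of the post-override FODF1290 checks might look like this. It assumes each property has been reduced to the single code point that marks it in the picture string (multi-character renditions are ignored), and `DecimalFormatCheck` / `validate` are hypothetical names, not eXist's `DecimalFormat` API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: names and the int-code-point representation are
// illustrative only, not eXist's DecimalFormat API.
final class DecimalFormatCheck {

    private DecimalFormatCheck() {}

    /** Throws IllegalArgumentException tagged FODF1290 if any rule is violated. */
    static void validate(int decimalSeparator, int groupingSeparator, int exponentSeparator,
                         int percent, int perMille, int zeroDigit, int digit,
                         int patternSeparator) {
        // zero-digit must be a Unicode decimal digit whose numeric value is zero
        if (!Character.isDigit(zeroDigit) || Character.digit(zeroDigit, 10) != 0) {
            throw new IllegalArgumentException("FODF1290: invalid zero-digit");
        }
        // the digit sign must not fall in the ten-digit family starting at zero-digit
        if (digit >= zeroDigit && digit <= zeroDigit + 9) {
            throw new IllegalArgumentException("FODF1290: digit is in the decimal digit family");
        }
        // the eight picture-string properties must be pairwise distinct
        Set<Integer> seen = new HashSet<>();
        for (int cp : List.of(decimalSeparator, groupingSeparator, exponentSeparator,
                percent, perMille, zeroDigit, digit, patternSeparator)) {
            if (!seen.add(cp)) {
                throw new IllegalArgumentException(
                        "FODF1290: duplicate property " + new String(Character.toChars(cp)));
            }
        }
    }
}
```

With the default properties ('.', ',', 'e', '%', '‰', '0', '#', ';') the call returns normally; setting the grouping separator to '.' or the digit sign to '5' raises FODF1290.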

Closes the fn-format-number portion of post-90-fixes phase 2.
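
For the grouping-separator bullet above, a sketch of the regularity test and the two placement modes (extrapolate when regular, explicit positions only when irregular) could look like the following. The class, its methods, and the position encoding (integer-part digit positions to the right of each separator) are assumptions for illustration, not eXist's formatter code.

```java
import java.util.Set;

// Hypothetical helper: positions count the integer-part digit positions to the
// right of each grouping separator, so '##,##,##' has positions {2, 4}.
final class GroupingSketch {

    /** Returns the grouping size G when the separators are regular, else 0. */
    static int regularGroupingSize(Set<Integer> positions, int digitCount) {
        if (positions.isEmpty()) {
            return 0;
        }
        // The gcd is the largest G dividing every position; any smaller common
        // divisor would only require more separators to be present.
        int g = 0;
        for (int p : positions) {
            g = gcd(g, p);
        }
        // Every multiple of G up to digitCount - 1 must be an explicit separator.
        for (int m = g; m <= digitCount - 1; m += g) {
            if (!positions.contains(m)) {
                return 0;
            }
        }
        return g;
    }

    /** Inserts separators into ASCII integer digits, extrapolating only when regular. */
    static String applyGrouping(String digits, Set<Integer> positions, int g, char sep) {
        StringBuilder out = new StringBuilder();
        int n = digits.length();
        for (int i = 0; i < n; i++) {
            out.append(digits.charAt(i));
            int toRight = n - 1 - i; // digits still to come after this one
            boolean place = g > 0 ? toRight > 0 && toRight % g == 0 : positions.contains(toRight);
            if (place) {
                out.append(sep);
            }
        }
        return out.toString();
    }

    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }
}
```

Here '####,##' (positions {2}, six digit positions) is irregular because 4 is a multiple of 2 below 6 with no separator, so 1234567 renders as "12345,67"; '##,##,##' is regular with G = 2.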

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduce QT4 prod-WindowClause failures from 35 to 11 and prod-LetClause
from 37 to 27 by completing the parser/runtime support that the tree
walker side already had:

* Grammar: make WindowStartCondition and WindowEndCondition individually
  optional in windowClause, and make the "when ExprSingle" guard
  optional inside each (XQ4 PR483). Sliding windows still require an
  end clause; that constraint is enforced in the tree walker.

* WindowCondition / WindowExpr: tolerate a null whenExpression on either
  the start or the end condition. A missing "when" defaults to true()
  during analyze() and eval(), and toString() omits the "when ..."
  fragment entirely so dump output stays readable.

* LetExpr: when the variable has an explicit atomic SequenceType and
  XQuery version >= 4.0, run a function-conversion pass over the bound
  value before the body executes (XQ4 PR1131). New
  coerceAtomicSequence() casts each item to the declared type via
  atomize().convertTo(), promoting xs:integer/decimal/float to
  xs:double, casting xs:untypedAtomic and xs:anyURI to the target
  atomic type, and falling back to the existing XPTY0004 path if any
  item cannot be converted.
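
The LetExpr coercion can be modelled as a toy in a few lines of Java. It assumes a hypothetical `AtomicType` enum and `Item` record in place of eXist's value classes, uses simplified lexical parsing, and covers only the conversions the bullet lists (promotion to xs:double, casting xs:untypedAtomic and xs:anyURI); anything else is reported as XPTY0004.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Toy model; AtomicType and Item are hypothetical stand-ins for eXist's
// AtomicValue classes, and lexical handling is deliberately simplified.
final class LetCoercionSketch {

    enum AtomicType { INTEGER, DECIMAL, FLOAT, DOUBLE, STRING, UNTYPED_ATOMIC, ANY_URI }

    record Item(AtomicType type, String lexical) {}

    /** Coerces every item of a bound value to the declared atomic type. */
    static List<Item> coerce(List<Item> value, AtomicType target) {
        List<Item> out = new ArrayList<>();
        for (Item item : value) {
            out.add(coerceItem(item, target));
        }
        return out;
    }

    private static Item coerceItem(Item item, AtomicType target) {
        AtomicType t = item.type();
        // already an instance (xs:integer is derived from xs:decimal)
        if (t == target || (t == AtomicType.INTEGER && target == AtomicType.DECIMAL)) {
            return item;
        }
        boolean promotable = t == AtomicType.INTEGER || t == AtomicType.DECIMAL
                || t == AtomicType.FLOAT;
        if (promotable && target == AtomicType.DOUBLE) {
            return cast(item.lexical(), AtomicType.DOUBLE); // numeric promotion
        }
        if (t == AtomicType.UNTYPED_ATOMIC || t == AtomicType.ANY_URI) {
            return cast(item.lexical(), target);
        }
        throw new IllegalArgumentException("XPTY0004: cannot coerce " + t + " to " + target);
    }

    private static Item cast(String lexical, AtomicType target) {
        String s = lexical.strip();
        try {
            String converted = switch (target) {
                case INTEGER -> new BigInteger(s).toString();
                case DECIMAL -> new BigDecimal(s).toPlainString();
                case FLOAT -> Float.toString(Float.parseFloat(s));
                case DOUBLE -> Double.toString(Double.parseDouble(s));
                case STRING, UNTYPED_ATOMIC, ANY_URI -> lexical;
            };
            return new Item(target, converted);
        } catch (NumberFormatException e) {
            // any failed conversion falls back to XPTY0004, as the bullet describes
            throw new IllegalArgumentException("XPTY0004: cannot cast '" + lexical + "' to " + target, e);
        }
    }
}
```

For example, binding xs:integer 42 to a variable declared xs:double yields a double item, while binding an xs:string to xs:integer is rejected.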

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses misc-Subtyping XQTS gaps. misc-Subtyping QT4 fail count drops
from 45 to 26 (153 tests, 41 skipped, 112 active; 76.8% of active tests pass).

Parser changes (XQuery.g, XQueryTree.g):
- recordFieldDecl: suppress the optional QUESTION token from the AST so
  the tree walker no longer sees a stray '?' node after a record field
  with no type clause.
- Tree walker: allow xs:error in the atomic-type position. xs:error is
  defined as a builtin under ANY_SIMPLE_TYPE; per XQuery 4.0 it is a
  legitimate sequence type (its value space is empty, so xs:error*
  matches only the empty sequence).
- documentTest: accept document-node(*) as XQuery 4.0 short form for
  document-node(element(*)).

SequenceType subtype rules (SequenceType.java#isSubtypeOf):
- Element/attribute kind tests now compare nodeName: when sup names a
  specific element/attribute, sub must name the same one. Previously
  element(*) was reported as a subtype of element(a).
- Records now subtype-check structurally on declared fields per XQuery
  4.0 Records: required fields of sup must exist (and be required) in
  sub with sub's field type subtype-of sup's, and sub may not declare
  extra fields unless sup is extensible.
- map(K, V) subtype check now allows records (RECORD <= MAP_ITEM) to
  flow through, treating an untyped record as map(xs:string, item()*).
- function-shape conversion for maps and arrays now uses the declared
  K/V types: map(K, V) acts as function(xs:anyAtomicType) as V?
  (atomic-coerced key per XQ4 PR1501; lookup miss widens cardinality);
  array(T) acts as function(xs:integer) as T. Records flow through the
  map branch via Type.subTypeOf(sub.primaryType, MAP_ITEM).
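
A sketch of the structural record rule, under stated assumptions: field types are plain strings checked against a tiny hard-coded lattice, and the `RecordSubtypeSketch` names are hypothetical rather than eXist's SequenceType code. Two details go beyond the bullet and are assumptions: an optional field of the supertype that is present in the subtype must still have a compatible type, and an extensible record is never treated as a subtype of a closed one.

```java
import java.util.Map;

// Hypothetical model of structural record subtyping; not eXist's SequenceType.
final class RecordSubtypeSketch {

    record Field(String type, boolean required) {}

    record RecordType(Map<String, Field> fields, boolean extensible) {}

    // Minimal atomic lattice for the example: xs:integer <: xs:decimal <: xs:anyAtomicType.
    static boolean typeSubtype(String sub, String sup) {
        if (sub.equals(sup) || sup.equals("xs:anyAtomicType")) {
            return true;
        }
        return sub.equals("xs:integer") && sup.equals("xs:decimal");
    }

    static boolean isSubtype(RecordType sub, RecordType sup) {
        // assumption: an open record may carry any field, so it cannot fit a closed one
        if (sub.extensible() && !sup.extensible()) {
            return false;
        }
        for (Map.Entry<String, Field> e : sup.fields().entrySet()) {
            Field supField = e.getValue();
            Field subField = sub.fields().get(e.getKey());
            if (subField == null) {
                if (supField.required()) {
                    return false; // required field of sup missing from sub
                }
                continue;
            }
            if (supField.required() && !subField.required()) {
                return false; // required in sup must also be required in sub
            }
            if (!typeSubtype(subField.type(), supField.type())) {
                return false; // sub's field type must be a subtype of sup's
            }
        }
        // extra fields in sub are only allowed when sup is extensible
        if (!sup.extensible()) {
            for (String name : sub.fields().keySet()) {
                if (!sup.fields().containsKey(name)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

So record(x as xs:integer) is a subtype of record(x as xs:decimal) but not the reverse, and adding an optional field y to the subtype is only allowed when the supertype is extensible.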

XQTS misc-Subtyping (QT4):
- Before: 45 fails, 41 skipped, 67 passes (59.8%)
- After:  26 fails, 41 skipped, 86 passes (76.8%)

Remaining failures are XQuery 4.0 features outside this change set:
- element/attribute multi-name tests (element(a|b)) and namespace
  wildcards (element(p1:*), element(*:a)) -- 14 tests
- element-type tracking (element(a, xs:integer) covariance) -- 4 tests
- gnode() as supertype of node()|jnode() -- 3 tests
- document-node(QName) bare short form, document-node(*) vs () -- 2 tests
- union-type widening (xs:long|xs:int subtype of xs:integer) -- 3 tests
… 4.0 spec

W3C XQuery 4.0 PR197 specifies the keyword-argument names for built-in
functions. eXist's signatures used the legacy 3.1-era names:

  fn:json-doc:    eXist used $href      -> spec says source
  fn:parse-json:  eXist used $json-text -> spec says value
  fn:json-to-xml: eXist used $json-text -> spec says value

These names are visible to callers via XQ4 keyword-argument syntax
(name := value), so they must match the spec for keyword calls to
resolve. Positional-call behavior is unchanged.
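
Why the renames matter can be seen in a minimal keyword binder: a keyword that matches no declared parameter name cannot be bound, which is how the old names surfaced as XPST0017. `KeywordArgsSketch` is a hypothetical illustration, not eXist's parser or FunctionSignature logic, and default values for unbound optional parameters are not modelled.

```java
import java.util.List;
import java.util.Map;

// Hypothetical binder for XQ4 keyword arguments (name := value).
final class KeywordArgsSketch {

    static Object[] bind(List<String> paramNames, List<Object> positional,
                         Map<String, Object> keywords) {
        if (positional.size() > paramNames.size()) {
            throw new IllegalArgumentException("too many arguments");
        }
        Object[] args = new Object[paramNames.size()];
        boolean[] bound = new boolean[args.length];
        for (int i = 0; i < positional.size(); i++) {
            args[i] = positional.get(i);
            bound[i] = true;
        }
        for (Map.Entry<String, Object> kw : keywords.entrySet()) {
            int idx = paramNames.indexOf(kw.getKey());
            if (idx < 0) {
                // an unknown keyword means no function matches the call
                throw new IllegalArgumentException("XPST0017: no parameter named " + kw.getKey());
            }
            if (bound[idx]) {
                throw new IllegalArgumentException("parameter " + kw.getKey() + " supplied twice");
            }
            args[idx] = kw.getValue();
            bound[idx] = true;
        }
        return args; // unbound optional parameters stay null (defaults not modelled)
    }
}
```

With parameter names ("value", "options") for fn:parse-json, `value := ...` binds to the first slot, while the legacy `json-text := ...` is rejected.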

Confirmed via QT4 misc-BuiltInKeywords XQTS:
  Keywords-fn-json-doc-1   pass (was XPST0017)
  Keywords-fn-parse-json-1 pass (was XPST0017)
  Keywords-fn-json-to-xml-1 still fails on a separate parser-level
                            issue in its instance-of clause
                            (document-node(fn:*) wildcard).

Net misc-BuiltInKeywords: 83 -> 79 failures+errors (72.0% -> 72.8%). The remaining
79 failures need a per-function W3C-4.0 signature audit and new XQ4
type-syntax parser features (record types, local union types, document-
node element wildcards), tracked under separate phase2-* taskings.
…R197

Audit and fix FunctionSignature definitions against the QT4CG XQuery 4.0
spec so partial-application instance-of tests succeed. The QT4 keyword
test catalog (misc-BuiltInKeywords) checks that
fn:foo(arg := ?) instance of function(...) as ... matches the
signature declared by W3C; before this commit eXist's signatures
diverged in parameter names, cardinalities, and return types.

Common patterns fixed:
- collation parameter widened from xs:string to xs:string?
  (fn:contains, fn:ends-with, fn:starts-with, fn:compare,
  fn:distinct-values, fn:index-of, fn:substring-before,
  fn:substring-after, fn:collation-key, fn:string-join's separator,
  array:index-of)
- options map parameter widened to map(*)?
  (fn:doc, fn:doc-available, fn:csv-to-arrays, fn:parse-csv,
  fn:csv-to-xml, fn:csv-doc, fn:parse-xml, fn:parse-xml-fragment,
  fn:path)
- length-style optional trailing parameter widened to ?
  (fn:substring's length, fn:subsequence's length,
  array:subarray's length, array:build's action)
- start type widened to xs:numeric for fn:subsequence
- base param of fn:resolve-uri is xs:string?
- fn:seconds value is xs:decimal?
- fn:unix-dateTime value is xs:nonNegativeInteger?, returns xs:dateTimeStamp
- typed function/array/record/map returns:
  fn:op returns fn(item()*, item()*) as item()*
  fn:invisible-xml returns fn(xs:string) as item()
  fn:analyze-string returns element(fn:analyze-string-result)
  fn:element-to-map returns map(xs:string, item()?)?
  fn:function-annotations returns map(xs:QName, xs:anyAtomicType*)*
  fn:csv-to-arrays returns array(xs:string)*
  fn:divide-decimals returns record(quotient/remainder)
  fn:in-scope-namespaces returns map(xs:NCName, xs:anyURI)
  fn:transitive-closure returns node()*
  array:members returns record(value as item()*)*
  fn:transform returns map(*)
  fn:collation accepts map(*) and returns xs:string
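
The cardinality widening matters because function-type subtyping compares parameters contravariantly. The toy model below uses hypothetical `SeqType` and `FnType` records that compare item types by name only (not eXist's SequenceType); it shows that an arity-3 fn:contains whose collation parameter is `xs:string` is not an instance of `function(xs:string?, xs:string?, xs:string?) as xs:boolean`, while the widened `xs:string?` version is.

```java
import java.util.List;

// Toy model: item types are compared by name only, which is enough to show
// the cardinality effect; it is not eXist's SequenceType implementation.
final class FunctionSubtypeSketch {

    enum Occurrence {
        ONE(false, false), ZERO_OR_ONE(true, false), ONE_OR_MORE(false, true), ZERO_OR_MORE(true, true);

        final boolean allowsEmpty;
        final boolean allowsMany;

        Occurrence(boolean allowsEmpty, boolean allowsMany) {
            this.allowsEmpty = allowsEmpty;
            this.allowsMany = allowsMany;
        }
    }

    record SeqType(String itemType, Occurrence occ) {}

    record FnType(List<SeqType> params, SeqType result) {}

    static boolean seqSubtype(SeqType sub, SeqType sup) {
        return sub.itemType().equals(sup.itemType())
                && (!sub.occ().allowsEmpty || sup.occ().allowsEmpty)
                && (!sub.occ().allowsMany || sup.occ().allowsMany);
    }

    static boolean fnSubtype(FnType sub, FnType sup) {
        if (sub.params().size() != sup.params().size()) {
            return false;
        }
        for (int i = 0; i < sub.params().size(); i++) {
            // parameters are contravariant: sub must accept everything sup accepts
            if (!seqSubtype(sup.params().get(i), sub.params().get(i))) {
                return false;
            }
        }
        return seqSubtype(sub.result(), sup.result()); // results are covariant
    }
}
```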

Type-checker support added:
- SequenceType.isSubtypeOf now handles a choice (union) type on the
  sub side: every alternative must be a subtype of the supertype.
  This unblocks date/time accessors (fn:year-from-dateTime etc.) and
  fn:char where the spec types are union types but eXist uses a
  single broader primary type.
- Bare map(*) is treated as map(xs:anyAtomicType, item()*) in subtype
  checks (was xs:string, which contradicts the W3C spec); records flow
  through with an xs:string key fallback because record keys are
  always strings.
- Bare array(*) is now treated as array(item()*); array(*) is a
  subtype of array(item()*).
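
The choice-type rule reduces to "every alternative must be a subtype". A minimal sketch, assuming a small hard-coded excerpt of the XSD type hierarchy and hypothetical method names: for fn:char, the spec's union `(xs:string | xs:positiveInteger)` is a subtype of a single broader parameter type such as `xs:anyAtomicType` (assumed here for illustration), but not of `xs:integer`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; PARENT is a tiny excerpt of the XSD type hierarchy.
final class ChoiceSubtypeSketch {

    private static final Map<String, String> PARENT = Map.of(
            "xs:positiveInteger", "xs:nonNegativeInteger",
            "xs:nonNegativeInteger", "xs:integer",
            "xs:integer", "xs:decimal",
            "xs:decimal", "xs:anyAtomicType",
            "xs:string", "xs:anyAtomicType");

    /** Walks the parent chain of sub looking for sup. */
    static boolean atomicSubtype(String sub, String sup) {
        for (String t = sub; t != null; t = PARENT.get(t)) {
            if (t.equals(sup)) {
                return true;
            }
        }
        return false;
    }

    /** A choice type on the sub side: every alternative must be a subtype. */
    static boolean choiceSubtype(List<String> alternatives, String sup) {
        return alternatives.stream().allMatch(alt -> atomicSubtype(alt, sup));
    }
}
```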

Implementation tweaks accompanying the signature changes:
- CollatingFunction.getCollator handles an empty xs:string? collation
  argument by returning the default collator.
- FunSubstring, FunSubSequence, FunAnalyzeString, ArrayFunction handle
  an empty optional-length argument by behaving as the no-length form.
- ArrayBuild handles an empty action argument by returning the input
  unchanged (identity-like).
- FunResolveURI handles an empty base-URI argument by falling back to
  the static base URI.
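
The "empty optional length behaves like the no-length form" tweak can be illustrated with a sketch of fn:substring. It assumes `Optional.empty()` stands in for the empty sequence, approximates fn:round with `Math.floor(d + 0.5)`, and does not model an empty `value` argument; `SubstringSketch` is a hypothetical name, not eXist's FunSubstring.

```java
import java.util.Optional;

// Hypothetical sketch of fn:substring with an optional length argument.
final class SubstringSketch {

    // Approximates fn:round for xs:double: halves round towards positive
    // infinity, and NaN or infinities pass through unchanged.
    private static double xpathRound(double d) {
        return Math.floor(d + 0.5);
    }

    static String substring(String value, double start, Optional<Double> length) {
        double first = xpathRound(start);
        // an empty length is treated exactly like the two-argument form
        double end = length.map(l -> first + xpathRound(l)).orElse(Double.POSITIVE_INFINITY);
        StringBuilder out = new StringBuilder();
        int[] cps = value.codePoints().toArray();
        for (int i = 0; i < cps.length; i++) {
            int pos = i + 1; // XPath positions are 1-based and count code points
            // NaN bounds make these comparisons false, which yields ""
            if (pos >= first && pos < end) {
                out.appendCodePoint(cps[i]);
            }
        }
        return out.toString();
    }
}
```

For example, `substring("12345", 1.5, Optional.of(2.6))` yields "234", and `substring("motor car", 6, Optional.empty())` yields " car", the same as the two-argument call.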

XQTS misc-BuiltInKeywords pass rate: 72.9% -> 89.5% (76 -> 31 fails).
Remaining failures are net-new XQ4 functions (j-tree, j-key, j-value,
j-position, system-properties, build-dateTime, etc.), unrecognised
record types (fn:dateTime-record, fn:parsed-csv-structure-record,
etc.), parser support for record/document-node wildcards in instance-of
expressions, and a fn:matches partial-application internal error —
all out of scope for this signature audit.

Other affected XQTS sets (no regressions, only improvements):
fn-collation-key 10 -> 7, fn-string-join 2 -> 0, fn-resolve-uri 3 -> 1,
fn-substring 1 -> 0, fn-substring-before 3 -> 1, fn-substring-after
2 -> 0, misc-Subtyping 26 -> 25.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz joewiz requested a review from a team as a code owner May 1, 2026 00:54
@joewiz joewiz closed this May 1, 2026
@joewiz joewiz deleted the phase2/fn-signature-audit branch May 1, 2026 01:16
@joewiz joewiz restored the phase2/fn-signature-audit branch May 1, 2026 01:49
@joewiz joewiz reopened this May 1, 2026
@joewiz
Member Author

joewiz commented May 1, 2026

Opened in error — this PR's branch was forked from the next-v3 integration branch (360+ commits ahead of develop), making it appear as 389 commits. The one substantive commit (54501b185b — fn: signature audit) has been cherry-picked to v2/xq4-core-functions and is included in PR #6218. The PR description has been preserved in #6218.

@joewiz joewiz closed this May 1, 2026
@joewiz joewiz deleted the phase2/fn-signature-audit branch May 1, 2026 01:54