[bugfix] fn:xml-to-json: enforce F&O 3.1 §17.4.2 structural validation (+10 XQTS HEAD) #6350

Open
joewiz wants to merge 3 commits into eXist-db:develop from joewiz:bugfix/fn-xml-to-json-over-permissive-validation

joewiz commented May 11, 2026

Summary

Adds per-element-type structural validation to fn:xml-to-json as required by F&O 3.1 §17.4.2 (XML Representation of JSON) and §17.5.4. Before this change, develop silently accepted inputs that violate the spec's structural rules and produced a JSON result rather than raising FOJS0006.

This is a follow-up to #6342 (which closed the walk-from-doc-root sub-cluster of fn-xml-to-json HEAD failures). It closes the over-permissive-validation sub-cluster identified in the 2026-05-10 triage of the FOJS0006 cluster.

What changed

exist-core/src/main/java/org/exist/xquery/functions/fn/FunXmlToJson.java:

  • Reject non-whitespace text node children of <map> and <array> elements (whitespace, comments, and processing instructions are still ignored per spec).
  • Reject element children of leaf-type elements (<string>, <number>, <boolean>, <null>).
  • Reject no-namespace attributes other than key, escaped-key, and escaped; reject any attribute in the http://www.w3.org/2005/xpath-functions namespace, which the schema's anyAttribute namespace="##other" wildcard does not admit.
  • Require escaped and escaped-key attribute values to be lexically valid xs:boolean (true/false/1/0).
  • Reject element names outside the six allowed local names (map, array, string, number, boolean, null) at start-tag rather than only at end-tag.

Per W3C bug 29917 / qt3tests xml-to-json-065, the escaped attribute is tolerated on non-string elements (treated as a no-op); only the lexical value is enforced.

Foreign-namespace attributes remain ignored, matching the schema's anyAttribute namespace="##other" rule on every element type.
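The attribute rules above can be sketched as a three-way decision per attribute. This is an illustrative sketch only, not the code in FunXmlToJson.java: the class name `JsonXmlAttributeRules`, the `Verdict` enum, and the `check` method are hypothetical, and trimming the value before the xs:boolean test is an assumption that approximates the schema's whitespace collapsing.

```java
import java.util.Set;

/** Hypothetical helper illustrating the attribute rules; not the PR's actual code. */
final class JsonXmlAttributeRules {
    static final String FN_NS = "http://www.w3.org/2005/xpath-functions";

    // The only no-namespace attribute names the PR description lists as allowed.
    private static final Set<String> ALLOWED_NO_NS = Set.of("key", "escaped-key", "escaped");
    // The four lexical forms of xs:boolean.
    private static final Set<String> XS_BOOLEAN = Set.of("true", "false", "1", "0");

    enum Verdict { ACCEPT, IGNORE, REJECT }

    static Verdict check(String namespaceUri, String localName, String value) {
        if (namespaceUri == null || namespaceUri.isEmpty()) {
            if (!ALLOWED_NO_NS.contains(localName)) {
                return Verdict.REJECT; // e.g. the misspelled attribute "yek"
            }
            boolean isFlag = localName.equals("escaped") || localName.equals("escaped-key");
            // Assumption: trim() stands in for xs:boolean's whiteSpace="collapse" facet.
            if (isFlag && !XS_BOOLEAN.contains(value.trim())) {
                return Verdict.REJECT;
            }
            return Verdict.ACCEPT;
        }
        if (FN_NS.equals(namespaceUri)) {
            return Verdict.REJECT; // namespace="##other" excludes the schema's own namespace
        }
        return Verdict.IGNORE; // foreign-namespace attributes are skipped
    }

    public static void main(String[] args) {
        System.out.println(check("", "escaped", "maybe"));
        System.out.println(check("urn:example", "note", "anything"));
    }
}
```

A REJECT verdict corresponds to the cases where the function would raise FOJS0006; IGNORE covers attributes that play no part in the conversion.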

exist-core/src/test/xquery/xquery3/xml-to-json.xql: adds 14 XQSuite regression tests covering each new validation path plus the whitespace-allowed and foreign-namespace-attribute-ignored cases.

Spec references

F&O 3.1 §17.4.2:

The XDM representation of a JSON value … must have the type annotations obtained by validating the untyped representation against the schema given in C.2 Schema for the result of fn:json-to-xml. If it is untyped, then it must be an XDM instance such that validation against this schema would succeed; with the proviso that all attributes other than those in no namespace or in namespace http://www.w3.org/2005/xpath-functions are ignored.

F&O 3.1 §17.5.4 Error Conditions:

A dynamic error is raised [err:FOJS0006] if the value of $input is not a document or element node or is not valid according to the schema for the XML representation of JSON, or if a map element has two children whose normalized key values are the same.

F&O 3.1 Appendix C.2 Schema: stringType declares the escaped attribute; nullType, booleanType, numberType, arrayType, and mapType each declare only <xs:anyAttribute processContents="skip" namespace="##other"/>.

XQTS HEAD delta

fn-xml-to-json test set:

| Newly passing | Newly failing |
| --- | --- |
| Tests xml-to-json-{033, 040, 042, 043, 044, 062, 063, 069, 081, 082} (10) | None (0) |

| Violation kind | Tests |
| --- | --- |
| Text child of <map> / <array> | 081, 082 |
| Disallowed no-namespace attribute (yek) | 033 |
| Attribute in xpath-functions namespace | 069 |
| Invalid xs:boolean value for escaped-key | 062 |
| Invalid xs:boolean value for escaped | 063 |
| Element child of <string> | 040 |
| Element child of <boolean> | 042 |
| Element child of <null> | 043, 044 |
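The structural violation kinds listed here (text inside a container, an element inside a leaf, an unknown element name caught at the start tag) can be illustrated with a small StAX walk over the input. This is a minimal sketch under stated assumptions, not the PR's implementation: the class `JsonXmlStructureSketch` and the method `firstViolation` are hypothetical, the input is parsed from a string rather than taken from an XDM node, and attribute checks and leaf content checks are left out.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch of the structural checks; names are illustrative only. */
final class JsonXmlStructureSketch {
    static final String FN_NS = "http://www.w3.org/2005/xpath-functions";
    private static final Set<String> CONTAINERS = Set.of("map", "array");
    private static final Set<String> LEAVES = Set.of("string", "number", "boolean", "null");

    /** Returns a description of the first violation found, or empty if none. */
    static Optional<String> firstViolation(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            Deque<String> open = new ArrayDeque<>(); // local names of currently open elements
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    boolean known = CONTAINERS.contains(name) || LEAVES.contains(name);
                    // Checked at the start tag, so bad names fail before any output is built.
                    if (!known || !FN_NS.equals(reader.getNamespaceURI())) {
                        return Optional.of("unexpected element: " + name);
                    }
                    if (!open.isEmpty() && LEAVES.contains(open.peek())) {
                        return Optional.of("element child of leaf: " + open.peek());
                    }
                    open.push(name);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.pop();
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    // Whitespace-only text between children of map/array is allowed.
                    if (!open.isEmpty() && CONTAINERS.contains(open.peek())
                            && !reader.isWhiteSpace()) {
                        return Optional.of("text child of container: " + open.peek());
                    }
                }
                // Comments and processing instructions fall through and are ignored.
            }
            return Optional.empty();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

In the real function each reported violation would surface as FOJS0006; the sketch returns a description instead so the individual cases are easy to tell apart.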

Baseline: develop @ a3865db (2026-05-11 XQ 3.1 HEAD canonical baseline).

Test plan

  • mvn test -pl exist-core -Dtest=xquery.xquery3.XQuery3Tests — 1025 pass (includes 14 new regression cases)
  • XQTS HEAD fn-xml-to-json re-run shows +10 newly passing, 0 regressions vs develop baseline
  • PMD/Codacy: NPathComplexity stays under the 200 threshold after extracting validateStartElement and validateTextInContext helpers; no new findings (only pre-existing UnusedLocalVariable at line 73 and SimplifyBooleanExpressions at line 207).

🤖 Generated with Claude Code

Adds the per-element-type validation required by F&O 3.1 §17.4.2 (XML
Representation of JSON) and §17.5.4 (fn:xml-to-json):

- Reject non-whitespace text node children of <map>/<array> elements
  (whitespace, comments and PIs are still ignored per spec).
- Reject element children of leaf-type elements (<string>, <number>,
  <boolean>, <null>).
- Reject no-namespace attributes other than 'key', 'escaped-key',
  'escaped'; reject any attribute in the xpath-functions namespace
  (the schema's anyAttribute namespace="##other").
- Require 'escaped' and 'escaped-key' to hold a valid xs:boolean value.
- Reject element names outside the six allowed local names at start-tag
  rather than only at end-tag.

Per W3C bug 29917 / qt3tests xml-to-json-065, 'escaped' is tolerated on
non-string elements (treated as a no-op); only the lexical value is
enforced.

Foreign-namespace attributes remain ignored, matching the schema rule.

Closes the over-permissive-validation sub-cluster of fn-xml-to-json
FOJS0006 failures identified in the 2026-05-10 triage report
(predecessor PR eXist-db#6342 closed the walk-from-doc-root sub-cluster).

XQTS HEAD fn-xml-to-json: +10 newly passing, 0 regressions on
xml-to-json-{033, 040, 042, 043, 044, 062, 063, 069, 081, 082}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz joewiz requested a review from a team as a code owner May 11, 2026 15:37
}

/**
* Validate the current START_ELEMENT against the F&amp;O 3.1 §17.4.2 / §17.5.4 structural rules

Double encoded ampersand

}

/**
* Reject non-whitespace text node children of {@code map} and {@code array} per F&amp;O 3.1 §17.4.2.

s.a.

}

/**
* Validate that the attributes on the current element conform to F&amp;O 3.1 §17.4.2 (the schema for JSON).

s.a.

Addresses line-o's review comments on PR eXist-db#6350 (lines 267, 288, 329).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Fixed in 3576ffc874: F&amp;O → F&O at lines 267, 288, 329. Matches the bare-ampersand Javadoc convention used elsewhere in org.exist.xquery.functions.fn.* (e.g., FunSum, FunMin, FunParseIetfDate).

Addresses reinhapa's review on PR eXist-db#6350.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Done in 0f2b5b5751 — converted the START_ELEMENT switch at line 156 to arrow syntax. Kept the END_ELEMENT switch at line 179 unchanged for now (pre-existing, not touched by this PR's diff) — happy to convert it too if you'd like a consistency follow-up.
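For readers unfamiliar with the conversion mentioned above, the following sketch contrasts the classic colon form of a Java switch with arrow syntax (Java 14 and later). It is not taken from FunXmlToJson.java: the class `ArrowSwitchSketch` and both methods are hypothetical, and dispatching on the element's local name is an assumption about what such a switch might look like.

```java
/** Hypothetical illustration of colon-form versus arrow-form switch. */
final class ArrowSwitchSketch {

    /** Colon form: every branch needs an explicit break to avoid fall-through. */
    static String classic(String localName) {
        String kind;
        switch (localName) {
            case "map":
            case "array":
                kind = "container";
                break;
            case "string":
            case "number":
            case "boolean":
            case "null":
                kind = "leaf";
                break;
            default:
                kind = "invalid";
                break;
        }
        return kind;
    }

    /** Arrow form: no fall-through, labels can be grouped, and the switch yields a value. */
    static String arrow(String localName) {
        return switch (localName) {
            case "map", "array" -> "container";
            case "string", "number", "boolean", "null" -> "leaf";
            default -> "invalid";
        };
    }
}
```

Both methods return the same result for every input; the arrow form simply removes the break statements and the mutable local variable.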

line-o added the xquery label (issue is related to xquery implementation) May 12, 2026