Implement W3C XQuery and XPath Full Text 3.0#6215
Conversation
Add full text grammar productions to XQuery.g parser and XQueryTree.g tree walker for the W3C XQuery and XPath Full Text 3.0 specification. This establishes the parsing foundation for ftcontains expressions, FTSelection operators (FTOr, FTAnd, FTMildNot, FTUnaryNot, FTWords), and positional filters (FTOrder, FTWindow, FTDistance, FTScope, FTContent, FTTimes). The AST expression classes in org.exist.xquery.ft model the full text selection grammar as a tree of FTAbstractExpr nodes. Each node corresponds to a production in the XQFT grammar and carries the evaluation semantics defined in the spec. Spec references: - W3C XQuery and XPath Full Text 3.0, Section 3.1 (Full-Text Selections) - W3C XQuery and XPath Full Text 3.0, Section 3.2 (Full-Text Contains) - W3C XQuery and XPath Full Text 3.0, Section 3.3 (Positional Filters) FTTS compliance: 661/667 (99.1%) — 6 remaining are spec ambiguities. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement the full text evaluation engine (FTEvaluator) using the sequential AllMatches model defined in W3C XQFT 3.0, Section 4. The evaluator tokenizes string values, applies match options (stemming, wildcards, diacritics sensitivity, case sensitivity, stop words, language), and evaluates the full text selection tree against token streams. FTContainsExpr is the top-level expression node for `contains text` expressions, bridging the XQuery evaluation pipeline to the FT evaluator. FTMatchOptions aggregates all match option settings. FTThesaurus provides synonym expansion via configurable thesaurus URIs, with lazy initialization for runtime efficiency. Spec references: - W3C XQuery and XPath Full Text 3.0, Section 4 (Full-Text Evaluation) - W3C XQuery and XPath Full Text 3.0, Section 4.1 (AllMatches) - W3C XQuery and XPath Full Text 3.0, Section 5 (Match Options) - W3C XQuery and XPath Full Text 3.0, Section 5.6 (Thesaurus Option) - W3C XQuery and XPath Full Text 3.0, Section 5.7 (Stop Word Option) FTTS compliance: 661/667 (99.1%) — 6 remaining are spec ambiguities. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend ForExpr and LetExpr to support optional `score` variable bindings as defined in XQFT 3.0. The score variable captures the relevance score from full-text matching for use in ordering or filtering. Add XQFT-specific error codes (FTST0008, FTST0009, FTDY0016, FTDY0017, FTDY0020) to ErrorCodes.java. Update XQueryContext with thesaurus and stop-word URI map caching to survive context resets, fixing a bug where FT match options were lost during module imports. Fix FTMatchOptions import in XQueryContext to use the correct org.exist.xquery.ft package path. Update StaticXQueryException and XQuery.java for full-text error propagation during static analysis. Spec references: - W3C XQuery and XPath Full Text 3.0, Section 2.3 (Score Variables) - W3C XQuery and XPath Full Text 3.0, Appendix B (Error Conditions) FTTS compliance: 661/667 (99.1%) — 6 remaining are spec ambiguities. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add four test classes covering the W3C XQFT 3.0 implementation: - FTConformanceTest: 622-line conformance suite covering the core XQFT test cases mapped from the W3C Full Text Test Suite (FTTS), verifying spec compliance for contains-text expressions, match options, and positional filters. - FTContainsTest: Integration tests exercising ftcontains expressions end-to-end through the XQuery engine, including edge cases for empty sequences, mixed content, and attribute nodes. - FTEvaluatorTest: Unit tests for the AllMatches evaluator, covering tokenization, match option application, and boolean composition. - FTParserTest: Parser tests verifying that the ANTLR 2 grammar correctly parses all XQFT productions and builds the expected AST. FTTS compliance: 661/667 (99.1%) — 6 remaining are spec ambiguities. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add default cases to switches, fix parameter reassignment in FTContainsExpr.eval(), collapse nested if in FTEvaluator, move field declarations before inner classes, replace FQNs with imports in XQueryContext, and suppress NPathComplexity on FTEvaluator class. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
[This response was co-authored with Claude Code. -Joe] CI state: 8/9 checks pass. The 1 remaining failure (macOS integration) is a pre-existing test hang unrelated to this PR. Dependencies: Wave 3. Should merge after For full context on all 7.0 PRs and the merge order, see the Reviewer Guide. |
PMD flagged ForExpr.eval and LetExpr.eval above the 200 NPath threshold after this branch added XQFT score-binding handling alongside the existing FLWOR profiler/dependency/cardinality dispatch. Branches map to FLWOR + XQFT spec rules; reorganizing obscures the spec mapping. Suppress with @SuppressWarnings("PMD.NPathComplexity") and rationale comments. The new ft/ classes (FTContainsExpr, FTSelection, etc.) do not have NPath violations. The remaining flagged methods on this branch (XQuery.compile/execute, XQueryContext methods) are pre-existing in code only marginally modified (1-5 lines per method) and are out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the project convention, do not add @SuppressWarnings("PMD.NPathComplexity") annotations proactively. Let the reviewer decide whether to suppress or refactor. Removes the four annotations added in 2d8abe0 (FTEvaluator, FTContainsExpr, ForExpr, LetExpr). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
||
| @Override | ||
| @SuppressWarnings("PMD.NPathComplexity") | ||
| public Sequence eval(final Sequence contextSequence, final Item contextItem) throws XPathException { |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
| * (two or more consecutive newlines, possibly with whitespace between) | ||
| * OR by element boundaries from the DOM structure. | ||
| */ | ||
| private int[] buildParagraphMap(final String text, final int[] offsets, |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
| * FTWords: the terminal matching node. | ||
| * Evaluates the words value, tokenizes it, and finds matches in the source tokens. | ||
| */ | ||
| AllMatches evalFTWords(final FTWords ftWords, final FTMatchOptions options) |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
| * Check if a source token matches a search word. | ||
| * @param rawSourceToken token with trailing punctuation preserved (for wildcard matching), or null | ||
| */ | ||
| private boolean wordMatches(final String sourceToken, final String rawSourceToken, |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
| * to approximate stems for full-text comparison. Based on a simplified | ||
| * version of the Porter stemming algorithm. | ||
| */ | ||
| static String stem(final String word) { |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
| * | ||
| * @throws XPathException FTST0008 if an external stop word URI cannot be loaded | ||
| */ | ||
| private Set<String> collectStopWords(final FTMatchOptions options, final boolean caseInsensitive, |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
| return singleEmptyMatch(); // inner didn't match → overall matches | ||
| } | ||
|
|
||
| AllMatches evalFTPrimaryWithOptions(final FTPrimaryWithOptions pwo, final FTMatchOptions inheritedOptions) |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
| * must be in a distinct unit (multi-unit StringIncludes that span unit boundaries | ||
| * are never rejected). | ||
| */ | ||
| private AllMatches applyScope(final AllMatches input, final FTScope ftScope) { |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
| /** | ||
| * Merge inherited options with local overrides. | ||
| */ | ||
| static FTMatchOptions mergeOptions(final FTMatchOptions inherited, final FTMatchOptions local) { |
There was a problem hiding this comment.
Address the NPath complexity as stated by Codacy
Summary
Implements
contains textexpressions with stemming, thesaurus, wildcards, proximity, and scoring per the W3C Full Text 3.0 spec.Spec References
XQTS
Tests
Supersedes
Test plan
contains textwith stemming, wildcards, proximity works🤖 Generated with Claude Code