Adds a document:match query function for substring matching against d column entries #3470
Open
drewfarris wants to merge 9 commits into
apmoriarty
reviewed
Mar 25, 2026
apmoriarty
reviewed
Mar 26, 2026
drewfarris
commented
Mar 30, 2026
apmoriarty
reviewed
Mar 30, 2026
FineAndDandy
reviewed
Apr 1, 2026
FineAndDandy
reviewed
Apr 8, 2026
Comment on lines +1159 to +1162
    if (nestedScript == null) {
        nestedScript = ArithmeticJexlEngines.getEngine(getArithmetic()).parse(nestedQuery.getQuery());
    }
    return DocumentMatchFunctionVisitor.requiresDocumentMatchContext(nestedScript);
Collaborator
There are methods in JexlASTHelper that can parse the functions out of the JexlNode
drewfarris
commented
Apr 9, 2026
… `d` column entries

* Adds the `document:match(viewname, string)` and `document:match(string)` query functions, which scan the `d` columns of candidate documents at evaluation time and filter out candidates whose values do not contain the specified string.
* Exposed via Lucene syntax using the `#DOCUMENT_MATCH` operator.
* If no view name is included as a function parameter, all `d` columns are scanned.
* The view name can be a prefix ending in `*` to search all views with that prefix.
* If the specified string is found, the view name and the start offsets of the matches are stored as a JSON map in the `DOCUMENT_MATCHES` field of the result.

This change includes:

* Lucene-to-JEXL translation
* Planner/iterator wiring
* Runtime document-match evaluation
* Configurable limits for `d` column sizes to prevent evaluation of large documents
* Unit and integration tests

While useful in its own right, this is a predecessor for more advanced matching functions on `d` column payloads.
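The matching behavior described in this commit message can be sketched in plain Java. This is an illustration only, not the code in this PR: the class `DocumentMatchSketch`, the methods `viewSelected` and `findMatches`, and the in-memory `Map` of view name to decoded text are all hypothetical, and it assumes case-sensitive matching with overlapping occurrences counted, neither of which the description states.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; names and data layout are assumptions, not the PR's classes.
final class DocumentMatchSketch {

    // A null pattern selects every view, a trailing '*' selects by prefix,
    // anything else must equal the view name exactly.
    public static boolean viewSelected(String pattern, String viewName) {
        if (pattern == null) {
            return true;
        }
        if (pattern.endsWith("*")) {
            return viewName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return viewName.equals(pattern);
    }

    // Returns view name -> sorted start offsets of every occurrence of needle.
    // Views without a match are omitted, so an empty map means "filter this document".
    public static Map<String,SortedSet<Integer>> findMatches(Map<String,String> views, String pattern, String needle) {
        Map<String,SortedSet<Integer>> matches = new LinkedHashMap<>();
        if (needle == null || needle.isEmpty()) {
            return matches;
        }
        for (Map.Entry<String,String> view : views.entrySet()) {
            if (!viewSelected(pattern, view.getKey())) {
                continue;
            }
            String content = view.getValue();
            int from = content.indexOf(needle);
            while (from >= 0) {
                matches.computeIfAbsent(view.getKey(), k -> new TreeSet<>()).add(from);
                from = content.indexOf(needle, from + 1);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String,String> views = new LinkedHashMap<>();
        views.put("CONTENT", "the quick brown fox");
        views.put("CONTENT_SUMMARY", "a fox and another fox");
        views.put("METADATA", "no match here");
        // prints {CONTENT=[16], CONTENT_SUMMARY=[2, 18]}
        System.out.println(findMatches(views, "CONTENT*", "fox"));
    }
}
```

In this sketch, a candidate document would be kept only when the returned map is non-empty, and the map itself has the shape the description gives for the `DOCUMENT_MATCHES` value (view name to start offsets).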
d94ae23 to
472b64d
* Added javadoc regarding TRUE_NODE to JexlFunctionArgumentDescriptorFactory, explaining that it should be used when index searching should be skipped for a function
* Added documentMatchMaxEncodedContextSize to limit the total size of encoded d columns collected in DocumentMatchContextFunction
* Clean up duplicate d column decode paths by tailoring the decode methods in ContentKeyValueFactory
* Improve handling for documentMatchFunction cases in DatawaveInterpreter
* Employ constants where possible
* Avoid clearing documentMatchContext in JexlEvaluation; added tests to validate that this is the right thing to do
* Avoid merging all results into a single Attribute and choosing the first visibility; instead, add multiple values for the DOCUMENT_MATCHES field, each with the appropriate visibility based on the original d column
* Significant refactoring of the return format as a result of avoiding merges: adds a DocumentMatchResults object to hold results
* Updated the document match function to return the matched string on a successful match and an empty string otherwise. There was no need to return a full JSON object containing all matches because this comes from the DocumentMatchContext.
* Properly dedups offsets in cases where multiple document match functions against the same query string return the same offsets for a document
* Updated unit tests to reflect new conditions, edge cases, and incorrect input
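The two result-handling points in this commit (one value per visibility instead of a single merged Attribute, and deduplicated offsets) can be illustrated with a small accumulator. This is a hypothetical sketch, not the `DocumentMatchResults` class from this PR: `MatchAccumulatorSketch`, `record`, and the use of a plain `String` for the visibility are assumptions made for the example.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; the real DocumentMatchResults is not shown in this thread.
final class MatchAccumulatorSketch {

    // visibility -> view name -> start offsets. Keeping one bucket per visibility
    // avoids collapsing matches from differently marked d columns into one value.
    private final Map<String,Map<String,SortedSet<Integer>>> byVisibility = new TreeMap<>();

    // Records one match. The inner set drops repeats, so two functions that
    // report the same offset for the same view contribute a single entry.
    public void record(String visibility, String viewName, int offset) {
        byVisibility.computeIfAbsent(visibility, v -> new TreeMap<>())
                    .computeIfAbsent(viewName, n -> new TreeSet<>())
                    .add(offset);
    }

    public Map<String,Map<String,SortedSet<Integer>>> results() {
        return byVisibility;
    }

    public static void main(String[] args) {
        MatchAccumulatorSketch acc = new MatchAccumulatorSketch();
        acc.record("A&B", "CONTENT", 16);
        acc.record("A&B", "CONTENT", 16); // same offset reported by a second function call
        acc.record("C", "SUMMARY", 2);
        // prints {A&B={CONTENT=[16]}, C={SUMMARY=[2]}}
        System.out.println(acc.results());
    }
}
```

Each top-level entry in this sketch would correspond to one `DOCUMENT_MATCHES` value carrying its own visibility.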
* Consolidate serialization to DocumentMatchResults
* Removed the dead DocumentMatchFactory and EmptyDocumentMatchFunctions; updated QueryIterator and TLDQueryIterator to construct DocumentMatchContextFunction directly
* Removed dead code from DocumentMatchResults (copy, contained search, payload builder)
* Removed unnecessary Content.withKeyMetadata helper
* Cleaned up some brittleness in the tests related to JSON assertions: they now assert the structure instead of exact strings
* Additional validation of visibility in unit tests
drewfarris
commented
Apr 13, 2026
Comment on lines +78 to +82
    Map<String,List<Integer>> jsonMatches = new LinkedHashMap<>();
    for (Map.Entry<String,SortedSet<Integer>> matchEntry : matches.entrySet()) {
        jsonMatches.put(matchEntry.getKey(), new ArrayList<>(matchEntry.getValue()));
    }
    payload.put(MATCHES_FIELD, jsonMatches);
Collaborator
Author
Consider adding matches to the payload directly instead of converting to a LinkedHashMap
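A minimal sketch of what that suggestion might look like, under assumptions: `PayloadSketch` and `buildPayload` are hypothetical, the value of `MATCHES_FIELD` is invented for the example, and the payload is assumed to be a `Map<String,Object>`. Whether the serializer used in this PR writes a `SortedSet` as a JSON array is not confirmed here and would need checking before dropping the copy.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the suggestion; not the code under review.
final class PayloadSketch {

    // Assumed value; the real MATCHES_FIELD constant is not shown in this thread.
    static final String MATCHES_FIELD = "matches";

    public static Map<String,Object> buildPayload(Map<String,SortedSet<Integer>> matches) {
        Map<String,Object> payload = new LinkedHashMap<>();
        // No per-entry copy into Map<String,List<Integer>>: the sorted sets are stored as-is.
        // The payload now shares the caller's map, so later changes to it show through.
        payload.put(MATCHES_FIELD, matches);
        return payload;
    }

    public static void main(String[] args) {
        Map<String,SortedSet<Integer>> matches = new LinkedHashMap<>();
        matches.put("CONTENT", new TreeSet<>(Arrays.asList(18, 2)));
        // prints {matches={CONTENT=[2, 18]}}
        System.out.println(buildPayload(matches));
    }
}
```

The trade-off is aliasing: the copy in the reviewed lines gives the payload its own lists, while the direct form avoids the allocation but shares the underlying sets.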
apmoriarty
approved these changes
May 20, 2026