VUFIND-1210: Use Solr JSON APIs #4991

Draft
maccabeelevine wants to merge 38 commits into vufind-org:dev from maccabeelevine:solr-json-search

@maccabeelevine (Member) commented Dec 31, 2025

Solr has a "new" (a decade old?) set of JSON-based query APIs, which take JSON in the request body instead of a long series of query parameters. These APIs in turn support more complex faceting.
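For readers less familiar with the JSON Request API, here is a rough sketch of how a parameter-style search maps onto a JSON body. The keys `query`, `filter`, `limit`, `offset`, `fields`, `facet` and `params` are Solr's JSON request keys; the field names, values, and the `$legacy` / `$body` variables are made up for illustration and are not taken from this PR.

```php
<?php
// Classic style: every option is a separate URL parameter.
$legacy = [
    'q' => 'title:hamlet',
    'fq' => ['format:Book'],
    'rows' => 20,
    'start' => 0,
    'fl' => 'id,title',
];

// Roughly equivalent JSON Request API body, sent in the POST body instead.
$body = [
    'query' => $legacy['q'],
    'filter' => $legacy['fq'],
    'limit' => $legacy['rows'],
    'offset' => $legacy['start'],
    'fields' => explode(',', $legacy['fl']),
    // JSON Facet API: a named terms facet (the name "format_facet" is arbitrary).
    'facet' => [
        'format_facet' => ['type' => 'terms', 'field' => 'format', 'limit' => 10],
    ],
    // Options without a dedicated JSON key can still be passed under "params".
    'params' => ['defType' => 'edismax'],
];

$json = json_encode($body, JSON_THROW_ON_ERROR | JSON_PRETTY_PRINT);
echo $json, PHP_EOL;
```

The nested `facet` object is where the extra expressiveness comes from: each named facet carries its own type and options rather than sharing a flat set of `facet.*` parameters.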

TODO

  • Lots of cleanup
  • Marked TODOs in the code
  • Convert facet params to new facet API, so it actually works again
  • Fix all the tests
  • Resolve VUFIND-1210 when merging
  • Note breaking change to DefaultParameters syntax. There is an error for the old syntax but no backwards compatibility.
  • Note breaking change that "facet_matches_by_field" (regex) support in facets.ini is removed, though the similar facet_prefix_by_field capability is still supported.

Comment thread module/VuFindSearch/src/VuFindSearch/Backend/Solr/Backend.php Outdated
Comment thread module/VuFindSearch/src/VuFindSearch/Backend/Solr/Backend.php Outdated
* @var array
*/
- protected $params = [];
+ protected $items = [];
Member Author

All Solr queries will now have a very large array called params within the ParamBag. Having the variable also called $params makes debugging a nightmare. It could be called $contents or whatever else.

Member

I guess we need to weigh debug readability against backward-compatibility; this change will break any custom subclasses of the ParamBag that might exist. Most likely, that's not really a problem, since I can't imagine a scenario where somebody would be using such a subclass -- but we should note this in the changelog just in case. As for what to call it, I think I might slightly prefer contents to items because it seems to fit better with the bag metaphor... but I don't really have strong feelings.

Comment thread module/VuFindSearch/src/VuFindSearch/ParamBagBag.php Outdated
@maccabeelevine (Member Author)

I considered implementing this with a flag to revert to the old method (query parameters) as needed. But the code changes are so significant that it would have been a real pain to support both.

@demiankatz added this to the 12.0 milestone Jan 1, 2026
@demiankatz added the architecture label (pull requests that involve significant refactoring / architectural changes) Jan 1, 2026
@maccabeelevine (Member Author)

Basic functionality, including basic faceting, now works with the JSON APIs. Lots more to go. @demiankatz @sturkel89 I will eventually need a lot of testing help on this. There are a lot of VuFind features that use Solr that I haven't used personally before, so I can change the API but I don't really know what to expect unless the integration tests are 100%. It's not ready yet though ... lots of TODOs in the code. I'm off on vacation until Jan 12th. Happy new year!

Comment thread module/VuFindSearch/src/VuFindSearch/ParamBagBag.php Outdated
public function json(): string
{
    $jsonObject = $this->jsonObject($this->items);
    return json_encode($jsonObject);
Contributor

Should deal with JSON errors

Member Author

Thanks. To start, I changed this to throw the error, which seems to be the usual practice in our codebase. Open to more specific handling if folks think there is a useful fallback.
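As a minimal sketch of the "throw on failure" approach (not necessarily how this PR implements it), PHP's `JSON_THROW_ON_ERROR` flag makes `json_encode` raise a `JsonException` instead of silently returning `false`. The helper name `encodeOrThrow` and the sample data are hypothetical.

```php
<?php
/**
 * Encode data as JSON, throwing instead of returning false on failure.
 *
 * @throws JsonException if $data cannot be encoded (e.g. malformed UTF-8)
 */
function encodeOrThrow(array $data): string
{
    return json_encode($data, JSON_THROW_ON_ERROR);
}

try {
    // "\xB1" is a stray continuation byte, so this string is not valid UTF-8.
    encodeOrThrow(['query' => "bad \xB1 byte"]);
    $outcome = 'encoded';
} catch (JsonException $e) {
    $outcome = 'failed: ' . $e->getMessage();
}
echo $outcome, PHP_EOL;
```

Without the flag, the same call would return `false`, and the failure would only surface later as an empty request body.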

Comment thread module/VuFind/src/VuFind/Search/Solr/DefaultParametersListener.php Outdated
);
$fields = $this->getFields($shards);
$specs = $this->getSearchSpecs($fields);
$this->backend->getQueryBuilder()->setSpecs($specs);
Member Author

The logic below is just a first draft; I don't like how it breaks the ParamBag abstraction completely in order to use array_filter. I either have to build some sort of filtering into ParamBag itself or just create a new ParamBag on the fly instead.

Also, I will need assistance actually testing this with shards.
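One possible shape for the first option (filtering built into the bag) is sketched below. `SimpleBag` is a hypothetical stand-in, not VuFind's ParamBag, and the `filter()` / `all()` method names are assumptions chosen for illustration only.

```php
<?php
// Hypothetical stand-in for a parameter bag; NOT VuFind's actual ParamBag class.
final class SimpleBag
{
    /** @param array<string, mixed> $items */
    public function __construct(private array $items = [])
    {
    }

    /** Return a new bag holding only the entries accepted by $keep($value, $key). */
    public function filter(callable $keep): self
    {
        return new self(array_filter($this->items, $keep, ARRAY_FILTER_USE_BOTH));
    }

    /** @return array<string, mixed> */
    public function all(): array
    {
        return $this->items;
    }
}

$bag = new SimpleBag(['query' => '*:*', 'facet' => ['a' => 1], 'limit' => 10]);
// Callers express intent through the bag instead of reaching into its array.
$withoutFacets = $bag->filter(fn ($value, $key) => $key !== 'facet');
print_r(array_keys($withoutFacets->all()));
```

Returning a new bag rather than mutating in place also covers the second option: the original bag stays untouched and the caller simply uses the filtered copy.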

Member

You should be able to test it with any two instances of Solr running the same schema. Let me know if you need more specific help on setting up a test environment; I haven't done it in a few years, but I don't recall it being too difficult.

Member Author

Ok thanks...will try pointing it at our test and prod environments...

Member

That should work as long as they contain different record IDs (otherwise it will be hard to tell if the sharding is working).

Member Author

I have (mostly) overlapping record IDs, so the sharding doesn't really work. But the faceting logic that I'm implementing does seem to -- I'm able to remove a facet group with [StripFields]. I suppose I can create a few fresh schemas and do partial indexes into each, but if someone already has sharding set up that will be a plus.

@demiankatz (Member) commented Jan 21, 2026

One simple approach could be to spin up a standard test instance using phing startup in a virtual machine, and then shard that with your institutional test instance.

Member

(If you can't easily do this, I certainly can -- just have to find time, since I remain buried under a large review backlog right now).

Member Author

Sorry, I'm still not having any luck with testing the shards config. If/when you have time, I'd appreciate it.

Comment thread module/VuFind/src/VuFind/Search/Solr/Params.php Outdated
Comment thread module/VuFind/src/VuFind/Hierarchy/TreeDataSource/Solr.php
Comment thread module/VuFind/src/VuFind/Search/Solr/Params.php
Comment thread module/VuFindSearch/tests/unit-tests/src/VuFindTest/ParamBagBagTest.php Outdated
* @return ParamBag
*/
- public function build($id)
+ public function build(string $id, ?ParamBag $params = null): ParamBag
Member Author

I've backported these SimilarBuilder changes to #5069. Waiting approval/merge there to finalize here.

Member

This has been merged. Is any further action needed before we close this conversation thread?

$result = $clone->getFacetList();
$filteredCounts = $clone->getFilteredFacetCounts();

// Apply "facet contains"
Member Author

This logic used to be in Solr/Params. But the JSON Facet API doesn't support it. (It supports prefix filtering, but I don't think that's sufficient.) So I've changed it to post-search filtering. A few thoughts:

  • Will performance take a hit? It seems minor.
  • It seems silly that the same filtering is being applied to all facet fields, even though in the UI there was only filtering on one. But that's the same as the previous behavior, and no harm in practice.
  • If we're ok with this, then in theory it means I could also build back the regex matches logic, doing that as a post-search filter as well, as it's practically the same. And that regex matches logic actually is field-specific. @dmj ?
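To make the post-search idea concrete, here is a minimal sketch of both filters applied to facet values that have already come back from Solr. The function names and the `['value' => ..., 'count' => ...]` row shape are assumptions for illustration, not the PR's actual data structures.

```php
<?php
/**
 * Keep only facet rows whose value contains $needle.
 * Uses stripos, so case-insensitivity only covers ASCII letters.
 */
function filterFacetContains(array $rows, string $needle): array
{
    return array_values(array_filter(
        $rows,
        fn (array $row) => stripos($row['value'], $needle) !== false
    ));
}

/** Same idea with a per-field regex; fields without a pattern pass through. */
function filterFacetsByRegex(array $facets, array $patternsByField): array
{
    foreach ($patternsByField as $field => $pattern) {
        if (isset($facets[$field])) {
            $facets[$field] = array_values(array_filter(
                $facets[$field],
                fn (array $row) => preg_match($pattern, $row['value']) === 1
            ));
        }
    }
    return $facets;
}

$facets = [
    'author' => [
        ['value' => 'Shakespeare, William', 'count' => 12],
        ['value' => 'Marlowe, Christopher', 'count' => 3],
    ],
    'format' => [['value' => 'Book', 'count' => 15]],
];
$contains = filterFacetContains($facets['author'], 'shake');
$matched = filterFacetsByRegex($facets, ['author' => '/^M/']);
```

One caveat with any post-search filter: it can only see the values Solr returned, so with a facet limit in place, matching values beyond that limit would never appear.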

Contributor

Yes, it is field specific.

@demiankatz (Member) left a comment

Thanks for the progress here, @maccabeelevine. Looks like there are some conflicts to resolve. Do you mind taking a look at those? Once they are fixed, I'll see if I can find time to set up a sharding scenario for testing. I suspect the quickest way to do this may be to run two Solr instances locally on different ports using SOLR_ADDITIONAL_START_OPTIONS=-Dsolr.disable.allowUrls=true ./solr.sh start to force-enable sharding.
