VUFIND-1210: Use Solr JSON APIs #4991
| * @var array | ||
| */ | ||
| protected $params = []; | ||
| protected $items = []; |
All Solr queries will now have a very large array params within the ParamBag. Having the variable also called $params makes debugging a nightmare. Could be called $contents or whatever else.
I guess we need to weigh debug readability against backward-compatibility; this change will break any custom subclasses of the ParamBag that might exist. Most likely, that's not really a problem, since I can't imagine a scenario where somebody would be using such a subclass -- but we should note this in the changelog just in case. As for what to call it, I think I might slightly prefer contents to items because it seems to fit better with the bag metaphor... but I don't really have strong feelings.
I considered implementing this with a flag to revert to the old method (query parameters) as needed. But the code changes are so significant that it would have been a real pain to support both.
Basic functionality including basic faceting now works with the JSON APIs. Lots more to go. @demiankatz @sturkel89 I will eventually need a lot of testing help on this. There are a lot of VuFind features that use Solr that I haven't used personally before, so I can change the API but I don't really know what to expect unless the integration tests are 100%. It's not ready yet, though; there are lots of TODOs in the code. I'm off on vacation until Jan 12th. Happy new year!
| public function json(): string | ||
| { | ||
| $jsonObject = $this->jsonObject($this->items); | ||
| return json_encode($jsonObject); |
Thanks. To start, I changed this to throw the error, which seems to be the usual practice in our codebase. Open to more specific handling if folks think there is a useful fallback.
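For illustration, here is a minimal sketch of what "throw the error" could look like when encoding JSON. It assumes PHP 7.3+ and the built-in JSON_THROW_ON_ERROR flag; the helper name encodeOrThrow is hypothetical and is not taken from the PR, which may handle the failure differently.

```php
<?php

/**
 * Hypothetical helper (not from the PR): encode a value as JSON and let
 * encoding failures surface as exceptions instead of a `false` return.
 */
function encodeOrThrow(array $data): string
{
    // With JSON_THROW_ON_ERROR, json_encode() throws \JsonException on failure.
    return json_encode($data, JSON_THROW_ON_ERROR);
}

// Normal case: a small, valid structure encodes to a string.
$body = encodeOrThrow(['query' => 'title:solr', 'limit' => 10]);

// Failure case: an invalid UTF-8 byte sequence cannot be encoded.
try {
    encodeOrThrow(['query' => "\xB1\x31"]);
    $failed = false;
} catch (\JsonException $e) {
    $failed = true;
}
```

Without the flag, json_encode() returns false on failure, which would violate a `string` return type like the one on json() above.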
| ); | ||
| $fields = $this->getFields($shards); | ||
| $specs = $this->getSearchSpecs($fields); | ||
| $this->backend->getQueryBuilder()->setSpecs($specs); |
The logic below is just a first draft. I don't like how it breaks the ParamBag abstraction completely in order to use array_filter. I will either have to build some sort of filtering into ParamBag itself or just create a new ParamBag on the fly instead.
Also, I will need assistance actually testing this with shards.
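As a rough sketch of the first option (building filtering into the bag itself), something along these lines might work. SketchBag is a simplified stand-in written for this example, not VuFind's actual ParamBag, and the filter() method and the parameter keys are assumptions rather than anything in the PR.

```php
<?php

/**
 * Simplified stand-in for a parameter bag; NOT VuFind's actual ParamBag.
 */
class SketchBag
{
    protected array $items = [];

    public function __construct(array $items = [])
    {
        $this->items = $items;
    }

    /**
     * Hypothetical method: return a new bag containing only the entries
     * for which $keep($value, $key) returns true. The original bag is
     * left untouched, so callers never reach into $items directly.
     */
    public function filter(callable $keep): self
    {
        return new self(
            array_filter($this->items, $keep, ARRAY_FILTER_USE_BOTH)
        );
    }

    public function getArrayCopy(): array
    {
        return $this->items;
    }
}

// Illustrative keys only; the real bag contents will differ.
$bag = new SketchBag([
    'q' => ['*:*'],
    'facet.field' => ['format', 'building'],
    'fl' => ['id'],
]);

// Strip one group of parameters without touching the internal array.
$stripped = $bag->filter(fn ($value, string $key) => $key !== 'facet.field');
```

Returning a new bag keeps array_filter as an implementation detail and also covers the "create a new ParamBag on the fly" idea in one step.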
You should be able to test it with any two instances of Solr running the same schema. Let me know if you need more specific help on setting up a test environment; I haven't done it in a few years, but I don't recall it being too difficult.
Ok thanks...will try pointing it at our test and prod environments...
That should work as long as they contain different record IDs (otherwise it will be hard to tell if the sharding is working).
I have (mostly) overlapping record IDs, so the sharding doesn't really work. But the faceting logic that I'm implementing does seem to -- I'm able to remove a facet group with [StripFields]. I suppose I can create a few fresh schemas and do partial indexes into each, but if someone already has sharding set up that will be a plus.
One simple approach could be to spin up a standard test instance using phing startup in a virtual machine, and then shard that with your institutional test instance.
(If you can't easily do this, I certainly can -- just have to find time, since I remain buried under a large review backlog right now).
Sorry I'm still not having any luck with testing the shards config...if/when you have time, I'd appreciate it.
| * @return ParamBag | ||
| */ | ||
| public function build($id) | ||
| public function build(string $id, ?ParamBag $params = null): ParamBag |
I've backported these SimilarBuilder changes to #5069. Waiting approval/merge there to finalize here.
This has been merged. Is any further action needed before we close this conversation thread?
| $result = $clone->getFacetList(); | ||
| $filteredCounts = $clone->getFilteredFacetCounts(); | ||
| // Apply "facet contains" |
This logic used to be in Solr/Params. But the JSON Facet API doesn't support it. (It supports prefix filtering, but I don't think that's sufficient.) So I've changed it to post-search filtering. A few thoughts:
- Will performance take a hit? It seems minor.
- It seems silly that the same filtering is being applied to all facet fields, even though in the UI there was only filtering on one. But that's the same as the previous behavior, and no harm in practice.
- If we're ok with this, then in theory it means I could also build back the regex matches logic, doing that as a post-search filter as well, as it's practically the same. And that regex matches logic actually is field-specific. @dmj ?
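To make the post-search approach concrete, here is a minimal sketch of a "contains" filter applied to facet counts after the search returns. The function name, the field => [value => count] data shape, and the case-insensitive default are all assumptions for illustration; the PR's actual facet list structure is likely different.

```php
<?php

/**
 * Hypothetical post-search filter (not from the PR): keep only the facet
 * values whose label contains $contains. As in the discussion above, the
 * same filter is applied to every facet field.
 *
 * @param array  $facets     Assumed shape: field => [value => count]
 * @param string $contains   Non-empty substring to look for
 * @param bool   $ignoreCase Whether matching ignores case (assumed default)
 */
function filterFacetValues(array $facets, string $contains, bool $ignoreCase = true): array
{
    $result = [];
    foreach ($facets as $field => $counts) {
        $result[$field] = array_filter(
            $counts,
            function ($value) use ($contains, $ignoreCase) {
                // Numeric-looking keys arrive as integers, so cast to string.
                $pos = $ignoreCase
                    ? stripos((string)$value, $contains)
                    : strpos((string)$value, $contains);
                return $pos !== false;
            },
            ARRAY_FILTER_USE_KEY
        );
    }
    return $result;
}

// Made-up facet data for the example.
$facets = [
    'format' => ['Book' => 120, 'eBook' => 45, 'Map' => 3],
    'building' => ['Main Library' => 90, 'Law' => 12],
];

$filtered = filterFacetValues($facets, 'book');
```

A field-specific regex variant would follow the same pattern, with preg_match() in the callback and a check on $field before filtering. One caveat: a filter like this only sees values that Solr actually returned, so matches beyond the facet limit would be missed.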
demiankatz
left a comment
Thanks for the progress here, @maccabeelevine. Looks like there are some conflicts to resolve. Do you mind taking a look at those? Once they are fixed, I'll see if I can find time to set up a sharding scenario for testing. I suspect the quickest way to do this may be to run two Solr instances locally on different ports using SOLR_ADDITIONAL_START_OPTIONS=-Dsolr.disable.allowUrls=true ./solr.sh start to force-enable sharding.
Solr has a "new" (a decade old?) JSON-based query API, with JSON in the request body instead of a long series of query parameters. This new API in turn supports more complex faceting.
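For readers unfamiliar with the JSON Request API, a request body might be built roughly like this. The top-level keys (query, filter, limit, facet) and the terms facet type come from Solr's JSON Request and JSON Facet APIs; the field names, the facet label "formats", and the variable names are made up for this example and are not taken from the PR.

```php
<?php

// Classic query-parameter form of the same search, for comparison:
//   q=title:history&fq=format:Book&rows=20
//   &facet=true&facet.field=format&facet.limit=10

// JSON Request API form: one structured body instead of many parameters.
$body = [
    'query' => 'title:history',
    'filter' => ['format:Book'],
    'limit' => 20,
    'facet' => [
        // "formats" is an arbitrary label chosen for this example.
        'formats' => [
            'type' => 'terms',
            'field' => 'format',
            'limit' => 10,
        ],
    ],
];

// The encoded string would be sent as the POST body with a JSON content type.
$json = json_encode($body, JSON_THROW_ON_ERROR);
```

Because each facet is a named object, nested sub-facets and per-facet options can be expressed directly in the structure, which is harder to do with flat facet.* parameters.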
TODO