Add Refset Maintenance Report listing inactive concepts in active simple-type refset members #24
MattCordell wants to merge 17 commits into
…GraphLoader configuration during import
…egacy Drugs Validation report
…ivation as module jumping
… to axiom eg for UK Medicinal Product concept promotion
…ecified refset id eg ICD10 map entries
…ple-type refset members

Lists active members of simple-type reference sets (descendants of 446609009 |Simple type reference set|) where the referenced concept is inactive, with the inactivation reason and any historical associations. Useful for refset maintainers auditing outdated memberships. Unlike InactiveConceptInRefset, this report covers inactive concepts from any release cycle, not just the current one. An optional ECL parameter overrides the default refset scope.

Refset memberships are read directly from Snowstorm via per-refset GET queries rather than concept.getOtherRefsetMembers(), because the locally-loaded refset graph misses members for refsets not bundled in the loaded archive.

Bumps TermServerClient.findRefsetMembers page size to 10000 to keep large refsets to a single round trip; the searchAfter loop still handles overflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
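As a rough illustration of the check this commit describes, the sketch below narrows active refset members down to those whose referenced concept is inactive. `SketchConcept`, `SketchMember`, their fields, and the sample identifiers are hypothetical stand-ins, not the reporting engine's real domain classes, and historical associations are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a concept.
class SketchConcept {
    final String id;
    final boolean active;
    final String inactivationReason; // null while the concept is active

    SketchConcept(String id, boolean active, String inactivationReason) {
        this.id = id;
        this.active = active;
        this.inactivationReason = inactivationReason;
    }
}

// Hypothetical, simplified stand-in for a simple-type refset member.
class SketchMember {
    final String refsetId;
    final String referencedComponentId;
    final boolean active;

    SketchMember(String refsetId, String referencedComponentId, boolean active) {
        this.refsetId = refsetId;
        this.referencedComponentId = referencedComponentId;
        this.active = active;
    }
}

class OutdatedMemberSketch {
    /** Returns one "refsetId|conceptId|reason" row per active member that references an inactive concept. */
    static List<String> findOutdated(List<SketchMember> members, Map<String, SketchConcept> concepts) {
        List<String> rows = new ArrayList<>();
        for (SketchMember member : members) {
            if (!member.active) {
                continue; // membership already retired, nothing to maintain
            }
            SketchConcept concept = concepts.get(member.referencedComponentId);
            if (concept == null || concept.active) {
                continue; // unknown or still-active concept: not reported
            }
            rows.add(member.refsetId + "|" + concept.id + "|" + concept.inactivationReason);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Placeholder identifiers, not real SCTIDs.
        Map<String, SketchConcept> concepts = new HashMap<>();
        concepts.put("100", new SketchConcept("100", true, null));
        concepts.put("200", new SketchConcept("200", false, "Outdated"));

        List<SketchMember> members = Arrays.asList(
                new SketchMember("R1", "100", true),
                new SketchMember("R1", "200", true),
                new SketchMember("R1", "200", false));

        System.out.println(findOutdated(members, concepts));
    }
}
```

Only the active membership pointing at the inactive concept produces a row; the release cycle in which the concept was inactivated plays no part in the filter.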
Did you get this to run locally, Matt? Oh yes, I see that tweaking the page size gave you a massive time improvement. It is true that "locally-loaded refset graph misses members for refsets not bundled in the loaded archive", and published archives getting out of sync with what's stored in Elasticsearch is a problem, but it's the published archive that is considered authoritative, not the AP. It's also far cheaper to use the data you've already loaded into memory than to make (hundreds?) of full-size page requests to the server. Reporting was designed to build a snapshot based on published data + current authoring cycle delta and work with that.
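To make the snapshot model in this comment concrete, here is a minimal sketch that overlays a current-authoring-cycle delta on published refset members. It assumes rows are keyed by member ID so that a delta row supersedes the published row with the same ID. `MemberRow`, `buildSnapshot`, and the sample data are hypothetical; the reporting engine's actual loader is not shown in this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row shape: one state of one refset member.
class MemberRow {
    final String memberId;
    final String referencedComponentId;
    final boolean active;

    MemberRow(String memberId, String referencedComponentId, boolean active) {
        this.memberId = memberId;
        this.referencedComponentId = referencedComponentId;
        this.active = active;
    }
}

class SnapshotOverlaySketch {
    /** Builds an in-memory snapshot: published rows first, then delta rows override by member ID. */
    static Map<String, MemberRow> buildSnapshot(List<MemberRow> published, List<MemberRow> delta) {
        Map<String, MemberRow> snapshot = new LinkedHashMap<>();
        for (MemberRow row : published) {
            snapshot.put(row.memberId, row);
        }
        for (MemberRow row : delta) {
            snapshot.put(row.memberId, row); // the authoring-cycle state wins
        }
        return snapshot;
    }

    public static void main(String[] args) {
        List<MemberRow> published = Arrays.asList(
                new MemberRow("m1", "100", true),
                new MemberRow("m2", "200", true));
        List<MemberRow> delta = Arrays.asList(
                new MemberRow("m2", "200", false), // inactivated this cycle
                new MemberRow("m3", "300", true)); // added this cycle

        for (MemberRow row : buildSnapshot(published, delta).values()) {
            System.out.println(row.memberId + " active=" + row.active);
        }
    }
}
```

Once the snapshot is in memory, every later lookup is a map access rather than a server request, which is the cost argument made above.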
I tried a variety of combinations, and this report was the only one that produced the desired output. We could use a combination of Published and AP (for the current authoring cycle), but I just couldn't get it to work. The scenario is that we generate a report for one or more refsets (owned by a third party) to advise them of all the concepts that have been inactivated, along with suggested replacements. Ideally we'd get a reply back within the release cycle and everything would be good, but this almost never happens. Usually they only want to update once or twice a year, so the latest state of maintenance is what we're after. Traditionally we've used daily builds to get the report.
c7b67ae to 08a9d5a
a922cf3 to af5deb7
Summary
Previously discussed with @pgwilliams (I don't have access to the original Jira/discussion).
- Adds `RefsetMaintenanceReport` under `reports/release/`. Lists active members of simple-type reference sets (descendants of `446609009 |Simple type reference set|`) where the referenced concept is inactive, with the inactivation reason and any historical associations. An optional `ECL` parameter overrides the default refset scope.
- Unlike `InactiveConceptInRefset`, this report covers inactive concepts from any release cycle, not just the current one, which is useful for refset maintainers auditing outdated memberships.
- Refset memberships are read directly from Snowstorm via per-refset `GET /members?referenceSet=<id>` queries rather than `Concept.getOtherRefsetMembers()`, because the locally-loaded refset graph misses members for refsets not bundled in the loaded archive.
- Bumps the `TermServerClient.findRefsetMembers(...)` page size to `limit=10000` so large refsets fit in fewer round trips. The existing `searchAfter` pagination loop still handles overflow, so behaviour for any caller is unchanged, just faster. (Local performance comparison: 6 hours down to 30 minutes!)

Test plan
- Run against `MAIN/SNOMEDCT-AU` (or another extension branch) and confirm the row count for a known refset matches a SQL/manual baseline.
- Run without the `ECL` parameter and confirm `Found N reference set(s) in scope` matches the count of `<446609009` descendants on the target branch.
- Run with `ECL` = a single refset ID and confirm only that refset's outdated memberships are reported (e.g. refset 933483571000036108 has 3 inactive concepts as members as of April 2026).
- Regression-check `InactiveConceptInRefset` (also uses `findRefsetMembers`-family methods).
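The `searchAfter` pagination mentioned in the summary can be sketched as follows. This is a minimal in-memory simulation, assuming each page response carries its items plus a token to send with the next request. `Page`, `FakeServer`, `fetchPage`, and the token format are hypothetical and do not reflect the actual `TermServerClient` or Snowstorm API.

```java
import java.util.ArrayList;
import java.util.List;

class SearchAfterSketch {
    // Hypothetical page shape: items plus an opaque token (null on the last page).
    static class Page {
        final List<String> items;
        final String searchAfter;

        Page(List<String> items, String searchAfter) {
            this.items = items;
            this.searchAfter = searchAfter;
        }
    }

    // Simulated server. The token is simply the index of the next unread item.
    static class FakeServer {
        final List<String> members;
        int requests = 0;

        FakeServer(List<String> members) {
            this.members = members;
        }

        Page fetchPage(String searchAfter, int limit) {
            requests++;
            int from = searchAfter == null ? 0 : Integer.parseInt(searchAfter);
            int to = Math.min(from + limit, members.size());
            String next = to < members.size() ? String.valueOf(to) : null;
            return new Page(new ArrayList<>(members.subList(from, to)), next);
        }
    }

    /** Keeps requesting pages until the server stops returning a token. */
    static List<String> fetchAll(FakeServer server, int limit) {
        List<String> collected = new ArrayList<>();
        String token = null;
        do {
            Page page = server.fetchPage(token, limit);
            collected.addAll(page.items);
            token = page.searchAfter;
        } while (token != null);
        return collected;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            ids.add("member-" + i);
        }

        FakeServer small = new FakeServer(ids);
        fetchAll(small, 10);
        System.out.println("limit=10 requests: " + small.requests);

        FakeServer large = new FakeServer(ids);
        fetchAll(large, 10000);
        System.out.println("limit=10000 requests: " + large.requests);
    }
}
```

In this simulation the same 25 members arrive in both cases, in three requests at a page size of 10 and in one at 10000. That is the sense in which a larger page size changes the number of round trips but not the result.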