
Add Refset Maintenance Report listing inactive concepts in active simple-type refset members#24

Open
MattCordell wants to merge 17 commits into IHTSDO:develop from aehrc:feature/refset-maintenance-report

MattCordell commented Apr 27, 2026

Summary

Previously discussed with @pgwilliams (I don't have access to the original Jira ticket or discussion).

  • Adds RefsetMaintenanceReport under reports/release/. Lists active members of simple-type reference sets (descendants of 446609009 |Simple type reference set|) where the referenced concept is inactive, with the inactivation reason and any historical associations. An optional ECL parameter overrides the default refset scope.
  • Unlike InactiveConceptInRefset, this report covers inactive concepts from any release cycle, not just the current one — useful for refset maintainers auditing outdated memberships.
  • Refset memberships are read directly from Snowstorm via per-refset GET /members?referenceSet=<id> queries rather than Concept.getOtherRefsetMembers(), because the locally-loaded refset graph misses members for refsets not bundled in the loaded archive.
  • Bumps the TermServerClient.findRefsetMembers(...) page size to limit=10000 so that large refsets fit in fewer round trips. The existing searchAfter pagination loop still handles overflow, so behaviour for every caller is unchanged, just faster. (In a local comparison, run time dropped from 6 hours to 30 minutes.)
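To illustrate why a larger page size changes the number of requests but not the result, here is a minimal, self-contained sketch of a searchAfter-style loop. It is not the real TermServerClient code: `fetchPage`, `Page`, `Result`, the simulated server, and the smaller comparison page size of 100 are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a paged member search; not the real TermServerClient. */
public class SearchAfterPagingSketch {

    /** One page of results plus the cursor for the next request (null when exhausted). */
    record Page(List<String> items, String searchAfter) {}

    /** All members collected, and how many requests it took to collect them. */
    record Result(List<String> members, int roundTrips) {}

    /** Simulated server holding `total` members and serving them `limit` at a time. */
    static Page fetchPage(int total, int limit, String searchAfter) {
        int start = (searchAfter == null) ? 0 : Integer.parseInt(searchAfter);
        int end = Math.min(start + limit, total);
        List<String> items = new ArrayList<>();
        for (int i = start; i < end; i++) {
            items.add("member-" + i);
        }
        // A cursor is only returned while more results remain.
        String next = (end < total) ? String.valueOf(end) : null;
        return new Page(items, next);
    }

    /** Keeps requesting pages until the server stops returning a cursor. */
    static Result fetchAll(int total, int limit) {
        List<String> all = new ArrayList<>();
        int roundTrips = 0;
        String cursor = null;
        do {
            Page page = fetchPage(total, limit, cursor);
            roundTrips++;
            all.addAll(page.items());
            cursor = page.searchAfter();
        } while (cursor != null);
        return new Result(all, roundTrips);
    }

    public static void main(String[] args) {
        Result small = fetchAll(25_000, 100);     // illustrative smaller page size
        Result large = fetchAll(25_000, 10_000);  // page size used in this PR
        System.out.println(small.members().size() + " members in " + small.roundTrips() + " round trips");
        System.out.println(large.members().size() + " members in " + large.roundTrips() + " round trips");
    }
}
```

In this simulation a refset of 25,000 members takes 250 requests at a page size of 100 and 3 requests at 10,000, and both runs return the same members in the same order.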

Test plan

  • Run against MAIN/SNOMEDCT-AU (or another extension branch) and confirm the row count for a known refset matches a SQL/manual baseline.
  • Run with no ECL parameter and confirm Found N reference set(s) in scope matches the count of <446609009 descendants on the target branch.
  • Run with ECL set to a single refset ID and confirm that only that refset's outdated memberships are reported (e.g. refset 933483571000036108, which has 3 inactive concepts as members as of April 2026).
  • Confirm no regression in InactiveConceptInRefset (also uses findRefsetMembers-family methods).

pgwilliams and others added 17 commits April 2, 2026 18:45
… to axiom eg for UK Medicinal Product concept promotion
…ple-type refset members

Lists active members of simple-type reference sets (descendants of
446609009 |Simple type reference set|) where the referenced concept
is inactive, with the inactivation reason and any historical
associations. Useful for refset maintainers auditing outdated
memberships.

Unlike InactiveConceptInRefset, this report covers inactive concepts
from any release cycle, not just the current one. An optional ECL
parameter overrides the default refset scope.

Refset memberships are read directly from Snowstorm via per-refset
GET queries rather than concept.getOtherRefsetMembers() — the
locally-loaded refset graph misses members for refsets not bundled
in the loaded archive.

Bumps TermServerClient.findRefsetMembers page size to 10000 to keep
large refsets to a single round trip; the searchAfter loop still
handles overflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pgwilliams (Member) commented

Did you get this to run locally, Matt? Oh yes, I see that tweaking the page size gave you a massive time improvement.

It is true that the "locally-loaded refset graph misses members for refsets not bundled in the loaded archive", and published archives getting out of sync with what's stored in Elasticsearch is a problem, but it's the published archive that is considered authoritative, not the AP. It's also far cheaper to use the data you've already loaded into memory than to make (hundreds of?) full-size page requests to the server. Reporting was designed to build a snapshot from published data plus the current authoring cycle's delta, and to work with that.
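As a rough illustration of the "published data plus delta" snapshot idea, the sketch below overlays delta rows on published rows by member id. It is an assumption-laden simplification, not the reporting framework's actual loader: the `Member` record, `buildSnapshot`, and the sample ids are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of overlaying an authoring-cycle delta on published refset members. */
public class SnapshotOverlaySketch {

    /** Simplified refset member row; real RF2 rows carry more columns. */
    record Member(String id, String refsetId, String referencedComponentId, boolean active) {}

    /** A delta row replaces any published row with the same id; new ids are appended. */
    static Map<String, Member> buildSnapshot(List<Member> published, List<Member> delta) {
        Map<String, Member> snapshot = new LinkedHashMap<>();
        for (Member m : published) {
            snapshot.put(m.id(), m);
        }
        for (Member m : delta) {
            snapshot.put(m.id(), m);
        }
        return snapshot;
    }

    /** Illustrative published rows (placeholder ids, not real SNOMED CT identifiers). */
    static List<Member> samplePublished() {
        return List.of(
                new Member("m1", "refset-A", "concept-1", true),
                new Member("m2", "refset-A", "concept-2", true));
    }

    /** Illustrative delta: m2 is inactivated and m3 is added in the current cycle. */
    static List<Member> sampleDelta() {
        return List.of(
                new Member("m2", "refset-A", "concept-2", false),
                new Member("m3", "refset-A", "concept-3", true));
    }

    public static void main(String[] args) {
        buildSnapshot(samplePublished(), sampleDelta())
                .values()
                .forEach(System.out::println);
    }
}
```

A snapshot built this way can only contain refsets present in the loaded archive or the delta, which is the limitation the PR description raises.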

MattCordell (Author) commented

I tried a variety of combinations, and this approach was the only one that produced the desired output. We could use a combination of the published archive and the AP (for the current authoring cycle), but I just couldn't get it to work.

The scenario is that we generate a report for one or more refsets owned by a third party, to advise them of all the concepts that have been inactivated, along with suggested replacements.

Ideally we'd get a reply back within the release cycle and everything would be good, but this almost never happens. Usually they only want to update once or twice a year, so the latest state of maintenance is what we're after.

Traditionally we've used daily builds to get the report.
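The core selection behind such a report can be sketched as follows: keep active members whose referenced concept is inactive, and carry along the inactivation reason and suggested replacements. This is only an illustrative sketch, not the code in RefsetMaintenanceReport: the `Member`, `Concept`, and `Row` records, `findOutdated`, and all sample ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of selecting outdated refset memberships; not the report's real code. */
public class OutdatedMembershipSketch {

    record Member(String refsetId, String referencedComponentId, boolean active) {}

    record Concept(String id, boolean active, String inactivationReason, List<String> replacements) {}

    record Row(String refsetId, String conceptId, String reason, List<String> replacements) {}

    /** Returns one row per active member whose referenced concept is inactive. */
    static List<Row> findOutdated(List<Member> members, Map<String, Concept> concepts) {
        List<Row> rows = new ArrayList<>();
        for (Member m : members) {
            if (!m.active()) {
                continue; // inactive memberships are already retired
            }
            Concept c = concepts.get(m.referencedComponentId());
            if (c == null || c.active()) {
                continue; // unknown or still-active concepts are not reported
            }
            rows.add(new Row(m.refsetId(), c.id(), c.inactivationReason(), c.replacements()));
        }
        return rows;
    }

    /** Illustrative members (placeholder ids, not real SNOMED CT identifiers). */
    static List<Member> sampleMembers() {
        return List.of(
                new Member("refset-A", "concept-1", true),
                new Member("refset-A", "concept-2", true),
                new Member("refset-A", "concept-3", false));
    }

    /** Illustrative concepts: concept-2 and concept-3 are inactive. */
    static Map<String, Concept> sampleConcepts() {
        return Map.of(
                "concept-1", new Concept("concept-1", true, null, List.of()),
                "concept-2", new Concept("concept-2", false, "Outdated", List.of("concept-9")),
                "concept-3", new Concept("concept-3", false, "Duplicate", List.of("concept-8")));
    }

    public static void main(String[] args) {
        findOutdated(sampleMembers(), sampleConcepts()).forEach(System.out::println);
    }
}
```

With the sample data only concept-2 is reported: concept-1 is still active, and the membership for concept-3 is itself inactive.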

pgwilliams force-pushed the develop branch 3 times, most recently from c7b67ae to 08a9d5a, May 13, 2026 15:01
pgwilliams force-pushed the develop branch 6 times, most recently from a922cf3 to af5deb7, May 19, 2026 12:39