Open
Changes from 14 commits
25 commits
e83416c
Phase 2 end-to-end test on first 10 muscular-system terms
dosumis Apr 27, 2026
3ad9844
Leaf flow: look up genus + part_of via obo-grep instead of single-col…
dosumis Apr 28, 2026
7385dbf
Re-test leaf flow with leaf_template_rows: both is_a AND part_of popu…
dosumis Apr 28, 2026
8229a6e
Enrichment experiment: 6 muscle terms across difficulty gradient
dosumis Apr 28, 2026
4ce9a09
Ovary enrichment experiment — hypothesis disproven
dosumis Apr 28, 2026
c544244
Phase 6 + Phase 7 (skeletal-muscle): system overlays + develops_from
dosumis Apr 28, 2026
42738e6
Validate Phase 6 + Phase 7 muscle overlay end-to-end on muscular-system
dosumis Apr 28, 2026
db5d64b
Full muscular-system run: 75 input terms processed end-to-end
dosumis May 11, 2026
0f2984b
Delete hra-muscular.template.tsv
dosumis May 11, 2026
0fdd84d
Add consolidated unresolvable.tsv report from the full muscular-syste…
dosumis May 11, 2026
77fd128
Add consolidated review.tsv: input rows joined with all findings per row
dosumis May 11, 2026
bb73ff6
review.tsv: add mapped_label, parent_correction_label, mapping_evidence
dosumis May 11, 2026
3f73ed6
Register hra_muscular component and surface template diffs in PRs
dosumis May 15, 2026
c21556f
Move 3 back-muscle groupings from EC template to manual curation in e…
dosumis May 15, 2026
e640036
Review fixes: part_of for 9900025, term_tracker_item column, move rep…
dosumis May 15, 2026
18346d5
Merge branch 'master' into add-hra-muscular-ntr
dosumis May 18, 2026
6436580
Wire subclasses to back-muscle grouping terms (9900020/9900055/9900063)
dosumis May 18, 2026
8117acb
Add posterior abdominal wall (UBERON:9900100); wire 4 muscles + group…
dosumis May 18, 2026
31eb4c9
Merge branch 'add-hra-muscular-ntr' of https://github.com/obophenotyp…
dosumis May 18, 2026
692925c
Reassign template-row UBERON IDs from 99xxxxx (temp) to 11xxxxx (OS r…
dosumis May 18, 2026
58c9677
Fix illegal-annotation-property QC: use 'depiction' obo shortcut, not…
dosumis May 18, 2026
de0719b
ASCTB-TEMP URLs -> ccf: CURIEs; fix articularis genu is_a to skeletal…
dosumis May 18, 2026
d5f52d5
Fix 8 unsat muscles: split spine location into attaches_to_part_of
dosumis May 18, 2026
2007e30
Declare RO:0002177 as ObjectProperty in muscular prefixes stub
dosumis May 18, 2026
f1995e6
Merge branch 'master' into add-hra-muscular-ntr
dosumis May 20, 2026
184 changes: 145 additions & 39 deletions .claude/agents/ntr-term-researcher.md
@@ -46,6 +46,7 @@ The file contains:
"ntr_id": "http://purl.obolibrary.org/obo/UBERON_9900001",
"label": "term label",
"term_type": "leaf",
"system": "default | muscle",
"is_a": "INFER:UBERON:xxxxxxx | NEEDS_MAPPING:FMA:nnnnn | UNRESOLVABLE:...",
"part_of": "INFER:UBERON:xxxxxxx | ...",
"def_xref": "ref1|ref2|..."
@@ -183,40 +184,124 @@ For each term without a confirmed existing UBERON match:
- Where members are known and bounded, enumerate: "...comprising the X, Y, and Z."
- Length still 20–60 words.

## Step 7: Resolve Relationship Types (LEAF terms only)

For each `term_type: "leaf"` term, determine whether it should be `is_a` or `part_of` the
resolved parent:

**Use `part_of` when the term is a physical subdivision of the parent:**
- Named **layer**, zone, region, wall, surface, border, lumen, stroma, cortex, medulla
- Named **head**, belly, compartment, lobe, segment, fascicle of a specific named structure
- Any term where the phrase "is **contained within**", "is **a subdivision of**", or
"is **a layer of**" the parent is correct
- Examples: `corpus luteum granulosa lutein layer` **part_of** corpus luteum;
`clavicular head of pectoralis major` **part_of** pectoralis major;
`costal part of diaphragm` **part_of** diaphragm;
`cumulus oophorus oocyte complex` **part_of** antral follicle

**Use `is_a` when the term is a classification type within the parent category:**
- The parent is a **grouping class** (e.g. "muscle of neck", "ovarian follicle stage",
"cranial muscle") and the term is a **member of that category**
- The term can be truly described as "IS A [parent]" — i.e. it has all properties of the
parent and adds further specificity
- Examples: `anterior vertebral muscle` **is_a** muscle of neck;
`primary ovarian follicle` **is_a** ovarian follicle;
`dominant antral follicle` **is_a** antral follicle

**Quick test**: ask "Is a [term] a kind of [parent]?" (→ `is_a`) vs "Is a [term] inside/part
of a [parent]?" (→ `part_of`). When in doubt, prefer `part_of` for physically bounded
sub-structures and `is_a` for stages or functional subtypes.

If unclear after applying these rules: search `ols4` for existing children of the same parent
and check the relationship type they use; apply the same pattern.

Record each decision in `resolved_relationships`. (Skip for `term_type: "group"` —
relationships for those are encoded as `genus + part_of some Y` equivalent classes;
see Step 8.)
## Step 7: Resolve genus AND part_of for LEAF terms

For each `term_type: "leaf"` term, look up how UBERON defines similar specific structures
to determine BOTH a genus (`is_a`) class AND a `part_of` containing structure. UBERON
convention typically populates both for specific named anatomical entities — e.g.
`vastus lateralis` has `is_a: UBERON:0001630 ! muscle organ` AND
`relationship: part_of UBERON:0001377 ! quadriceps femoris`.

**Procedure:**

1. Use awk over `src/ontology/uberon-edit.obo` to find similar specific UBERON terms.
Examples for muscle subdivisions:
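```bash
# RS="" enables paragraph mode: each blank-line-separated OBO stanza is one record.
# [^\n]* keeps each match on the name line (awk's "." also matches newlines).
awk 'BEGIN{RS=""} /\nname: [^\n]*head of [^\n]*muscle/' src/ontology/uberon-edit.obo
awk 'BEGIN{RS=""} /\nname: [^\n]*part of [^\n]*muscle/' src/ontology/uberon-edit.obo
awk 'BEGIN{RS=""} /\nname: [^\n]*belly of/' src/ontology/uberon-edit.obo
awk 'BEGIN{RS=""} /\nid: UBERON:0001379\n/' src/ontology/uberon-edit.obo # vastus lateralis
```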

2. From similar terms, extract the genus pattern. Common UBERON genus classes for
muscle leaf terms:
- `UBERON:0001630` muscle organ — for whole named individual muscles (e.g. articularis
genu, longus capitis, vastus lateralis)
- `UBERON:0011906` muscle head — for named heads of muscles (clavicular head, long
head, short head)
- `UBERON:0014892` skeletal muscle organ, vertebrate — for skeletal muscles when a
more specific class is unavailable
- `UBERON:0014892` or domain-specific (e.g. `UBERON:0001135` smooth muscle organ)
for non-skeletal cases

3. From similar terms, extract the part_of pattern. Common targets:
- For "X head/belly/part of Y muscle" → part_of the named parent muscle Y
- For named muscles in a region → part_of the region (e.g. neck, thigh,
anterior compartment)
- For named segmental muscles → part_of the relevant region (cervical vertebral
column, lumbar region, etc.)

4. Emit a `leaf_template_rows[label]` entry with `{"is_a": "UBERON:...", "part_of":
"UBERON:..."}`. **Both columns should be populated when applicable.**
- Set `is_a` only (omit `part_of`) for classification subtypes that don't have a
containing structure (e.g. `dominant antral follicle is_a antral follicle` — no
additional part_of needed beyond what the genus class implies).
- Set `part_of` only (omit `is_a` or use a very generic genus) when the term is
purely a subdivision and no specific genus class is available.

5. The legacy `resolved_relationships` + `resolved_parents` keys are still accepted as
a fallback but `leaf_template_rows` is preferred — it expresses both axes
simultaneously.
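
The lookup in steps 1 to 3 can also be sketched in Python. This is a minimal illustration, not part of the workflow: the function names are hypothetical, and the inline sample stanza stands in for `src/ontology/uberon-edit.obo`, using only the vastus lateralis IDs quoted above.

```python
import re

def split_stanzas(obo_text):
    """Split OBO text into blank-line-separated stanzas (like awk RS="")."""
    return [s for s in re.split(r"\n\s*\n", obo_text.strip()) if s.strip()]

def genus_and_part_of(stanza):
    """Return (is_a IDs, part_of IDs) found in one stanza."""
    is_a = re.findall(r"^is_a: (\S+)", stanza, flags=re.M)
    part_of = re.findall(r"^relationship: part_of (\S+)", stanza, flags=re.M)
    return is_a, part_of

def find_similar(obo_text, name_pattern):
    """Yield (id, name, is_a, part_of) for stanzas whose name matches."""
    for stanza in split_stanzas(obo_text):
        name = re.search(r"^name: (.*)$", stanza, flags=re.M)
        term_id = re.search(r"^id: (\S+)", stanza, flags=re.M)
        if name and term_id and re.search(name_pattern, name.group(1)):
            is_a, part_of = genus_and_part_of(stanza)
            yield (term_id.group(1), name.group(1), is_a, part_of)

# Inline sample standing in for uberon-edit.obo (assumption for illustration).
SAMPLE = """[Term]
id: UBERON:0001379
name: vastus lateralis
is_a: UBERON:0001630 ! muscle organ
relationship: part_of UBERON:0001377 ! quadriceps femoris

[Term]
id: UBERON:0001630
name: muscle organ
"""

matches = list(find_similar(SAMPLE, r"vastus"))
print(matches)
```

Matching on the `name:` line only, rather than the whole stanza, avoids picking up terms that merely mention the pattern in a definition or synonym.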

**Optional fields in `leaf_template_rows` (Phase 6 + 7):**

The default leaf template has an OPTIONAL `develops_from` column. The muscular-system
overlay adds `has_muscle_origin`, `has_muscle_insertion`, and `innervated_by` columns.
Populate any of these in `leaf_template_rows[label]` when you have evidence:

```json
"leaf_template_rows": {
"early antral follicle": {
"is_a": "UBERON:0000037",
"develops_from": "UBERON:0000036"
},
"articularis genu muscle": {
"is_a": "UBERON:0001630",
"part_of": "UBERON:0000376",
"has_muscle_origin": "UBERON:0000981",
"has_muscle_insertion": "UBERON:0000976",
"innervated_by": "UBERON:0001267"
}
}
```

The merge step writes any of these to the corresponding column IF the column exists in
the current template variant. Unknown fields are silently dropped — you don't need to
know which template the row belongs to. Just emit whatever you can populate with
evidence.
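
The drop-unknown-fields behaviour described above might look roughly like the following. This is a sketch under assumptions: `merge_row` and the abbreviated column lists are hypothetical and do not reproduce the real merge script; `UBERON:9900001` is a placeholder ID.

```python
def merge_row(template_columns, base_row, agent_fields):
    """Copy agent fields into the row only when the template has that column."""
    merged = dict(base_row)
    for field, value in agent_fields.items():
        if field in template_columns and value:
            merged[field] = value
        # Fields without a matching column are dropped without error.
    return merged

# Abbreviated, hypothetical column lists for the two template variants.
DEFAULT_COLUMNS = ["ID", "is_a", "part_of", "develops_from"]
MUSCLE_COLUMNS = DEFAULT_COLUMNS + [
    "has_muscle_origin", "has_muscle_insertion", "innervated_by",
]

agent_fields = {
    "is_a": "UBERON:0001630",
    "part_of": "UBERON:0000376",
    "has_muscle_origin": "UBERON:0000981",
    "innervated_by": "UBERON:0001267",
}

base = {"ID": "UBERON:9900001"}  # placeholder ID
default_row = merge_row(DEFAULT_COLUMNS, base, agent_fields)
muscle_row = merge_row(MUSCLE_COLUMNS, base, agent_fields)
print(default_row)
print(muscle_row)
```

With this shape the same agent output can be merged into either variant, which is why the agent does not need to know which template a row belongs to.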

**Stage-series guidance for `develops_from`:**

For terms in a developmental sequence (follicle stages, embryonic stages, hematopoietic
differentiation), look up the precursor stage via OLS4 / awk and emit `develops_from`.
Example: `early antral follicle` develops_from `secondary ovarian follicle`
(UBERON:0000036).

**Muscle-overlay guidance for `has_muscle_origin`/`has_muscle_insertion`/`innervated_by`:**

For `system: "muscle"` terms (the per-group JSON contains a `system` field per term),
extract origin/insertion/innervation from Wikipedia + UBERON precedent. The bone or
nerve labels in Wikipedia text typically need OLS4 lookup to resolve to UBERON IDs
(e.g. "femur" → UBERON:0000981, "femoral nerve" → UBERON:0001267). If a UBERON ID
cannot be resolved (named bone landmark, specific nerve branch missing from UBERON),
omit that field rather than guess.

**Worked examples:**

- `clavicular head of pectoralis major muscle`:
- Look up similar: UBERON:0007168 (long head of biceps brachii), UBERON:0007169 (short
head of biceps brachii) → both use `is_a: UBERON:0011906 ! muscle head` and
`relationship: part_of <named muscle>`.
- Emit: `{"is_a": "UBERON:0011906", "part_of": "UBERON:0002381"}`

- `articularis genu muscle`:
- Look up similar: vastus lateralis (UBERON:0001379) uses
`is_a: UBERON:0001630 ! muscle organ` + `part_of UBERON:0001377 ! quadriceps femoris`.
- For articularis genu, the analogous part_of would be the thigh region (or anterior
compartment of thigh if a UBERON term exists for it). Emit:
`{"is_a": "UBERON:0001630", "part_of": "UBERON:0004252"}` (or more specific).

- `costal part of respiratory diaphragm muscle`: the usual UBERON pattern for such terms
is a generic genus plus `part_of` the parent structure. This term already has a confirmed
match (UBERON:0035831), so it is excluded from the leaf template.

- `dominant antral follicle` (a stage/subtype, no spatial part_of beyond the parent):
emit `{"is_a": "UBERON:0000037"}` only; omit `part_of`.

**Important — DO NOT just take the supplied source parent and assign it to one column.**
Look at similar UBERON terms first; the source parent is often too broad (a grouping class)
to serve as the genus, and a more specific genus may be obvious (muscle head, muscle
organ, etc.).

## Step 8: Group term equivalent class — genus + part_of some Y (GROUP terms only)

@@ -288,6 +373,16 @@ Save to: `bulk_ntr_workflow/outputs/definitions/{group_name}.json`
"def_xrefs_to_add": {
"term label": "PMID:12345678|PMID:87654321"
},
"leaf_template_rows": {
"leaf term label": {
"is_a": "UBERON:0011906",
"part_of": "UBERON:0002381",
"develops_from": "UBERON:0000036",
"has_muscle_origin": "UBERON:0001105",
"has_muscle_insertion": "UBERON:0000976",
"innervated_by": "UBERON:0003726"
}
},
"resolved_relationships": {
"leaf term label": "is_a | part_of"
},
@@ -361,17 +456,28 @@ Omit empty lists/dicts. Do NOT include a `fma_resolutions` key — use `resolved
- Every confirmed match must have both a UBERON definition and Wikipedia/literature evidence.
- Every new term must have at least one real PMID/DOI in `def_xrefs_to_add` or in the existing
`def_xref` input field (ASCTB-TEMP placeholders do not count as real references).
- `resolved_relationships` values must be `"is_a"` or `"part_of"` only.
- `resolved_parents` values must be real UBERON IDs retrieved from OLS4 — never guessed.
- Layers, zones, heads, bellies, parts of named structures → must be `part_of`, never `is_a`.
- For LEAF terms: prefer emitting `leaf_template_rows[label]` with both `is_a` and
`part_of` populated. Look up similar UBERON terms via awk over uberon-edit.obo to
find the right genus class — do NOT just assign the source parent to one column.
- `leaf_template_rows[label].is_a` should be a genus class (e.g. UBERON:0001630 muscle
organ, UBERON:0011906 muscle head), not a regional grouping class.
- `leaf_template_rows[label].part_of` should be the containing structure (parent muscle,
body region, compartment).
- For backward compatibility, `resolved_relationships` (values `"is_a"` or `"part_of"`)
+ `resolved_parents` may still be used; merge will fall back to these if
`leaf_template_rows` is absent.
- All UBERON ID values must be real UBERON IDs retrieved from OLS4 or uberon-edit.obo —
never guessed.
- Layers, zones, heads, bellies, parts of named structures → MUST have `part_of`
populated to the named parent structure.
- Pathological/dysfunctional terms → must appear in `out_of_scope`.
- Non-standard names → must appear in `name_corrections`.
- **For `term_type: "group"` terms**: every term must end up in EITHER
`group_template_rows` (with both `genus` and `location` populated as real UBERON IDs)
OR `manual_curation` (with proposed definition + similar UBERON terms). No group term
should be silently absent from both.
- `resolved_relationships` and `resolved_parents` apply to LEAF terms only — do not emit
these keys for group terms.
- `leaf_template_rows`, `resolved_relationships`, `resolved_parents` apply to LEAF terms
only — do not emit these keys for group terms.
- Do NOT invent UBERON IDs.
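
Some of the leaf-term rules above can be checked mechanically before merge. The sketch below is hypothetical (`check_leaf_rows` is not an existing script) and assumes UBERON CURIEs always have seven digits. It checks syntax only; whether an ID really exists still has to be confirmed against OLS4 or uberon-edit.obo.

```python
import re

# Assumption: UBERON CURIEs are "UBERON:" followed by exactly seven digits.
UBERON_ID = re.compile(r"^UBERON:\d{7}$")

def check_leaf_rows(leaf_template_rows):
    """Return a list of human-readable problems; empty list means no findings."""
    problems = []
    for label, row in leaf_template_rows.items():
        # Every leaf row needs at least one of the two structural axes.
        if not (row.keys() & {"is_a", "part_of"}):
            problems.append(f"{label}: neither is_a nor part_of populated")
        for field, value in row.items():
            if not UBERON_ID.match(str(value)):
                problems.append(
                    f"{label}: {field}={value!r} is not a well-formed UBERON ID"
                )
    return problems

rows = {
    "clavicular head of pectoralis major muscle": {
        "is_a": "UBERON:0011906",
        "part_of": "UBERON:0002381",
    },
    # Deliberately bad row: free text instead of an ID, no is_a/part_of.
    "hypothetical bad term": {"innervated_by": "femoral nerve"},
}

problems = check_leaf_rows(rows)
for problem in problems:
    print(problem)
```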

## Tools Available
4 changes: 2 additions & 2 deletions .github/workflows/diff.yml
@@ -110,7 +110,7 @@ jobs:
ref: ${{ steps.comment-branch.outputs.head_ref }}
- name: Classify ontology PR branch
if: steps.check.outputs.triggered == 'true'
run: export ROBOT_JAVA_ARGS='-Xmx9G'; cd src/ontology; make BRI=false MIR=false PAT=false IMP=false COMP=false uberon-base.owl > TESTLOG.log
run: export ROBOT_JAVA_ARGS='-Xmx9G'; cd src/ontology; make BRI=false MIR=false PAT=false IMP=false COMP=true uberon-base.owl > TESTLOG.log
- name: Upload classified ontology in PR branch
if: steps.check.outputs.triggered == 'true'
uses: actions/upload-artifact@v4
@@ -136,7 +136,7 @@ jobs:
ref: master
- name: Classify ontology main branch
if: steps.check.outputs.triggered == 'true'
run: export ROBOT_JAVA_ARGS='-Xmx9G'; cd src/ontology; make BRI=false MIR=false PAT=false IMP=false COMP=false uberon-base.owl > TESTLOG.log
run: export ROBOT_JAVA_ARGS='-Xmx9G'; cd src/ontology; make BRI=false MIR=false PAT=false IMP=false COMP=true uberon-base.owl > TESTLOG.log
- name: Upload classified ontology main branch
if: steps.check.outputs.triggered == 'true'
uses: actions/upload-artifact@v4
29 changes: 29 additions & 0 deletions bulk_ntr_workflow/CLAUDE.md
@@ -206,13 +206,42 @@ After QC, both templates need to be registered with ODK.
| def_xref | >A oboInOwl:hasDbXref SPLIT=\| | References + ASCTB-TEMP IRI |
| is_a | SC % | Genus class (structural type or classification parent) |
| part_of | SC BFO:0000050 some % | Containing structure |
| develops_from | SC RO:0002202 some % | Optional. Developmental precursor (stage series) |
| In_subset | AI oboInOwl:inSubset | `added_by_HRA` subset IRI |
| Date | AT dcterms:date^^xsd:dateTime | ISO timestamp |
| Contributor | AI dcterms:contributor | ORCID IRI |
| Present_in_taxon | AI RO:0002175 | NCBITaxon IRI |
| Wikipedia_image | A foaf:depiction | Wikipedia image URL |
| xref | A oboInOwl:hasDbXref SPLIT=\| | Direct term xrefs: Wikipedia article + FMA ID |

### Muscle leaf template (`<name>-muscle.template.tsv`) — Phase 7 overlay

Used automatically when the source `tables` value is `muscular-system`. Adds three
columns between `develops_from` and `In_subset`:

| Header | ROBOT directive | Notes |
|---|---|---|
| has_muscle_origin | SC RO:0002372 some % | Bone/structure the muscle arises from |
| has_muscle_insertion | SC RO:0002373 some % | Bone/structure the muscle inserts onto |
| innervated_by | SC RO:0002005 some % | Motor nerve |

All three are OPTIONAL — empty cell ⇒ no axiom. Populate only when Wikipedia +
UBERON precedent provide a resolvable UBERON ID for the related entity.
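
As an illustration of the layout, a ROBOT template is a TSV whose first row holds the headers and whose second row holds the directives. The sketch below renders an abbreviated muscle template in that shape. The `ID` column, the placeholder term ID, and `render_template` are assumptions for illustration; the real template has more columns than shown here.

```python
import csv
import io

# (header, ROBOT directive) pairs; abbreviated subset of the muscle template.
MUSCLE_COLUMNS = [
    ("ID", "ID"),
    ("is_a", "SC %"),
    ("part_of", "SC BFO:0000050 some %"),
    ("develops_from", "SC RO:0002202 some %"),
    ("has_muscle_origin", "SC RO:0002372 some %"),
    ("has_muscle_insertion", "SC RO:0002373 some %"),
    ("innervated_by", "SC RO:0002005 some %"),
]

def render_template(rows):
    """Render header row, directive row, then one TSV line per term."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([header for header, _ in MUSCLE_COLUMNS])
    writer.writerow([directive for _, directive in MUSCLE_COLUMNS])
    for row in rows:
        # Missing fields become empty cells, which yield no axiom.
        writer.writerow([row.get(header, "") for header, _ in MUSCLE_COLUMNS])
    return buf.getvalue()

tsv = render_template([
    {
        "ID": "UBERON:9900001",  # placeholder ID
        "is_a": "UBERON:0001630",
        "part_of": "UBERON:0000376",
        "has_muscle_origin": "UBERON:0000981",
        "innervated_by": "UBERON:0001267",
    }
])
print(tsv)
```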

### Template variants and partitioning

Stage 1 partitions input rows by mapping the source `tables` column to a system overlay:

| Source table value | Overlay | Output template |
|---|---|---|
| `muscular-system` | `muscle` | `<name>-muscle.template.tsv` |
| (anything else) | `default` | `<name>.template.tsv` |

A single Stage 1 run can produce multiple leaf templates if the input has rows from
mixed tables (each system gets its own clean template — no muscle-specific empty
columns appear in non-muscle templates). The routing decision is printed at the start
of Stage 1 as `Step 0 routing: muscle=N, default=M, group=K`.
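
A minimal sketch of this routing, under assumptions: rows are plain dicts carrying `tables` and `term_type` keys, group terms are routed to their own bucket regardless of table (inferred from the `group=K` counter in the log line), and all function names and input rows are hypothetical.

```python
# Source table value -> overlay; anything not listed falls back to "default".
OVERLAY_BY_TABLE = {"muscular-system": "muscle"}

def route(row):
    """Pick the bucket for one input row."""
    if row.get("term_type") == "group":
        return "group"  # assumption: groups go to the groups template
    return OVERLAY_BY_TABLE.get(row.get("tables"), "default")

def partition(rows):
    """Split rows into one list per bucket."""
    buckets = {"muscle": [], "default": [], "group": []}
    for row in rows:
        buckets[route(row)].append(row)
    return buckets

def routing_summary(buckets):
    """Format the Step 0 log line from bucket sizes."""
    return "Step 0 routing: muscle={}, default={}, group={}".format(
        len(buckets["muscle"]), len(buckets["default"]), len(buckets["group"])
    )

# Hypothetical input rows.
rows = [
    {"label": "a", "tables": "muscular-system", "term_type": "leaf"},
    {"label": "b", "tables": "muscular-system", "term_type": "group"},
    {"label": "c", "tables": "some-other-table", "term_type": "leaf"},
]

buckets = partition(rows)
summary = routing_summary(buckets)
print(summary)
```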

### Groups template (`<name>-groups.template.tsv`) — equivalent class

Same as leaf, with `is_a` / `part_of` replaced by `genus` / `location`:
99 changes: 99 additions & 0 deletions bulk_ntr_workflow/ROADMAP.md
@@ -302,3 +302,102 @@ Once Phase 2 is implemented, run the complete 75-term muscular-system table. Exp
Generalise to other ASCTB tables (nervous system, vasculature, etc.). The grouping vs leaf
distinction will apply across systems (e.g. "artery of X" vs "X artery", "region of cortex" vs
"X gyrus").

---

## Phase 6: Optional `develops_from` column on default leaf template ✅

**Status:** complete.

Added an optional `develops_from` column with directive `SC RO:0002202 some %` to the
default leaf template. Empty cell → no axiom emitted by ROBOT. Populated by the agent
when Wikipedia + UBERON precedent indicate a developmental precursor (stage series:
follicle stages, embryonic stages, hematopoietic differentiation, etc.).

The agent emits the value via `leaf_template_rows[label].develops_from` in its JSON
output. Merge silently drops the field if the column is absent in the current template
variant.

---

## Phase 7: System overlays

The default leaf template captures only `is_a`, `part_of`, and (optional) `develops_from`.
Some anatomical systems benefit substantially from additional axiomatic richness (origin,
insertion, innervation for muscles; arterial supply / drainage for vasculature; etc.).
Phase 7 implements per-system template overlays — a system overlay is a leaf template
variant with extra columns covering system-specific connectivity relations.

Stage 1 routes input rows to the appropriate overlay based on the source `tables`
column. Per-system separation keeps each output template clean (no muscle-specific
empty columns in non-muscle templates).

### Phase 7 — Skeletal muscle overlay ✅

**Status:** complete.

For inputs with `tables == muscular-system`, Stage 1 produces
`<name>-muscle.template.tsv` instead of (or alongside) the default leaf template,
adding three columns:

| Column | ROBOT directive | Relation |
|---|---|---|
| has_muscle_origin | SC RO:0002372 some % | bone/structure muscle arises from |
| has_muscle_insertion | SC RO:0002373 some % | bone/structure muscle inserts onto |
| innervated_by | SC RO:0002005 some % | motor nerve |

All three are OPTIONAL and populated only with evidence-quoted UBERON IDs. Coverage gaps
(e.g. "lateral pectoral nerve" not in UBERON) are captured as free-text notes in the
agent's output rather than as guessed UBERON IDs.

### Phase 7 — Future overlays (NOT IMPLEMENTED)

| System | Source table | Suggested fields | Notes |
|---|---|---|---|
| Skeletal | `skeletal-system`? | `articulates_with`, `ossifies_via`, `composed_primarily_of` (bone tissue) | Bones often have rich articulation patterns |
| Vasculature | `vasculature` | `arterial_supply_to`, `drains_into`, `branch_of` | Connectivity is central to vasculature semantics |
| Nervous system | `nervous-system`, `allen-brain` | `innervates`, `synapsed_to`, `axon_in` | Cell-type heavy; CL ontology integration matters |

Each overlay should be added only when there's a real bulk NTR batch that would benefit
from it. The skeletal-muscle overlay was justified by the muscle enrichment experiment
(see `bulk_ntr_workflow/experiments/SUMMARY.md`); future overlays should likewise be
validated by an enrichment experiment before any code is committed.

---

## Phase 8: Term promotion to direct editing

**Status:** roadmap only.

When a templated term needs richer axiomatisation than its template supports — e.g. a
follicle stage that requires `has_component UBERON:0005170 minCardinality=2` (cardinality-
constrained intersection_of), or a complex term needing multiple `has_part` axioms with
CL: cell-type fillers — the templating system becomes a constraint rather than a help.

The proposed remedy: a "promote to direct editing" agent that:

1. Takes a term ID (or list) plus the desired richer axiom set.
2. Reads the current template TSV row for that ID.
3. Converts the row to OBO stanza form (mapping ROBOT directives back to OBO syntax:
`SC %` → `is_a`, `SC BFO:0000050 some %` → `relationship: part_of`, etc.).
4. Augments the stanza with the new axioms (intersection_of, cardinality, additional
relationship axioms).
5. Uses the standard checkout/checkin flow: writes to `terms/UBERON_NNNNNNN.obo`, then
`obo-checkin.pl` to merge into `uberon-edit.obo`.
6. Removes the row from the template TSV.
7. Runs the reasoner to confirm the new axiomatisation produces the expected
classification (no unsatisfiable classes, no unexpected new is_a links).

This solves the templating lock-in concern: any term can be promoted to direct editing
later without losing its UBERON ID or history.
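
Step 3 (mapping ROBOT directives back to OBO syntax) could be sketched as below. No such script exists yet, so everything here is hypothetical: the function name, the directive table (limited to the three directives documented for the default leaf template), and the pairing of the placeholder ID with the example label.

```python
# ROBOT directive -> OBO tag pattern, for the default leaf template columns.
DIRECTIVE_TO_OBO = {
    "SC %": "is_a: {}",
    "SC BFO:0000050 some %": "relationship: part_of {}",
    "SC RO:0002202 some %": "relationship: develops_from {}",
}

def row_to_stanza(term_id, name, directives, row):
    """Convert one template row into OBO stanza text.

    directives maps column header -> ROBOT directive; row maps header -> value.
    """
    lines = ["[Term]", f"id: {term_id}", f"name: {name}"]
    for column, directive in directives.items():
        value = row.get(column, "")
        if not value:
            continue  # empty cell: no axiom was emitted, so nothing to convert
        pattern = DIRECTIVE_TO_OBO.get(directive)
        if pattern is None:
            # Refuse to guess at directives this sketch does not cover.
            raise ValueError(f"no OBO mapping for directive {directive!r}")
        lines.append(pattern.format(value))
    return "\n".join(lines)

directives = {
    "is_a": "SC %",
    "part_of": "SC BFO:0000050 some %",
    "develops_from": "SC RO:0002202 some %",
}
row = {"is_a": "UBERON:0000037", "part_of": "", "develops_from": "UBERON:0000036"}

# Placeholder ID paired with an example label for illustration only.
stanza = row_to_stanza("UBERON:9900001", "early antral follicle", directives, row)
print(stanza)
```

Raising on an unmapped directive, rather than skipping it, keeps a promotion from silently losing an axiom that the template row carried.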

UX sketch:
```bash
bulk_ntr_workflow/scripts/promote_term.py UBERON:9900037 \
--add 'intersection_of: UBERON:0001305' \
--add 'intersection_of: has_component UBERON:0005170 {minCardinality="2"}' \
--add 'relationship: develops_from UBERON:0000035'
```

Or for batches, a YAML/TSV input listing which terms to promote with which axiom sets.
The agent should handle is_a-inheritance carefully (the inferred is_a after
intersection_of must still resolve to the previous genus + the new differentia).