Changes from 25 commits
e83416c
Phase 2 end-to-end test on first 10 muscular-system terms
dosumis Apr 27, 2026
3ad9844
Leaf flow: look up genus + part_of via obo-grep instead of single-col…
dosumis Apr 28, 2026
7385dbf
Re-test leaf flow with leaf_template_rows: both is_a AND part_of popu…
dosumis Apr 28, 2026
8229a6e
Enrichment experiment: 6 muscle terms across difficulty gradient
dosumis Apr 28, 2026
4ce9a09
Ovary enrichment experiment — hypothesis disproven
dosumis Apr 28, 2026
c544244
Phase 6 + Phase 7 (skeletal-muscle): system overlays + develops_from
dosumis Apr 28, 2026
42738e6
Validate Phase 6 + Phase 7 muscle overlay end-to-end on muscular-system
dosumis Apr 28, 2026
db5d64b
Full muscular-system run: 75 input terms processed end-to-end
dosumis May 11, 2026
0f2984b
Delete hra-muscular.template.tsv
dosumis May 11, 2026
0fdd84d
Add consolidated unresolvable.tsv report from the full muscular-syste…
dosumis May 11, 2026
77fd128
Add consolidated review.tsv: input rows joined with all findings per row
dosumis May 11, 2026
bb73ff6
review.tsv: add mapped_label, parent_correction_label, mapping_evidence
dosumis May 11, 2026
3f73ed6
Register hra_muscular component and surface template diffs in PRs
dosumis May 15, 2026
c21556f
Move 3 back-muscle groupings from EC template to manual curation in e…
dosumis May 15, 2026
e640036
Review fixes: part_of for 9900025, term_tracker_item column, move rep…
dosumis May 15, 2026
18346d5
Merge branch 'master' into add-hra-muscular-ntr
dosumis May 18, 2026
6436580
Wire subclasses to back-muscle grouping terms (9900020/9900055/9900063)
dosumis May 18, 2026
8117acb
Add posterior abdominal wall (UBERON:9900100); wire 4 muscles + group…
dosumis May 18, 2026
31eb4c9
Merge branch 'add-hra-muscular-ntr' of https://github.com/obophenotyp…
dosumis May 18, 2026
692925c
Reassign template-row UBERON IDs from 99xxxxx (temp) to 11xxxxx (OS r…
dosumis May 18, 2026
58c9677
Fix illegal-annotation-property QC: use 'depiction' obo shortcut, not…
dosumis May 18, 2026
de0719b
ASCTB-TEMP URLs -> ccf: CURIEs; fix articularis genu is_a to skeletal…
dosumis May 18, 2026
d5f52d5
Fix 8 unsat muscles: split spine location into attaches_to_part_of
dosumis May 18, 2026
2007e30
Declare RO:0002177 as ObjectProperty in muscular prefixes stub
dosumis May 18, 2026
f1995e6
Merge branch 'master' into add-hra-muscular-ntr
dosumis May 20, 2026
184 changes: 145 additions & 39 deletions .claude/agents/ntr-term-researcher.md
@@ -46,6 +46,7 @@ The file contains:
"ntr_id": "http://purl.obolibrary.org/obo/UBERON_9900001",
"label": "term label",
"term_type": "leaf",
"system": "default | muscle",
"is_a": "INFER:UBERON:xxxxxxx | NEEDS_MAPPING:FMA:nnnnn | UNRESOLVABLE:...",
"part_of": "INFER:UBERON:xxxxxxx | ...",
"def_xref": "ref1|ref2|..."
@@ -183,40 +184,124 @@ For each term without a confirmed existing UBERON match:
- Where members are known and bounded, enumerate: "...comprising the X, Y, and Z."
- Length still 20–60 words.

## Step 7: Resolve Relationship Types (LEAF terms only)

For each `term_type: "leaf"` term, determine whether it should be `is_a` or `part_of` the
resolved parent:

**Use `part_of` when the term is a physical subdivision of the parent:**
- Named **layer**, zone, region, wall, surface, border, lumen, stroma, cortex, medulla
- Named **head**, belly, compartment, lobe, segment, fascicle of a specific named structure
- Any term where the phrase "is **contained within**", "is **a subdivision of**", or
"is **a layer of**" the parent is correct
- Examples: `corpus luteum granulosa lutein layer` **part_of** corpus luteum;
`clavicular head of pectoralis major` **part_of** pectoralis major;
`costal part of diaphragm` **part_of** diaphragm;
`cumulus oophorus oocyte complex` **part_of** antral follicle

**Use `is_a` when the term is a classification type within the parent category:**
- The parent is a **grouping class** (e.g. "muscle of neck", "ovarian follicle stage",
"cranial muscle") and the term is a **member of that category**
- The term can be truly described as "IS A [parent]" — i.e. it has all properties of the
parent and adds further specificity
- Examples: `anterior vertebral muscle` **is_a** muscle of neck;
`primary ovarian follicle` **is_a** ovarian follicle;
`dominant antral follicle` **is_a** antral follicle

**Quick test**: ask "Is a [term] a kind of [parent]?" (→ `is_a`) vs "Is a [term] inside/part
of a [parent]?" (→ `part_of`). When in doubt, prefer `part_of` for physically bounded
sub-structures and `is_a` for stages or functional subtypes.

If unclear after applying these rules: search `ols4` for existing children of the same parent
and check the relationship type they use; apply the same pattern.

Record each decision in `resolved_relationships`. (Skip for `term_type: "group"` —
relationships for those are encoded as `genus + part_of some Y` equivalent classes;
see Step 8.)
## Step 7: Resolve genus AND part_of for LEAF terms

For each `term_type: "leaf"` term, look up how UBERON defines similar specific structures
to determine BOTH a genus (`is_a`) class AND a `part_of` containing structure. UBERON
convention typically populates both for specific named anatomical entities — e.g.
`vastus lateralis` has `is_a: UBERON:0001630 ! muscle organ` AND
`relationship: part_of UBERON:0001377 ! quadriceps femoris`.

**Procedure:**

1. Use awk over `src/ontology/uberon-edit.obo` to find similar specific UBERON terms.
Examples for muscle subdivisions:
```bash
awk 'BEGIN{RS=""} /\nname: .*head of .*muscle/' src/ontology/uberon-edit.obo
awk 'BEGIN{RS=""} /\nname: .*part of .*muscle/' src/ontology/uberon-edit.obo
awk 'BEGIN{RS=""} /\nname: .*belly of/' src/ontology/uberon-edit.obo
awk 'BEGIN{RS=""} /\nid: UBERON:0001379\n/' src/ontology/uberon-edit.obo # vastus lateralis
```

2. From similar terms, extract the genus pattern. Common UBERON genus classes for
muscle leaf terms:
- `UBERON:0001630` muscle organ — for whole named individual muscles (e.g. articularis
genu, longus capitis, vastus lateralis)
- `UBERON:0011906` muscle head — for named heads of muscles (clavicular head, long
head, short head)
- `UBERON:0014892` skeletal muscle organ, vertebrate — for skeletal muscles when a
more specific class is unavailable
- `UBERON:0014892` or domain-specific (e.g. `UBERON:0001135` smooth muscle organ)
for non-skeletal cases

3. From similar terms, extract the part_of pattern. Common targets:
- For "X head/belly/part of Y muscle" → part_of the named parent muscle Y
- For named muscles in a region → part_of the region (e.g. neck, thigh,
anterior compartment)
- For named segmental muscles → part_of the relevant region (cervical vertebral
column, lumbar region, etc.)

4. Emit a `leaf_template_rows[label]` entry with `{"is_a": "UBERON:...", "part_of":
"UBERON:..."}`. **Both columns should be populated when applicable.**
- Set `is_a` only (omit `part_of`) for classification subtypes that don't have a
containing structure (e.g. `dominant antral follicle is_a antral follicle` — no
additional part_of needed beyond what the genus class implies).
- Set `part_of` only (omit `is_a` or use a very generic genus) when the term is
purely a subdivision and no specific genus class is available.

5. The legacy `resolved_relationships` + `resolved_parents` keys are still accepted as
a fallback but `leaf_template_rows` is preferred — it expresses both axes
simultaneously.
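
The lookup in steps 1 to 3 can also be sketched in Python. This is a minimal illustration under stated assumptions, not part of the workflow: the function names `split_stanzas`, `genus_and_part_of`, and `find_similar` are hypothetical, and `SAMPLE` is a hand-abbreviated stanza built from the vastus lateralis example above rather than real file content.

```python
import re


def split_stanzas(obo_text):
    """Split OBO text into blank-line-separated stanzas (like awk RS="")."""
    return [s for s in re.split(r"\n\s*\n", obo_text.strip()) if s.strip()]


def genus_and_part_of(stanza):
    """Return (is_a IDs, part_of IDs) declared in one stanza."""
    is_a = re.findall(r"^is_a: (\S+)", stanza, flags=re.M)
    part_of = re.findall(r"^relationship: part_of (\S+)", stanza, flags=re.M)
    return is_a, part_of


def find_similar(obo_text, name_pattern):
    """Yield (id, name, is_a, part_of) for terms whose name matches the regex."""
    for stanza in split_stanzas(obo_text):
        name = re.search(r"^name: (.+)$", stanza, flags=re.M)
        term_id = re.search(r"^id: (\S+)", stanza, flags=re.M)
        if name and term_id and re.search(name_pattern, name.group(1)):
            yield (term_id.group(1), name.group(1), *genus_and_part_of(stanza))


# Hypothetical, abbreviated sample; real stanzas carry many more tags.
SAMPLE = """[Term]
id: UBERON:0001379
name: vastus lateralis
is_a: UBERON:0001630 ! muscle organ
relationship: part_of UBERON:0001377 ! quadriceps femoris

[Term]
id: UBERON:0001630
name: muscle organ
"""

for row in find_similar(SAMPLE, r"vastus"):
    print(row)
```

Reading both tags from the same stanza keeps the genus and the containing structure paired, which is what a `leaf_template_rows` entry needs.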

**Optional fields in `leaf_template_rows` (Phase 6 + 7):**

The default leaf template has an OPTIONAL `develops_from` column. The muscular-system
overlay also has `has_muscle_origin`, `has_muscle_insertion`, `innervated_by` columns.
Populate any of these in `leaf_template_rows[label]` when you have evidence:

```json
"leaf_template_rows": {
"early antral follicle": {
"is_a": "UBERON:0000037",
"develops_from": "UBERON:0000036"
},
"articularis genu muscle": {
"is_a": "UBERON:0001630",
"part_of": "UBERON:0000376",
"has_muscle_origin": "UBERON:0000981",
"has_muscle_insertion": "UBERON:0000976",
"innervated_by": "UBERON:0001267"
}
}
```

The merge step writes any of these to the corresponding column IF the column exists in
the current template variant. Unknown fields are silently dropped — you don't need to
know which template the row belongs to. Just emit whatever you can populate with
evidence.
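
The drop-unknown-fields behaviour described above can be pictured with a short sketch. The merge step's real code is not shown in this diff, so `merge_row` and the two column lists are hypothetical stand-ins that only illustrate the stated rule: a field is written when the current template variant has a column for it, and is otherwise dropped without error.

```python
# Hypothetical column sets; the real templates contain more columns.
DEFAULT_COLUMNS = ["label", "is_a", "part_of", "develops_from"]
MUSCLE_COLUMNS = DEFAULT_COLUMNS + [
    "has_muscle_origin",
    "has_muscle_insertion",
    "innervated_by",
]


def merge_row(template_columns, base_row, leaf_fields):
    """Copy populated fields into a row, keeping only columns the template has."""
    row = dict(base_row)
    for field, value in leaf_fields.items():
        if field in template_columns and value:
            row[field] = value
        # Fields without a matching column are dropped silently.
    return row


fields = {
    "is_a": "UBERON:0001630",
    "part_of": "UBERON:0000376",
    "has_muscle_origin": "UBERON:0000981",
    "innervated_by": "UBERON:0001267",
}
base = {"label": "articularis genu muscle"}

print(merge_row(DEFAULT_COLUMNS, base, fields))  # muscle fields dropped
print(merge_row(MUSCLE_COLUMNS, base, fields))   # muscle fields kept
```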

**Stage-series guidance for `develops_from`:**

For terms in a developmental sequence (follicle stages, embryonic stages, hematopoietic
differentiation), look up the precursor stage via OLS4 / awk and emit `develops_from`.
Example: `early antral follicle` develops_from `secondary ovarian follicle`
(UBERON:0000036).

**Muscle-overlay guidance for `has_muscle_origin`/`has_muscle_insertion`/`innervated_by`:**

For `system: "muscle"` terms (the per-group JSON contains a `system` field per term),
extract origin/insertion/innervation from Wikipedia + UBERON precedent. The bone or
nerve labels in Wikipedia text typically need OLS4 lookup to resolve to UBERON IDs
(e.g. "femur" → UBERON:0000981, "femoral nerve" → UBERON:0001267). If a UBERON ID
cannot be resolved (named bone landmark, specific nerve branch missing from UBERON),
omit that field rather than guess.
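
The omit-rather-than-guess rule can be sketched as follows. This is an illustration only: `resolve_overlay` is a hypothetical helper, and the `label_to_id` dictionary stands in for whatever an OLS4 lookup returned (no OLS4 call is made here). The two IDs are the ones quoted above; the unresolved label is invented for the example.

```python
def resolve_overlay(labels_by_field, label_to_id):
    """Keep only overlay fields whose label resolved to a UBERON ID."""
    resolved = {}
    for field, label in labels_by_field.items():
        uberon_id = label_to_id.get(label)
        if uberon_id:  # unresolved labels are omitted, never guessed
            resolved[field] = uberon_id
    return resolved


# Stand-in for lookup results; IDs taken from the guidance above.
label_to_id = {
    "femur": "UBERON:0000981",
    "femoral nerve": "UBERON:0001267",
}

# Labels as they might be read from Wikipedia text (hypothetical input).
labels_by_field = {
    "has_muscle_origin": "femur",
    "has_muscle_insertion": "some unnamed bone landmark",
    "innervated_by": "femoral nerve",
}

print(resolve_overlay(labels_by_field, label_to_id))
```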

**Worked examples:**

- `clavicular head of pectoralis major muscle`:
- Look up similar: UBERON:0007168 (long head of biceps brachii), UBERON:0007169 (short
head of biceps brachii) → both use `is_a: UBERON:0011906 ! muscle head` and
`relationship: part_of <named muscle>`.
- Emit: `{"is_a": "UBERON:0011906", "part_of": "UBERON:0002381"}`

- `articularis genu muscle`:
- Look up similar: vastus lateralis (UBERON:0001379) uses
`is_a: UBERON:0001630 ! muscle organ` + `part_of UBERON:0001377 ! quadriceps femoris`.
- For articularis genu, the analogous part_of would be the thigh region (or anterior
compartment of thigh if a UBERON term exists for it). Emit:
`{"is_a": "UBERON:0001630", "part_of": "UBERON:0004252"}` (or more specific).

- `costal part of respiratory diaphragm muscle`: similar UBERON pattern is to use a
domain part as `part_of` plus a generic genus. Already a confirmed match in this
case (UBERON:0035831), so this term is excluded from the leaf template.

- `dominant antral follicle` (a stage/subtype, no spatial part_of beyond the parent):
emit `{"is_a": "UBERON:0000037"}` only; omit `part_of`.

**Important — DO NOT just take the supplied source parent and assign it to one column.**
Look at similar UBERON terms first; the source parent is often too broad (a grouping class)
to serve as the genus, and a more specific genus may be obvious (muscle head, muscle
organ, etc.).

## Step 8: Group term equivalent class — genus + part_of some Y (GROUP terms only)

@@ -288,6 +373,16 @@ Save to: `bulk_ntr_workflow/outputs/definitions/{group_name}.json`
"def_xrefs_to_add": {
"term label": "PMID:12345678|PMID:87654321"
},
"leaf_template_rows": {
"leaf term label": {
"is_a": "UBERON:0011906",
"part_of": "UBERON:0002381",
"develops_from": "UBERON:0000036",
"has_muscle_origin": "UBERON:0001105",
"has_muscle_insertion": "UBERON:0000976",
"innervated_by": "UBERON:0003726"
}
},
"resolved_relationships": {
"leaf term label": "is_a | part_of"
},
@@ -361,17 +456,28 @@ Omit empty lists/dicts. Do NOT include a `fma_resolutions` key — use `resolved
- Every confirmed match must have both a UBERON definition and Wikipedia/literature evidence.
- Every new term must have at least one real PMID/DOI in `def_xrefs_to_add` or in the existing
`def_xref` input field (ASCTB-TEMP placeholders do not count as real references).
- `resolved_relationships` values must be `"is_a"` or `"part_of"` only.
- `resolved_parents` values must be real UBERON IDs retrieved from OLS4 — never guessed.
- Layers, zones, heads, bellies, parts of named structures → must be `part_of`, never `is_a`.
- For LEAF terms: prefer emitting `leaf_template_rows[label]` with both `is_a` and
`part_of` populated. Look up similar UBERON terms via awk over uberon-edit.obo to
find the right genus class — do NOT just assign the source parent to one column.
- `leaf_template_rows[label].is_a` should be a genus class (e.g. UBERON:0001630 muscle
organ, UBERON:0011906 muscle head), not a regional grouping class.
- `leaf_template_rows[label].part_of` should be the containing structure (parent muscle,
body region, compartment).
- For backward compatibility, `resolved_relationships` (values `"is_a"` or `"part_of"`)
+ `resolved_parents` may still be used; merge will fall back to these if
`leaf_template_rows` is absent.
- All UBERON ID values must be real UBERON IDs retrieved from OLS4 or uberon-edit.obo —
never guessed.
- Layers, zones, heads, bellies, parts of named structures → MUST have `part_of`
populated to the named parent structure.
- Pathological/dysfunctional terms → must appear in `out_of_scope`.
- Non-standard names → must appear in `name_corrections`.
- **For `term_type: "group"` terms**: every term must end up in EITHER
`group_template_rows` (with both `genus` and `location` populated as real UBERON IDs)
OR `manual_curation` (with proposed definition + similar UBERON terms). No group term
should be silently absent from both.
- `resolved_relationships` and `resolved_parents` apply to LEAF terms only — do not emit
these keys for group terms.
- `leaf_template_rows`, `resolved_relationships`, `resolved_parents` apply to LEAF terms
only — do not emit these keys for group terms.
- Do NOT invent UBERON IDs.
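
Some of these rules are mechanical enough to check before saving the output. The sketch below is a hypothetical pre-flight check, not part of the workflow: `check_leaf_output` is an invented name, and the seven-digit ID pattern is an assumption based on the examples in this file. It checks syntax only and cannot confirm that an ID really exists in UBERON; that still requires an OLS4 or uberon-edit.obo lookup.

```python
import re

# Assumption: UBERON CURIEs are "UBERON:" plus seven digits, as in the examples.
UBERON_ID = re.compile(r"^UBERON:\d{7}$")
ALLOWED_RELATIONSHIPS = {"is_a", "part_of"}


def check_leaf_output(output):
    """Return a list of human-readable problems found in a subagent output dict."""
    problems = []
    for label, row in output.get("leaf_template_rows", {}).items():
        for field, value in row.items():
            if not UBERON_ID.match(str(value)):
                problems.append(f"{label}: {field} is not a UBERON ID: {value!r}")
    for label, rel in output.get("resolved_relationships", {}).items():
        if rel not in ALLOWED_RELATIONSHIPS:
            problems.append(f"{label}: relationship must be is_a or part_of, got {rel!r}")
    return problems


good = {
    "leaf_template_rows": {
        "clavicular head of pectoralis major muscle": {
            "is_a": "UBERON:0011906",
            "part_of": "UBERON:0002381",
        }
    }
}
bad = {
    "leaf_template_rows": {"x": {"is_a": "INFER:UBERON:0001630"}},
    "resolved_relationships": {"x": "has_part"},
}

print(check_leaf_output(good))
print(check_leaf_output(bad))
```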

## Tools Available
28 changes: 6 additions & 22 deletions .github/workflows/ai-agent.yml
@@ -4,8 +4,7 @@
# and runs Claude Code to respond.
#
# Required secrets:
# - AI4C_AGENT_APP_ID
# - AI4C_AGENT_PRIVATE_KEY
# - PAT_FOR_PR
# - CLAUDE_CODE_OAUTH_TOKEN
#
# Configuration:
@@ -56,24 +55,17 @@ jobs:
outputs:
result: ${{ steps.check.outputs.result }}
steps:
- name: Generate ai4c-agent token
id: app-token
uses: actions/create-github-app-token@v3
with:
app-id: ${{ secrets.AI4C_AGENT_APP_ID }}
private-key: ${{ secrets.AI4C_AGENT_PRIVATE_KEY }}

- name: Checkout repository
uses: actions/checkout@v6
with:
fetch-depth: 1
token: ${{ steps.app-token.outputs.token }}
token: ${{ secrets.PAT_FOR_PR }}

- name: Check for qualifying mention
id: check
uses: actions/github-script@v8
with:
github-token: ${{ steps.app-token.outputs.token }}
github-token: ${{ secrets.PAT_FOR_PR }}
script: |
const fs = require("fs");

@@ -250,22 +242,14 @@ jobs:
contents: write
issues: write
pull-requests: write
id-token: write
runs-on: ubuntu-latest
container: ${{ fromJSON(needs.check-mention.outputs.result).useOdkContainer && 'obolibrary/odkfull:v1.6' || null }}
steps:
- name: Generate ai4c-agent token
id: app-token
uses: actions/create-github-app-token@v3
with:
app-id: ${{ secrets.AI4C_AGENT_APP_ID }}
private-key: ${{ secrets.AI4C_AGENT_PRIVATE_KEY }}

- name: Checkout repository
uses: actions/checkout@v6
with:
fetch-depth: 1
token: ${{ steps.app-token.outputs.token }}
token: ${{ secrets.PAT_FOR_PR }}

- name: Configure Git
run: |
@@ -294,13 +278,13 @@ jobs:
echo "${{ github.workspace }}/.venv/bin" >> "$GITHUB_PATH"

- name: Export GitHub token for gh CLI
run: echo "GH_TOKEN=${{ steps.app-token.outputs.token }}" >> "$GITHUB_ENV"
run: echo "GH_TOKEN=${{ secrets.PAT_FOR_PR }}" >> "$GITHUB_ENV"

- name: Run Claude Code
uses: anthropics/claude-code-action@v1
with:
claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
github_token: ${{ steps.app-token.outputs.token }}
github_token: ${{ secrets.PAT_FOR_PR }}
allowed_bots: "claude,github-actions"
show_full_output: true
claude_args: |
4 changes: 2 additions & 2 deletions .github/workflows/diff.yml
@@ -110,7 +110,7 @@ jobs:
ref: ${{ steps.comment-branch.outputs.head_ref }}
- name: Classify ontology PR branch
if: steps.check.outputs.triggered == 'true'
run: export ROBOT_JAVA_ARGS='-Xmx9G'; cd src/ontology; make BRI=false MIR=false PAT=false IMP=false COMP=false uberon-base.owl > TESTLOG.log
run: export ROBOT_JAVA_ARGS='-Xmx9G'; cd src/ontology; make BRI=false MIR=false PAT=false IMP=false COMP=true uberon-base.owl > TESTLOG.log
- name: Upload classified ontology in PR branch
if: steps.check.outputs.triggered == 'true'
uses: actions/upload-artifact@v4
@@ -136,7 +136,7 @@ jobs:
ref: master
- name: Classify ontology main branch
if: steps.check.outputs.triggered == 'true'
run: export ROBOT_JAVA_ARGS='-Xmx9G'; cd src/ontology; make BRI=false MIR=false PAT=false IMP=false COMP=false uberon-base.owl > TESTLOG.log
run: export ROBOT_JAVA_ARGS='-Xmx9G'; cd src/ontology; make BRI=false MIR=false PAT=false IMP=false COMP=true uberon-base.owl > TESTLOG.log
- name: Upload classified ontology main branch
if: steps.check.outputs.triggered == 'true'
uses: actions/upload-artifact@v4
41 changes: 35 additions & 6 deletions bulk_ntr_workflow/CLAUDE.md
@@ -185,12 +185,12 @@ After QC, both templates need to be registered with ODK.
| `bulk_ntr_workflow/outputs/template_groups_initial.tsv` | Groups working copy (EC directives) |
| `src/templates/<name>.template.tsv` | Final leaf template; updated in-place by Stage 4 |
| `src/templates/<name>-groups.template.tsv` | Final groups template (equivalent class definitions) |
| `src/templates/<name>-reports/input.tsv` | Filtered input rows + `term_type` classification |
| `src/templates/<name>-reports/errors.tsv` | Input errors (bad/FMA/ASCTB-TEMP parents) |
| `src/templates/<name>-reports/candidates.tsv` | Pre-mapped + OLS4-confirmed existing terms |
| `src/templates/<name>-reports/out_of_scope.tsv` | Pathological/dysfunctional terms |
| `src/templates/<name>-reports/name_corrections.tsv` | Source-label → corrected-label rewrites |
| `src/templates/<name>-reports/manual_curation.tsv` | Group terms not fitting simple `part_of` pattern |
| `bulk_ntr_workflow/outputs/<name>-reports/input.tsv` | Filtered input rows + `term_type` classification |
| `bulk_ntr_workflow/outputs/<name>-reports/errors.tsv` | Input errors (bad/FMA/ASCTB-TEMP parents) |
| `bulk_ntr_workflow/outputs/<name>-reports/candidates.tsv` | Pre-mapped + OLS4-confirmed existing terms |
| `bulk_ntr_workflow/outputs/<name>-reports/out_of_scope.tsv` | Pathological/dysfunctional terms |
| `bulk_ntr_workflow/outputs/<name>-reports/name_corrections.tsv` | Source-label → corrected-label rewrites |
| `bulk_ntr_workflow/outputs/<name>-reports/manual_curation.tsv` | Group terms not fitting simple `part_of` pattern |
| `bulk_ntr_workflow/outputs/definitions/input/*.json` | Per-group input for subagents |
| `bulk_ntr_workflow/outputs/definitions/*.json` | Per-group subagent output |

@@ -206,13 +206,42 @@ After QC, both templates need to be registered with ODK.
| def_xref | >A oboInOwl:hasDbXref SPLIT=\| | References + ASCTB-TEMP IRI |
| is_a | SC % | Genus class (structural type or classification parent) |
| part_of | SC BFO:0000050 some % | Containing structure |
| develops_from | SC RO:0002202 some % | Optional. Developmental precursor (stage series) |
| In_subset | AI oboInOwl:inSubset | `added_by_HRA` subset IRI |
| Date | AT dcterms:date^^xsd:dateTime | ISO timestamp |
| Contributor | AI dcterms:contributor | ORCID IRI |
| Present_in_taxon | AI RO:0002175 | NCBITaxon IRI |
| Wikipedia_image | A foaf:depiction | Wikipedia image URL |
| xref | A oboInOwl:hasDbXref SPLIT=\| | Direct term xrefs: Wikipedia article + FMA ID |

### Muscle leaf template (`<name>-muscle.template.tsv`) — Phase 7 overlay

Used automatically when the source `tables` value is `muscular-system`. Adds three
columns between `develops_from` and `In_subset`:

| Header | ROBOT directive | Notes |
|---|---|---|
| has_muscle_origin | SC RO:0002372 some % | Bone/structure the muscle arises from |
| has_muscle_insertion | SC RO:0002373 some % | Bone/structure the muscle inserts onto |
| innervated_by | SC RO:0002005 some % | Motor nerve |

All three are OPTIONAL — empty cell ⇒ no axiom. Populate only when Wikipedia +
UBERON precedent provide a resolvable UBERON ID for the related entity.
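
As a rough illustration of the overlay, the sketch below splices the three muscle columns into a header row and its ROBOT directive row immediately before `In_subset`. It is a hypothetical helper (`add_muscle_columns` is not a name from the workflow), and the default header shown is only the tail of the leaf template listed above, not the full column set.

```python
# Partial default header (tail of the leaf template) with its ROBOT directives.
DEFAULT_HEADERS = ["is_a", "part_of", "develops_from", "In_subset"]
DEFAULT_DIRECTIVES = [
    "SC %",
    "SC BFO:0000050 some %",
    "SC RO:0002202 some %",
    "AI oboInOwl:inSubset",
]

# Muscle overlay columns and directives, as listed in the table above.
MUSCLE_OVERLAY = [
    ("has_muscle_origin", "SC RO:0002372 some %"),
    ("has_muscle_insertion", "SC RO:0002373 some %"),
    ("innervated_by", "SC RO:0002005 some %"),
]


def add_muscle_columns(headers, directives):
    """Insert the overlay columns between develops_from and In_subset."""
    i = headers.index("In_subset")
    new_headers = headers[:i] + [h for h, _ in MUSCLE_OVERLAY] + headers[i:]
    new_directives = directives[:i] + [d for _, d in MUSCLE_OVERLAY] + directives[i:]
    return new_headers, new_directives


headers, directives = add_muscle_columns(DEFAULT_HEADERS, DEFAULT_DIRECTIVES)
print("\t".join(headers))
print("\t".join(directives))
```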

### Template variants and partitioning

Stage 1 partitions input rows by source `tables` column → system overlay map:

| Source table value | Overlay | Output template |
|---|---|---|
| `muscular-system` | `muscle` | `<name>-muscle.template.tsv` |
| (anything else) | `default` | `<name>.template.tsv` |

A single Stage 1 run can produce multiple leaf templates if the input has rows from
mixed tables (each system gets its own clean template — no muscle-specific empty
columns appear in non-muscle templates). The routing decision is printed at the start
of Stage 1 as `Step 0 routing: muscle=N, default=M, group=K`.
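
The partitioning rule can be sketched in a few lines. This is not the Stage 1 implementation: `route` and `routing_summary` are hypothetical names, the assumption that group terms are routed on `term_type` before the table lookup is inferred from the `group=K` count in the summary line, and the `ovary` table value is invented for the example.

```python
from collections import Counter

# Source `tables` value -> overlay; anything not listed falls back to "default".
OVERLAY_BY_TABLE = {"muscular-system": "muscle"}


def route(row):
    """Pick the output bucket for one input row."""
    if row.get("term_type") == "group":  # assumption: groups are split off first
        return "group"
    return OVERLAY_BY_TABLE.get(row.get("tables"), "default")


def routing_summary(rows):
    """Format the routing counts in the style printed at the start of Stage 1."""
    counts = Counter(route(r) for r in rows)
    return "Step 0 routing: muscle={}, default={}, group={}".format(
        counts["muscle"], counts["default"], counts["group"]
    )


rows = [
    {"label": "a", "tables": "muscular-system", "term_type": "leaf"},
    {"label": "b", "tables": "muscular-system", "term_type": "leaf"},
    {"label": "c", "tables": "ovary", "term_type": "leaf"},
    {"label": "d", "tables": "muscular-system", "term_type": "group"},
]

print(routing_summary(rows))
```

Keeping the table-to-overlay map in one dictionary means a new system overlay would need only one extra entry, with unlisted tables still landing in the default template.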

### Groups template (`<name>-groups.template.tsv`) — equivalent class

Same as leaf, with `is_a` / `part_of` replaced by `genus` / `location`: