Problem
While BioStride mentions ontology integration, the schema mostly uses custom enums and free-text fields where community ontologies would provide greater semantic precision. It has no explicit mappings to OBI, UBERON, or other standards.
Current State
- Empty mappings: Schema classes show empty "Mappings" sections or only self-references
- Unmapped enumerations: TechniqueEnum and PreparationTypeEnum values have no ontology CURIEs
- Missing OBI alignment: ExperimentRun and SamplePreparation could map to OBI assay/process terms
- No technique mappings: SAXS and cryo-EM could reference EDAM or OBI technique terms
- Undefined quality metrics: resolution, radius of gyration (Rg), and completeness lack standardized definitions
Examples of missing mappings:
- ExperimentRun → could map to OBI "data acquisition"
- SamplePreparation → could map to OBI sample preparation processes
- Instrument → could map to OBI "device" or a more specific OBI instrument term (BFO has no "device" class; it only supplies upper-level parents such as "object")
- sample_type enum values → could map to OBI or Sample Ontology terms
Impact
- Poor interoperability: Cannot easily integrate with other ontology-annotated datasets
- Semantic isolation: The schema becomes a private mini-ontology instead of reusing community standards
- Query limitations: Semantic web tools cannot relate BioStride data to externally defined terms
- Definition gaps: Some classes lack descriptions that could come from ontologies
Suggested Solutions
1. Add ontology mappings to schema classes
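Use the LinkML mapping slots (mappings, or the more specific exact_mappings, close_mappings, and related_mappings):
- ExperimentRun → OBI:0001911 (if it matches data acquisition; verify before use)
- SamplePreparation → an appropriate OBI process term
- Instrument → OBI "device" or a more specific OBI instrument term (BFO has no "device" class)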
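The snippet below is a minimal sketch of solution 1. It mirrors the classes section of a LinkML schema as a Python dict and lists the classes that still lack an external mapping. The class specs, the "biostride" local prefix, and the unmapped_classes helper are hypothetical. The only concrete CURIE is the tentative OBI:0001911 from the list above. It is recorded as a related mapping, not an exact one, until its meaning is confirmed.

```python
# Hypothetical mirror of the `classes:` section of a LinkML schema.
# The slot names are LinkML mapping slots; the class specs and the
# "biostride" self-prefix are assumptions for illustration.
MAPPING_SLOTS = ("mappings", "exact_mappings", "close_mappings",
                 "related_mappings", "narrow_mappings", "broad_mappings")

CLASSES = {
    # Tentative term from the issue, kept as a related mapping until verified.
    "ExperimentRun": {"related_mappings": ["OBI:0001911"]},
    # A self-reference only, so it still counts as unmapped.
    "SamplePreparation": {"exact_mappings": ["biostride:SamplePreparation"]},
    # OBI "device" or a more specific instrument term still to be chosen.
    "Instrument": {},
}


def unmapped_classes(classes, local_prefix="biostride"):
    """Return classes whose mapping slots hold no CURIE outside the local prefix."""
    missing = []
    for name, spec in classes.items():
        external = [
            curie
            for slot in MAPPING_SLOTS
            for curie in spec.get(slot, [])
            if curie.split(":", 1)[0] != local_prefix
        ]
        if not external:
            missing.append(name)
    return missing


print(unmapped_classes(CLASSES))  # ['SamplePreparation', 'Instrument']
```

A check like this could run in CI so that new classes cannot be added with empty or self-referencing mappings. Choose between exact_mappings and close_mappings according to how closely the OBI definition matches the BioStride class.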
2. Embed ontology identifiers in enumerations
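Set the meaning field on each enum value:
- TechniqueEnum.cryo_em → EDAM:operation_364 "cryo EM" (candidate; EDAM IDs are zero-padded to four digits, so the exact term needs checking)
- TechniqueEnum.saxs → EDAM:operation_3450 (or similar)
- The generated docs currently show these meanings as "None"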
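A small audit over the permissible values is one way to catch missing meanings and malformed CURIEs before they reach the docs. This is a sketch under several assumptions. The enum is mirrored as a plain dict, and audit_enum is a hypothetical helper. The shape check assumes EDAM identifiers are written as a sub-ontology name plus a four-digit, zero-padded number (operation_NNNN). Under that rule, the cryo_em candidate as written (operation_364) does not have a valid EDAM ID shape.

```python
import re
from typing import Optional

# Hypothetical mirror of LinkML permissible_values with a `meaning` CURIE.
# Both candidates come from the issue text and are unverified.
TECHNIQUE_ENUM: dict = {
    "cryo_em": {"meaning": "EDAM:operation_364"},
    "saxs": {"meaning": "EDAM:operation_3450"},
}

# EDAM IDs: sub-ontology name, underscore, four-digit zero-padded number.
EDAM_ID = re.compile(r"EDAM:(topic|operation|data|format)_\d{4}")


def audit_enum(values: dict) -> tuple:
    """Return (missing, malformed) lists of permissible-value names."""
    missing, malformed = [], []
    for name, spec in values.items():
        meaning: Optional[str] = spec.get("meaning")
        if meaning is None:
            # This is what the generated docs render as "None".
            missing.append(name)
        elif meaning.startswith("EDAM:") and not EDAM_ID.fullmatch(meaning):
            malformed.append(name)
    return missing, malformed


print(audit_enum(TECHNIQUE_ENUM))  # ([], ['cryo_em'])
```

The check validates only the shape of the identifier. Whether the term exists and actually denotes the technique still has to be confirmed against an EDAM release.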
3. Expand OntologyTerm usage
Replace hardcoded enums with OntologyTerm references:
- sample_type → OntologyTerm referencing CLO (Cell Line Ontology), Sample Ontology, etc.
- experiment_type → OntologyTerm for experiment classification
- Follow the existing ImageFeature.terms pattern
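The OntologyTerm pattern can be sketched as a small value type plus a constructor that limits which ontologies a slot accepts. The field names, the CLO/OBI allow-list, and sample_type_term are assumptions for illustration. They are not BioStride's actual OntologyTerm definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyTerm:
    """Stand-in for BioStride's OntologyTerm; these field names are assumed."""
    id: str                          # a CURIE such as "CLO:..." or "OBI:..."
    label: Optional[str] = None
    ontology: Optional[str] = None   # prefix of the source ontology


# Assumed allow-list for sample_type, based on the ontologies named in the issue.
SAMPLE_TYPE_ONTOLOGIES = frozenset({"CLO", "OBI"})


def sample_type_term(curie: str, label: Optional[str] = None) -> OntologyTerm:
    """Hypothetical constructor: wrap a CURIE as a sample_type value,
    rejecting malformed CURIEs and terms from ontologies not on the list."""
    prefix, sep, local = curie.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in SAMPLE_TYPE_ONTOLOGIES:
        raise ValueError(f"{prefix} terms are not allowed for sample_type")
    return OntologyTerm(id=curie, label=label, ontology=prefix)
```

Under this allow-list, a UBERON CURIE passed as a sample_type raises ValueError. Keeping the allowed prefixes in one place makes it easy to widen the list if Sample Ontology terms are adopted.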
4. Link quality metrics to standards
Map QualityMetrics fields (resolution, completeness) to:
- mmCIF/PDBx metadata standards
- Small-angle scattering (SAS) community terms
- PDBe metadata definitions
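For the quality metrics, a crosswalk keyed by field and technique is one way to record targets, because "resolution" means different things in X-ray crystallography and cryo-EM. The data item names below are candidates from the PDBx/mmCIF dictionary and should be checked against its current release. The field keys (including rg) and the unmapped helper are hypothetical. No small-angle scattering target is proposed here.

```python
# Candidate crosswalk: BioStride QualityMetrics field -> technique -> PDBx/mmCIF item.
# Item names are to be verified; None marks a target still to be chosen.
QUALITY_METRIC_XWALK = {
    "resolution": {
        "xray": "_refine.ls_d_res_high",
        "cryo_em": "_em_3d_reconstruction.resolution",
    },
    "completeness": {"xray": "_reflns.percent_possible_obs"},
    "rg": {"saxs": None},  # needs a small-angle scattering community term
}


def unmapped(xwalk):
    """List (field, technique) pairs that still lack a target data item."""
    return [(field, tech)
            for field, by_tech in xwalk.items()
            for tech, item in by_tech.items()
            if item is None]


print(unmapped(QUALITY_METRIC_XWALK))  # [('rg', 'saxs')]
```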
Target Ontologies
- OBI (Ontology for Biomedical Investigations): 2,500+ terms for experiments, assays, devices
- EDAM: bioinformatics topics, operations, data types, and formats
- UBERON: Anatomy terms (already mentioned for biological context)
- BFO (Basic Formal Ontology): Upper-level classes such as object and process, on which OBI is built
- mmCIF/PDBx: Structural biology metadata standards
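CURIEs become interoperable only once each prefix expands to a full IRI, which is what semantic web tools compare. The sketch below shows that expansion in the style of a LinkML prefixes block. It assumes the standard OBO PURL pattern and the EDAM namespace. The expand helper is hypothetical. mmCIF is left out because its data items are not CURIEs.

```python
# Prefix map in the style of a LinkML `prefixes:` block.
PREFIXES = {
    "OBI": "http://purl.obolibrary.org/obo/OBI_",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "BFO": "http://purl.obolibrary.org/obo/BFO_",
    "CLO": "http://purl.obolibrary.org/obo/CLO_",
    "EDAM": "http://edamontology.org/",
}


def expand(curie: str) -> str:
    """Expand a CURIE to a full IRI; raise KeyError for undeclared prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("OBI:0001911"))  # http://purl.obolibrary.org/obo/OBI_0001911
```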
Benefits
- Improved interoperability with existing datasets
- Semantic web compatibility for advanced queries
- Reuse of vetted definitions instead of reinventing terms
- Future-proofing through standards alignment
Priority
High - Critical for community adoption and semantic interoperability.