Add functional annotation extension for PDBe-KB integration #21
…tion

This commit extends the BioStride schema to support functional and structural annotations from PDBe-KB and other knowledge bases, enabling integration of experimental structural data with computational predictions and literature-derived annotations.

## New Features

### Functional Annotation Classes

- **ProteinAnnotation**: Base class for all protein-related annotations
- **FunctionalSite**: Catalytic sites, binding sites, regulatory regions
- **StructuralFeature**: Secondary structure, domains, disorder regions
- **LigandInteraction**: Small molecule binding and druggability data
- **ProteinProteinInteraction**: Macromolecular complex interfaces
- **MutationEffect**: Disease-associated variants and stability effects
- **PostTranslationalModification**: PTM annotations and regulatory effects
- **BiophysicalProperty**: Experimental and predicted properties
- **ConformationalEnsemble**: Dynamic states and transitions
- **EvolutionaryConservation**: Conservation scores and coevolution
- **AggregatedProteinView**: Complete protein knowledge profiles

### Integration Points

- Extended Sample class with functional annotation fields
- Added aggregated protein views to Study class
- Comprehensive enumerations for controlled vocabularies
- Database cross-references to UniProt, PDB, Pfam, COSMIC, etc.

### Example Data

- Sample with inline functional annotations (ATP synthase)
- Aggregated protein view (p53 tumor suppressor)
- Study combining structural and functional data

### Schema Validation

- All new classes follow LinkML best practices
- Comprehensive validation with required fields and constraints
- Compatible with existing BioStride data models

This extension maintains BioStride's focus on experimental data while adding rich contextual annotations that place structures in their biological context, supporting integrative structural biology workflows.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
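As a rough illustration of how the annotation classes listed in this commit relate to each other, the sketch below models `ProteinAnnotation` as a base class with `FunctionalSite` as one subclass. It is written in Python for readability only; the actual schema is defined in LinkML YAML. All field names (`id`, `protein_id`, `site_type`, `residues`) and the example values are assumptions made for illustration, not taken from the schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProteinAnnotation:
    """Base class for protein-related annotations (field names are hypothetical)."""
    id: str
    protein_id: str


@dataclass
class FunctionalSite(ProteinAnnotation):
    """A catalytic, binding, or regulatory site (field names are hypothetical)."""
    site_type: str = "binding_site"
    residues: List[int] = field(default_factory=list)


# Illustrative values only; identifiers and residue numbers are placeholders.
site = FunctionalSite(
    id="biostride:site-001",
    protein_id="uniprot:P04637",
    site_type="zinc_binding",
    residues=[176, 179, 238, 242],
)

# A FunctionalSite can be handled anywhere a ProteinAnnotation is expected.
annotations: List[ProteinAnnotation] = [site]
```

The other classes in the list (for example `LigandInteraction` or `MutationEffect`) would presumably extend the same base in a similar way, each adding its own fields.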
|
Claude finished @cmungall's task

PR Review: Functional Annotation Extension

I've completed a comprehensive review of this functional annotation extension for PDBe-KB integration. This is an impressive addition that significantly extends BioStride's capabilities.

✅ Strengths

Schema Design Excellence
Integration Design
Data Model Quality
|
- Fix CURIE prefixes to use 'biostride:' namespace
- Correct sequence format in MolecularComposition
- Add inlined directives for complex objects in Sample schema

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
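The CURIE prefix fix in this commit can be checked mechanically. The following is a minimal sketch, assuming identifiers are plain strings of the form `prefix:local_id`. The function name `has_prefix` and the sample identifiers are hypothetical, and the regular expression is a simplified approximation of CURIE syntax rather than a full implementation of the specification.

```python
import re

# Simplified CURIE shape: a prefix, a colon, then a non-empty local part.
CURIE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z_][\w.-]*):(?P<local>\S+)$")


def has_prefix(identifier: str, expected_prefix: str = "biostride") -> bool:
    """Return True if the identifier is CURIE-shaped and uses the expected prefix."""
    match = CURIE_PATTERN.match(identifier)
    return match is not None and match.group("prefix") == expected_prefix


# Hypothetical identifiers for illustration.
ids = ["biostride:sample-001", "example:sample-002", "sample-003"]
non_conforming = [i for i in ids if not has_prefix(i)]
```

Running a check like this over the example YAML files would list identifiers that still use another namespace or have no prefix at all.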
|
Claude finished @cmungall's task

PR Review: Functional Annotation Extension

I've completed a comprehensive review of this functional annotation extension for PDBe-KB integration. This is an impressive addition that significantly extends BioStride's capabilities while introducing several critical issues that need addressing.

✅ Strengths

Schema Design Excellence
Integration Design
Data Model Quality
|
- Add missing enum values (zinc_binding, disordered, intact, cosmic, clinvar)
- Fix all CURIE prefixes to use 'biostride:' namespace
- Add missing IDs to all annotation objects
- Fix invalid enum values in examples
- Update inlined directives for complex objects

Main examples now validate successfully:

- Sample-with-functional-annotations.yaml ✅
- AggregatedProteinView-example.yaml ✅
- Dataset-loosenin-bioenergy.yaml ✅

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
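The enum fixes in this commit address a common failure mode: an example uses a value that the enum does not list, so validation rejects it. The sketch below shows the idea in plain Python, without the LinkML tooling. The helper `find_invalid_values`, the slot name `site_type`, and the grouping of permitted values are assumptions for illustration; the commit does not state which enum each added value belongs to.

```python
from typing import Dict, Iterable, List, Set


def find_invalid_values(
    records: Iterable[Dict[str, str]], slot: str, permitted: Set[str]
) -> List[str]:
    """Collect values of `slot` that are not among the permitted enum values."""
    return [r[slot] for r in records if slot in r and r[slot] not in permitted]


# Hypothetical permitted values; only zinc_binding is named in the commit message.
permitted_site_types = {"catalytic", "zinc_binding"}

# Hypothetical example records; the second uses a hyphen instead of an underscore.
examples = [
    {"id": "biostride:site-001", "site_type": "zinc_binding"},
    {"id": "biostride:site-002", "site_type": "zinc-binding"},
]

invalid = find_invalid_values(examples, "site_type", permitted_site_types)
```

Such a mismatch can be resolved either by adding the value to the enum or by correcting the example, which matches the two kinds of fix listed in this commit.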
…ions

This commit extends the schema with detailed metadata for protein expression, purification, and X-ray crystallography workflows:

- Add ProteinConstruct class for cloning and construct design details
- Extend SamplePreparation with expression parameters (host, media, induction)
- Add purification workflow details (affinity, IEX, SEC, yields, purity)
- Enhance XRayPreparation with crystallization optimization and crystal handling
- Add detailed X-ray data collection parameters (wavelength, detector, beam)
- Expand QualityMetrics with crystallographic metrics (unit cell, R-factors, validation scores)
- Extend WorkflowRun with phasing and refinement parameters
- Add new enums: ExpressionSystemEnum, PurificationStepEnum, PhasingMethodEnum, ExperimentalMethodEnum

These additions support comprehensive documentation of protein production pipelines and crystallographic structure determination workflows.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
|
Claude finished @cmungall's task

PR Review: Functional Annotation Extension for PDBe-KB Integration

I've completed a comprehensive review of this functional annotation extension for PDBe-KB integration. This is an impressive and substantial addition to BioStride's capabilities, but there are several critical issues that need to be addressed.

✅ Strengths

Schema Design Excellence
Integration Design
Data Model Quality
|
Summary
This PR extends the BioStride schema to support comprehensive functional and structural annotations from PDBe-KB and other knowledge bases. The extension enables integration of experimental structural data with computational predictions, evolutionary information, and literature-derived annotations.
Key Features
🧬 New Annotation Classes
🔗 Integration Points
- Extended `Sample` class with functional annotation fields
- Added `aggregated_protein_views` to `Study` class for knowledge aggregation

📊 Example Data
✅ Schema Validation
Use Cases Enabled
Technical Details
- `src/biostride/schema/functional_annotation.yaml` (15 classes, 20+ enums)

This extension maintains BioStride's experimental focus while adding rich biological context from the broader structural biology ecosystem.
Test plan
🤖 Generated with Claude Code