aehrc/MedGrounder

Generalised Medical Phrase Grounding (GMPG)

This repository contains the code and resources for the paper "Generalised Medical Phrase Grounding". It introduces the Generalised Medical Phrase Grounding (GMPG) task, which maps textual descriptions of radiological findings to zero, one, or multiple image regions, addressing the limitations of traditional single-box grounding. The repository provides the MedGrounder model, which achieves state-of-the-art performance on this task.

📂 Repository Structure

  • conf/: Configuration files for datasets, model parameters, and evaluation settings.

    • base.yaml: Base configuration containing common model and training parameters.
    • datasets.yaml: Dataset paths and settings (user-specific).
    • conf_pretrain_imagenome.yaml: Configuration for evaluating the model pretrained on Chest ImaGenome (Stage 1). Use this to reproduce zero-shot results.
    • conf_finetune_ms.yaml: Configuration for the model fine-tuned on MS-CXR. This model achieves the best performance on MS-CXR (Stage 2).
    • conf_finetune_mspc.yaml: Configuration for the model fine-tuned on both MS-CXR and PadChest-GR. This is the best performing model on PadChest-GR and serves as a robust generalist (Stage 2).
    • conf_finetune_pc.yaml: Configuration for the model fine-tuned on PadChest-GR only.
  • dataloaders/: PyTorch dataset implementations for MS-CXR, PadChest, and MIMIC-CXR.

  • model/: MedGrounder model architecture and backbone definitions.

  • model_weight/: Pretrained and fine-tuned model weights.

    • medgrounder_pretrain_imagenome.pth: Weights after Stage 1 pretraining on Chest ImaGenome (Weak Supervision).
    • medgrounder_finetune_ms.pth: Weights after Stage 2 fine-tuning on MS-CXR.
    • medgrounder_finetune_mspc.pth: Weights after Stage 2 fine-tuning on combined MS-CXR and PadChest-GR.
    • medgrounder_finetune_pc.pth: Weights after Stage 2 fine-tuning on PadChest-GR.

    Note: Model weights can be downloaded from here. Please place them in the model_weight/ directory.

  • utils/: Utility functions for bounding box operations, distributed training, and metrics.

  • test_examples/: Sample images and metadata (test_data.json) for inference testing.

  • evaluation.py: Script for evaluating the model on test datasets.

  • inference.ipynb: Jupyter notebook for interactive inference and visualization.

  • gmpg_metrics.py: Custom metrics for phrase grounding evaluation (IoU, F1, etc.).

🚀 Installation

  1. Clone the repository:

    git clone https://github.com/Claire1217/GMPG.git
    cd GMPG
  2. Set up the environment: Ensure you have a Python environment (e.g., Conda) with the necessary dependencies.

    # Example using conda
    conda create -n gmpg_env python=3.11
    conda activate gmpg_env
    pip install -r requirements.txt
  3. Set PYTHONPATH: Add the src directory to your PYTHONPATH to ensure imports work correctly.

    export PYTHONPATH=$PYTHONPATH:/path/to/gmpg/src

📊 Data Preparation

Configure your dataset paths in conf/datasets.yaml. You need to specify the paths to the annotation CSV files and the image root directories for:

  • MS-CXR (gmpg_mscxr)
  • PadChest (gmpg_padchestgr)

Example conf/datasets.yaml:

gmpg_mscxr_args:
  mscxr_annotation_file: /path/to/MS_CXR_Local_Alignment_v1.1.0.csv
  mscxr_image_root: /path/to/mimic-cxr-jpg/2.0.0

gmpg_padchestgr_args:
  padchest_annotation_file: /path/to/flattened_findings.csv
  padchest_meta_file: /path/to/master_table.csv
  padchest_image_root: /path/to/PadChest_GR

Preparing PadChest-GR Data

To generate flattened_findings.csv for PadChest-GR from the raw JSON and the master table, use the provided script:

python prepare_padchest_csv.py \
  --json_path /path/to/grounded_reports_20240819.json \
  --meta_path /path/to/master_table.csv \
  --output_path /path/to/flattened_findings.csv
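The authoritative schema is whatever prepare_padchest_csv.py implements; as a rough illustration of the flattening step only, a minimal sketch might look like the following. The field names (study_id, sentence, boxes) are assumptions for illustration, not the real PadChest-GR schema.

```python
import csv
import json

def flatten_findings(json_path, output_path):
    """Flatten grounded-report JSON into one CSV row per finding.

    NOTE: the keys used here ("study_id", "findings", "sentence",
    "boxes") are illustrative assumptions, not the actual schema.
    """
    with open(json_path) as f:
        reports = json.load(f)

    rows = []
    for report in reports:
        for finding in report.get("findings", []):
            rows.append({
                "study_id": report.get("study_id"),
                "sentence": finding.get("sentence"),
                # serialise the (possibly empty) box list into one cell
                "boxes": json.dumps(finding.get("boxes", [])),
            })

    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["study_id", "sentence", "boxes"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Phrases grounded to zero regions simply produce an empty boxes list, which is what distinguishes GMPG-style data from single-box annotation formats.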

🧪 Inference

Interactive Notebook

Use inference.ipynb to run inference on selected samples and visualize the results.

  1. Open inference.ipynb in Jupyter.
  2. Ensure test_examples/test_data.json contains the samples you want to test.
  3. Run the cells to load the model, process images, and display predicted bounding boxes (red) alongside ground truth (green).
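The notebook's model-loading API is not reproduced here, but the drawing convention (predictions in red, ground truth in green) can be sketched with matplotlib. The box coordinates and the [x, y, w, h] format below are illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def draw_boxes(ax, boxes, color):
    """Draw [x, y, w, h] boxes as unfilled rectangles on an image axis."""
    patches = []
    for x, y, w, h in boxes:
        p = Rectangle((x, y), w, h, edgecolor=color, facecolor="none", linewidth=2)
        ax.add_patch(p)
        patches.append(p)
    return patches

fig, ax = plt.subplots()
# hypothetical boxes for illustration only
pred = draw_boxes(ax, [[40, 60, 120, 90]], "red")    # model predictions
gt = draw_boxes(ax, [[50, 55, 110, 100]], "green")   # ground truth
```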

📈 Evaluation

To evaluate the model on the test datasets and compute metrics (IoU, F1, Center-Hit Precision/Recall), use evaluation.py.

python evaluation.py --config conf/conf_finetune_mspc.yaml

Key Arguments:

  • --config: Path to the run configuration file (default: conf/conf_finetune_mspc.yaml).
  • --resume: Path to the model checkpoint (optional, overrides config).
  • --dataset: Specific dataset to evaluate on (e.g., gmpg_mscxr).
  • --device: Device to use (cuda or cpu).

Metrics: The evaluation script reports:

  • Negative accuracy (n_acc): Percentage of non-groundable phrases correctly predicted with no box
  • Precision@F1=1: Exact match metric requiring all boxes to be correctly predicted
  • Center-Hit F1: Relaxed metric requiring predicted box centers to fall inside ground-truth regions
  • Mask IoU: Intersection over union of pixel masks
  • Mask IoU Accuracy: Percentage of predictions with IoU ≥ 0.5
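The repository's gmpg_metrics.py is authoritative; purely as an unofficial sketch of what two of these metrics compute, assuming [x1, y1, x2, y2] boxes and with an assumed (simplified) center-hit recall definition:

```python
import numpy as np

def boxes_to_mask(boxes, shape):
    """Rasterise [x1, y1, x2, y2] boxes into a binary pixel mask."""
    mask = np.zeros(shape, dtype=bool)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = True
    return mask

def mask_iou(pred_boxes, gt_boxes, shape):
    """Pixel-mask IoU; both-empty counts as perfect agreement."""
    pred = boxes_to_mask(pred_boxes, shape)
    gt = boxes_to_mask(gt_boxes, shape)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def center_hit_f1(pred_boxes, gt_boxes, shape):
    """F1 where a predicted box 'hits' if its centre lies in the GT mask.

    The recall side below (capping hits at the number of GT boxes) is a
    simplifying assumption, not the paper's exact definition.
    """
    if not pred_boxes or not gt_boxes:
        return 0.0
    gt_mask = boxes_to_mask(gt_boxes, shape)
    hits = sum(gt_mask[(y1 + y2) // 2, (x1 + x2) // 2]
               for x1, y1, x2, y2 in pred_boxes)
    prec = hits / len(pred_boxes)
    rec = min(hits, len(gt_boxes)) / len(gt_boxes)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)
```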

Evaluation Results: Results are automatically saved to evaluation_results_{dataset}_{timestamp}.json for each evaluated dataset. Each file contains:

  • All performance metrics
  • Dataset name
  • Config file path
  • Checkpoint path (model weights used)
  • Timestamp

Note on N-Acc: MS-CXR test set does not contain non-groundable phrases, so N-Acc will be 0% (as expected). This is not an error.

🛠️ Post-Processing

The pipeline supports Weighted Box Fusion (WBF) for combining overlapping bounding boxes. This is configured in conf/base.yaml:

test_post_processing: True
test_post_processing_params:
  run_wbf: True
  wbf_iou_threshold: 0.1
  skip_box_thr: 0.0
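WBF replaces clusters of overlapping boxes with a single confidence-weighted average box. A minimal pure-NumPy sketch of that fusion rule, reusing the config's parameter names (the pipeline's actual implementation may differ in detail):

```python
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def weighted_box_fusion(boxes, scores, iou_thr=0.1, skip_box_thr=0.0):
    """Fuse overlapping boxes into score-weighted averages.

    Boxes scoring below skip_box_thr are dropped; a box joins the first
    cluster it overlaps with IoU > iou_thr, otherwise starts a new one.
    """
    keep = [(b, s) for b, s in zip(boxes, scores) if s >= skip_box_thr]
    clusters = []  # each cluster: list of (box, score), highest-scored first
    for box, score in sorted(keep, key=lambda bs: -bs[1]):
        for cluster in clusters:
            if iou(cluster[0][0], box) > iou_thr:
                cluster.append((box, score))
                break
        else:
            clusters.append([(box, score)])

    fused = []
    for cluster in clusters:
        w = np.array([s for _, s in cluster])
        b = np.array([bx for bx, _ in cluster])
        # score-weighted average of the cluster's coordinates
        fused.append(((b * w[:, None]).sum(0) / w.sum(), w.mean()))
    return fused
```

With the low wbf_iou_threshold of 0.1 above, even loosely overlapping predictions for the same finding are merged into one box, which suits a task where a phrase may legitimately map to several distinct regions.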

🤝 Contributing

If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
