This repository contains the code and resources for the paper "Generalised Medical Phrase Grounding". It introduces the Generalised Medical Phrase Grounding (GMPG) task, which maps textual descriptions of radiological findings to zero, one, or multiple image regions, addressing the limitations of traditional single-box grounding. The repository provides the MedGrounder model, which achieves state-of-the-art performance on this task.
- `conf/`: Configuration files for datasets, model parameters, and evaluation settings.
  - `base.yaml`: Base configuration containing common model and training parameters.
  - `datasets.yaml`: Dataset paths and settings (user-specific).
  - `conf_pretrain_imagenome.yaml`: Configuration for evaluating the model pretrained on Chest ImaGenome (Stage 1). Use this to reproduce zero-shot results.
  - `conf_finetune_ms.yaml`: Configuration for the model fine-tuned on MS-CXR; the best-performing model on MS-CXR (Stage 2).
  - `conf_finetune_mspc.yaml`: Configuration for the model fine-tuned on both MS-CXR and PadChest-GR; the best-performing model on PadChest-GR and a robust generalist (Stage 2).
  - `conf_finetune_pc.yaml`: Configuration for the model fine-tuned on PadChest-GR only.
- `dataloaders/`: PyTorch dataset implementations for MS-CXR, PadChest, and MIMIC-CXR.
- `model/`: MedGrounder model architecture and backbone definitions.
- `model_weight/`: Pretrained and fine-tuned model weights.
  - `medgrounder_pretrain_imagenome.pth`: Weights after Stage 1 pretraining on Chest ImaGenome (weak supervision).
  - `medgrounder_finetune_ms.pth`: Weights after Stage 2 fine-tuning on MS-CXR.
  - `medgrounder_finetune_mspc.pth`: Weights after Stage 2 fine-tuning on combined MS-CXR and PadChest-GR.
  - `medgrounder_finetune_pc.pth`: Weights after Stage 2 fine-tuning on PadChest-GR.

  Note: Model weights can be downloaded from here. Please place them in the `model_weight/` directory.
- `utils/`: Utility functions for bounding box operations, distributed training, and metrics.
- `test_examples/`: Sample images and metadata (`test_data.json`) for inference testing.
- `evaluation.py`: Script for evaluating the model on test datasets.
- `inference.ipynb`: Jupyter notebook for interactive inference and visualization.
- `gmpg_metrics.py`: Custom metrics for phrase grounding evaluation (IoU, F1, etc.).
- Clone the repository:

  ```bash
  git clone https://github.com/Claire1217/GMPG.git
  cd gmpg
  ```

- Set up the environment: Ensure you have a Python environment (e.g., Conda) with the necessary dependencies.

  ```bash
  # Example using conda
  conda create -n gmpg_env python=3.11
  conda activate gmpg_env
  pip install -r requirements.txt
  ```

- Set `PYTHONPATH`: Add the `src` directory to your `PYTHONPATH` so that imports resolve correctly.

  ```bash
  export PYTHONPATH=$PYTHONPATH:/path/to/gmpg/src
  ```
Configure your dataset paths in `conf/datasets.yaml`. You need to specify the paths to the annotation CSV files and the image root directories for:

- MS-CXR (`gmpg_mscxr`)
- PadChest-GR (`gmpg_padchestgr`)

Example `conf/datasets.yaml`:

```yaml
gmpg_mscxr_args:
  mscxr_annotation_file: /path/to/MS_CXR_Local_Alignment_v1.1.0.csv
  mscxr_image_root: /path/to/mimic-cxr-jpg/2.0.0
gmpg_padchestgr_args:
  padchest_annotation_file: /path/to/flattened_findings.csv
  padchest_meta_file: /path/to/master_table.csv
  padchest_image_root: /path/to/PadChest_GR
```

To generate `flattened_findings.csv` for PadChest-GR from the raw JSON and master table, use the provided script:
```bash
python prepare_padchest_csv.py \
    --json_path /path/to/grounded_reports_20240819.json \
    --meta_path /path/to/master_table.csv \
    --output_path /path/to/flattened_findings.csv
```

Use `inference.ipynb` to run inference on selected samples and visualize the results.
- Open `inference.ipynb` in Jupyter.
- Ensure `test_examples/test_data.json` contains the samples you want to test.
- Run the cells to load the model, process images, and display predicted bounding boxes (red) alongside ground truth (green).
To evaluate the model on the test datasets and compute metrics (IoU, F1, Center-Hit Precision/Recall), use `evaluation.py`:

```bash
python evaluation.py --config conf/conf_finetune_mspc.yaml
```

Key Arguments:

- `--config`: Path to the run configuration file (default: `conf/conf_finetune_mspc.yaml`).
- `--resume`: Path to the model checkpoint (optional; overrides the config).
- `--dataset`: Specific dataset to evaluate on (e.g., `gmpg_mscxr`).
- `--device`: Device to use (`cuda` or `cpu`).
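To sweep several datasets in one run, the command line above is easy to script. The snippet below is a convenience sketch, not part of the repository; it only assembles the arguments documented above:

```python
import subprocess


def build_eval_cmd(dataset, config="conf/conf_finetune_mspc.yaml", device="cuda"):
    """Assemble an evaluation.py invocation for one dataset."""
    return ["python", "evaluation.py",
            "--config", config,
            "--dataset", dataset,
            "--device", device]


# Evaluate the generalist checkpoint on both benchmarks:
# for ds in ["gmpg_mscxr", "gmpg_padchestgr"]:
#     subprocess.run(build_eval_cmd(ds), check=True)
```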
Metrics: The evaluation script reports:
- Negative accuracy (n_acc): Percentage of non-groundable phrases correctly predicted with no box
- Precision@F1=1: Exact match metric requiring all boxes to be correctly predicted
- Center-Hit F1: Relaxed metric requiring predicted box centers to fall inside ground-truth regions
- Mask IoU: Intersection over union of pixel masks
- Mask IoU Accuracy: Percentage of predictions with IoU ≥ 0.5
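To make two of these definitions concrete, here is a pure-Python sketch; the authoritative implementations live in `gmpg_metrics.py` and may differ in edge cases. Boxes are assumed to be `(x1, y1, x2, y2)` in pixel coordinates:

```python
def center_hit_f1(pred_boxes, gt_boxes):
    """Center-Hit F1: a prediction scores a hit if its center lies inside
    any ground-truth box; a GT box is recalled if it contains at least
    one predicted center."""
    if not pred_boxes and not gt_boxes:
        return 1.0
    if not pred_boxes or not gt_boxes:
        return 0.0

    def inside(cx, cy, box):
        x1, y1, x2, y2 = box
        return x1 <= cx <= x2 and y1 <= cy <= y2

    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in pred_boxes]
    hits = sum(any(inside(cx, cy, g) for g in gt_boxes) for cx, cy in centers)
    covered = sum(any(inside(cx, cy, g) for cx, cy in centers) for g in gt_boxes)
    precision, recall = hits / len(pred_boxes), covered / len(gt_boxes)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mask_iou(pred_boxes, gt_boxes, h, w):
    """Pixel-mask IoU: the union of predicted boxes vs. the union of
    ground-truth boxes, rasterised on an h x w grid."""
    def rasterise(boxes):
        mask = [[False] * w for _ in range(h)]
        for x1, y1, x2, y2 in boxes:
            for y in range(max(0, int(y1)), min(h, int(y2))):
                for x in range(max(0, int(x1)), min(w, int(x2))):
                    mask[y][x] = True
        return mask

    p, g = rasterise(pred_boxes), rasterise(gt_boxes)
    inter = sum(p[y][x] and g[y][x] for y in range(h) for x in range(w))
    union = sum(p[y][x] or g[y][x] for y in range(h) for x in range(w))
    return inter / union if union else 1.0
```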
Evaluation Results:
Results are automatically saved to `evaluation_results_{dataset}_{timestamp}.json` for each evaluated dataset. Each file contains:
- All performance metrics
- Dataset name
- Config file path
- Checkpoint path (model weights used)
- Timestamp
Note on N-Acc: The MS-CXR test set contains no non-groundable phrases, so N-Acc will be 0% on it, as expected. This is not an error.
The pipeline supports Weighted Box Fusion (WBF) for combining overlapping bounding boxes. This is configured in `conf/base.yaml`:

```yaml
test_post_processing: True
test_post_processing_params:
  run_wbf: True
  wbf_iou_threshold: 0.1
  skip_box_thr: 0.0
```

If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
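For intuition, the fusion step enabled by `run_wbf` works roughly as follows: boxes are greedily clustered by IoU against each cluster's current fused box, and each cluster is replaced by a score-weighted average. The snippet below is a simplified pure-Python illustration of that idea, not the pipeline's actual implementation:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def fuse(cluster):
    """Score-weighted average box of a cluster of (box, score) pairs."""
    total = sum(s for _, s in cluster)
    box = tuple(sum(b[k] * s for b, s in cluster) / total for k in range(4))
    return box, total / len(cluster)


def weighted_box_fusion(boxes, scores, iou_thr=0.1, skip_box_thr=0.0):
    """Greedy WBF: highest-scoring boxes seed clusters; each remaining box
    joins the first cluster whose fused box overlaps it by >= iou_thr."""
    clusters = []
    for i in sorted(range(len(boxes)), key=lambda i: -scores[i]):
        if scores[i] < skip_box_thr:
            continue  # mirrors the skip_box_thr option in the config
        for c in clusters:
            if iou(fuse(c)[0], boxes[i]) >= iou_thr:
                c.append((boxes[i], scores[i]))
                break
        else:
            clusters.append([(boxes[i], scores[i])])
    return [fuse(c) for c in clusters]
```

With `wbf_iou_threshold: 0.1`, even loosely overlapping predictions are merged into a single region, which suits the GMPG setting where one phrase may otherwise yield several near-duplicate boxes.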