Anatomical grounding pre-training for medical phrase grounding

Wenjun Zhang, Shakes Chandra, Aaron Nicolson

TL;DR

This work targets Medical Phrase Grounding (MPG), where scarce annotated data and domain shift impede progress. It introduces anatomical grounding as an in-domain pre-training task: the model first learns to align anatomical terms with image regions using Chest ImaGenome and is then fine-tuned on MS-CXR, where it reaches state-of-the-art MPG performance (mIoU = 61.2). The approach, which includes LoRA to stabilize full-model training and GPT-4-based synonym augmentation, yields substantial gains in both zero-shot and fine-tuning settings and outperforms several existing MPG baselines. Overall, anatomy-informed pre-training emerges as a practical, scalable way to improve MPG in data-scarce medical imaging settings.

Abstract

Medical Phrase Grounding (MPG) maps radiological findings described in medical reports to specific regions in medical images. The primary obstacle hindering progress in MPG is the scarcity of annotated data available for training and validation. We propose anatomical grounding as an in-domain pre-training task that aligns anatomical terms with corresponding regions in medical images, leveraging large-scale datasets such as Chest ImaGenome. Our empirical evaluation on MS-CXR demonstrates that anatomical grounding pre-training significantly improves performance in both zero-shot and fine-tuning settings, outperforming state-of-the-art MPG models. Our fine-tuned model achieves state-of-the-art performance on MS-CXR with an mIoU of 61.2, demonstrating the effectiveness of anatomical grounding pre-training for MPG.
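
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean Intersection over Union (mIoU) over bounding boxes is commonly computed. It is not the authors' evaluation code: the `(x1, y1, x2, y2)` box format, the function names `box_iou` and `mean_iou`, and the assumption of one predicted box per ground-truth box are illustrative choices, and the paper's exact protocol may differ.

```python
from typing import Sequence, Tuple

# Assumed box format (not taken from the paper): (x1, y1, x2, y2),
# with (x1, y1) the top-left corner and (x2, y2) the bottom-right corner.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes are treated as having zero area."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def box_iou(pred: Box, truth: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(pred[0], truth[0]), max(pred[1], truth[1])
    ix2, iy2 = min(pred[2], truth[2]), min(pred[3], truth[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(pred) + box_area(truth) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], truths: Sequence[Box]) -> float:
    """Average IoU over paired predicted and ground-truth boxes."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    return sum(box_iou(p, t) for p, t in zip(preds, truths)) / len(preds)
```

A figure such as 61.2 presumably corresponds to this average expressed as a percentage, for example `100 * mean_iou(preds, truths)`.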

Paper Structure

This paper contains 14 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Anatomical grounding as an in-domain pre-training task for Medical Phrase Grounding (MPG).
  • Figure 2: MPG with and without anatomical grounding pre-training (AGPT). The top example contains the anatomical region within the text, whereas the bottom example does not. Blue and red boxes indicate the ground-truth and predicted bounding boxes, respectively.