Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation

Yunhe Gao; Zhuowei Li; Di Liu; Mu Zhou; Shaoting Zhang; Dimitris N. Metaxas

TL;DR

The paper addresses the bottleneck of task-specific medical image segmentation by proposing a universal paradigm and Hermes, a context-prior learning framework that injects learned task and modality priors into a segmentation backbone. Hermes jointly trains across diverse datasets to build robust, transferable representations, using an oracle-guided fusion of task and modality priors and hierarchical multi-scale modeling. Across 11 upstream datasets and two downstream tasks, Hermes delivers state-of-the-art performance, demonstrates strong transfer and incremental learning capabilities, and reveals priors that reflect anatomical and imaging principles. This approach offers a scalable, data-efficient path toward foundational models in medical imaging with practical implications for multi-task clinical segmentation and generalization across modalities and body regions.
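
The oracle-guided selection of task and modality priors can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `TASK_PRIORS`, `MODALITY_PRIORS`, `select_priors`, and `predict_masks` are hypothetical, as are the toy prior values. The step that adds the modality prior to each task prior and then scores voxels with a dot product is also hypothetical; it stands in for Hermes's learned priors and prior fusion module. The sketch does show the core idea. Each dataset's metadata (its annotated structures and its imaging modality) acts as the oracle that picks entries from one shared prior pool. As a result, each image yields predictions only for the structures its dataset labels.

```python
import math

# Hypothetical prior pool. In Hermes the task and modality priors are learned
# jointly with the backbone; these 3-d toy vectors are illustrative only.
TASK_PRIORS = {
    "liver": [1.0, 0.0, 0.0],
    "spleen": [0.0, 1.0, 0.0],
    "heart": [0.0, 0.0, 1.0],
}
MODALITY_PRIORS = {
    "CT": [0.1, 0.1, 0.1],
    "T1": [-0.1, 0.0, 0.1],
}


def select_priors(dataset_tasks, modality):
    """Oracle-guided selection: dataset metadata picks entries from the pool.

    Raises KeyError if a requested structure or modality is not in the pool.
    """
    task_tokens = [TASK_PRIORS[name] for name in dataset_tasks]
    modality_token = MODALITY_PRIORS[modality]
    return task_tokens, modality_token


def predict_masks(voxel_features, dataset_tasks, modality, threshold=0.5):
    """Return one boolean mask (a list over voxels) per selected structure.

    Assumption: the modality prior is simply added to each task prior and the
    result is scored against every voxel feature with a dot product + sigmoid.
    Hermes instead fuses priors with image features in a dedicated module.
    """
    task_tokens, modality_token = select_priors(dataset_tasks, modality)
    masks = {}
    for name, task_token in zip(dataset_tasks, task_tokens):
        # Condition the task query on the imaging modality.
        query = [t + m for t, m in zip(task_token, modality_token)]
        mask = []
        for feature in voxel_features:
            logit = sum(q * f for q, f in zip(query, feature))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask.append(prob > threshold)
        masks[name] = mask
    return masks


if __name__ == "__main__":
    # Two toy "voxels": the first resembles liver, the second spleen.
    features = [[3.0, -3.0, 0.0], [-3.0, 3.0, 0.0]]
    print(predict_masks(features, ["liver", "spleen"], "CT"))
    # {'liver': [True, False], 'spleen': [False, True]}
```

In this toy setup the unselected `heart` prior is never used for a CT dataset that only annotates liver and spleen. Only the pool entries named by the oracle produce outputs.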

Abstract

A major focus of clinical imaging workflows is disease diagnosis and management, which produces medical imaging datasets that are strongly tied to specific clinical objectives. This has led to the prevailing practice of developing task-specific segmentation models that do not draw on insights from widespread imaging cohorts. Inspired by the training program of medical radiology residents, we propose a shift towards universal medical image segmentation. This paradigm aims to build foundation models for medical image understanding by leveraging the diversity and commonality across clinical targets, body regions, and imaging modalities. Towards this goal, we develop Hermes, a novel context-prior learning approach that addresses the challenges of data heterogeneity and annotation differences in medical image segmentation. On a large collection of eleven diverse datasets (2,438 3D images) spanning five modalities (CT, PET, T1, T2, and cine MRI) and multiple body regions, we demonstrate the advantage of the universal paradigm over the traditional paradigm in addressing multiple tasks within a single model. By exploiting the synergy across tasks, Hermes achieves state-of-the-art performance on all testing datasets and shows superior model scalability. Results on two additional datasets reveal Hermes's strong performance in transfer learning, incremental learning, and generalization to downstream tasks. Hermes's learned priors also reflect the intricate relations among tasks and modalities, in line with established anatomical and imaging principles in radiology. The code is available at https://github.com/yhygao/universal-medical-image-segmentation.
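
Segmentation performance in this work is reported per region of interest (ROI) as a Dice score. Figure 3 labels ROIs whose Dice falls below 80 under the traditional paradigm as "difficult classes". A minimal sketch of both computations on flattened binary masks follows. The function names are hypothetical. Scoring two empty masks as 100 is one common convention, assumed here rather than taken from the paper.

```python
def dice_score(pred, target):
    """Dice similarity in percent: 200 * |P intersect G| / (|P| + |G|).

    `pred` and `target` are flattened binary masks of equal length.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, g in zip(pred, target) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in target if g)
    if total == 0:
        # Both masks empty: scored as perfect agreement (an assumed convention;
        # evaluation protocols differ on this case).
        return 100.0
    return 200.0 * intersection / total


def difficult_classes(per_class_dice, threshold=80.0):
    """ROI names whose Dice is below `threshold`, following Figure 3's criterion."""
    return sorted(name for name, score in per_class_dice.items() if score < threshold)


if __name__ == "__main__":
    print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 50.0
    # Hypothetical per-ROI scores, not results from the paper.
    print(difficult_classes({"esophagus": 72.4, "liver": 95.1, "pancreas": 79.9}))
```

A real evaluation would compute Dice over 3D volumes per case and average across cases. The flattened lists here keep the arithmetic visible.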

Paper Structure (15 sections, 6 equations, 8 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Method
Preliminary
Oracle-guided context-prior learning
Results
Experiments setup
Results
Downstream tasks
Analysis
Discussion and Conclusion
Supplementary
Prior fusion module details
Supplement Experiments
Dataset details

Figures (8)

Figure 1: A) Clinical diagnostic workflows typically focus on specific specialties and diseases, leading to image datasets that are partially annotated, multi-modal, and multi-regional. B) Traditional training paradigms train a separate model for each segmentation task (or dataset). In contrast, we advocate a universal medical image segmentation paradigm that trains one model for all tasks, yielding a robust and generalizable model for diverse tasks.
Figure 2: Illustration of Hermes. A context-prior knowledge pool, including task and modality priors, is learned with the backbone. Through oracle-guided selection and combination of these priors, Hermes can address a variety of segmentation tasks and image modalities.
Figure 3: (A) Comparison with other SOTA methods. ROIs with Dice scores below 80 under the traditional paradigm are defined as 'difficult classes'. (B) Model scalability analysis. We scale ResUNet to three variants, ResUNet-Small (10.1M parameters), ResUNet-Base (40.6M), and ResUNet-Large (157.9M), and scale Hermes in the same way. All other experiments use ResUNet-Base as the backbone unless otherwise specified. (C) Generalization from StructSeg to SegTHOR.
Figure 4: Upper: Cosine similarity matrices of Hermes's learned task priors and of CLIP's task embeddings. Lower: Example structures with high similarity. Hermes's priors, learned directly from medical data, capture intricate relationships among tasks, whereas CLIP tends to map all objects to similar embeddings, losing discriminative detail.
Figure 5: Cosine similarity matrices of Hermes's learned modality priors and of CLIP's modality embeddings. Hermes's learned modality priors are consistent with imaging principles.
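
Figures 4 and 5 compare learned priors through pairwise cosine similarity. Below is a minimal, dependency-free sketch of that kind of analysis. The function names and the 2-d toy vectors are hypothetical; trained Hermes priors would be higher-dimensional and come from a trained model. Ranking off-diagonal pairs is one simple way to surface highly similar structures, and the paper may choose its examples differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def similarity_matrix(priors):
    """Pairwise cosine similarities for a {name: vector} mapping."""
    names = list(priors)
    matrix = [[cosine_similarity(priors[i], priors[j]) for j in names] for i in names]
    return names, matrix


def most_similar_pairs(priors, top_k=3):
    """The top_k off-diagonal pairs, highest similarity first."""
    names, matrix = similarity_matrix(priors)
    pairs = [
        (matrix[i][j], names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    ]
    pairs.sort(reverse=True)
    return [(a, b, round(score, 3)) for score, a, b in pairs[:top_k]]


if __name__ == "__main__":
    # Hypothetical priors chosen so that liver and spleen point in similar directions.
    toy = {"liver": [1.0, 0.2], "spleen": [0.9, 0.3], "heart": [-0.2, 1.0]}
    print(most_similar_pairs(toy, top_k=1))
```

The same matrix computed on a second set of vectors, such as text embeddings, gives the side-by-side comparison that the captions describe.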