A Classifier-Free Incremental Learning Framework for Scalable Medical Image Segmentation
Xiaoyang Chen, Hao Zheng, Yifang Xie, Yuncong Ma, Tengfei Li
TL;DR
This work tackles the scalability problem in foundation models for medical image segmentation by proposing a classifier-free segmentation framework that supports a variable number of classes within a single network and can learn incrementally from non-stationary data. It combines contrastive learning with class prototypes derived from CLIP embeddings, which are orthogonalized with the Gram-Schmidt procedure for interpretability, and integrates knowledge distillation to enable continual learning without revisiting old data. The method outperforms the DoDNet, CLIP-driven, and Versatile baselines on incompletely annotated, multi-modal datasets, and it supports robust class- and domain-incremental learning in rehearsal-free settings when unlabeled data from prior domains are available. The approach addresses three scalability bottlenecks (annotation cost, architectural inflexibility, and dynamic real-world data), bringing practical foundation-model capabilities to automated medical image segmentation. The results indicate strong potential for deploying scalable, continual segmentation systems across diverse clinical datasets and modalities.
Abstract
Current methods for developing foundation models in medical image segmentation rely on two primary assumptions: a fixed set of classes and the immediate availability of a substantial and diverse training dataset. However, these assumptions can be impractical given the evolving nature of imaging technology and patient demographics and the labor-intensive nature of data curation, which limits the applicability and scalability of such models. To address these challenges, we introduce a novel segmentation paradigm that segments a variable number of classes within a single classifier-free network whose architecture is independent of the number of classes. This network is trained using contrastive learning and produces discriminative feature representations that are straightforward to interpret. Additionally, we integrate this strategy into a knowledge distillation-based incremental learning framework, which allows the network to gradually assimilate new information from non-stationary data streams while avoiding catastrophic forgetting. Our approach provides a unified solution for both class- and domain-incremental learning scenarios. We demonstrate the flexibility of our method in handling varying class numbers within a unified network and its capacity for incremental learning. Experimental results on an incompletely annotated, multi-modal, multi-source dataset for medical image segmentation show that it outperforms state-of-the-art alternative approaches.
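To make the classifier-free idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each class is represented by a prototype vector, that prototypes are orthonormalized with Gram-Schmidt as the TL;DR describes, and that a feature is labeled by its most similar prototype (the cosine-similarity rule is an assumption of this sketch). The small hand-written vectors are hypothetical stand-ins for CLIP text embeddings, and the helper names `gram_schmidt` and `predict` are illustrative only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm < 1e-12:
        raise ValueError("cannot normalize a (near-)zero vector")
    return [a / norm for a in v]


def gram_schmidt(vectors):
    """Orthonormalize vectors in order; earlier outputs never change."""
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the components along every prototype built so far.
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        basis.append(normalize(w))
    return basis


def predict(feature, prototypes, names):
    """Label a feature by its most similar prototype (assumed rule)."""
    f = normalize(feature)
    scores = [dot(f, p) for p in prototypes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return names[best], scores


# Hypothetical stand-ins for text embeddings of three class names.
names = ["liver", "kidney", "spleen"]
raw = [
    [1.0, 0.2, 0.0, 0.1],
    [0.8, 1.0, 0.1, 0.0],
    [0.5, 0.4, 1.0, 0.2],
]
prototypes = gram_schmidt(raw)

# A hypothetical voxel feature; no class-specific output layer is involved.
label, scores = predict([0.1, 0.9, 0.05, 0.0], prototypes, names)
print(label)  # kidney

# Adding a class only appends a prototype; existing ones are untouched.
extended = gram_schmidt(raw + [[0.1, 0.1, 0.1, 1.0]])
```

Because Gram-Schmidt processes vectors in order, appending a fourth class leaves the first three prototypes unchanged, which is one way a prototype-based design can stay independent of the class count. In this sketch the number of classes is limited by the embedding dimension, since at most that many orthogonal prototypes exist.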
