
A Classifier-Free Incremental Learning Framework for Scalable Medical Image Segmentation

Xiaoyang Chen, Hao Zheng, Yifang Xie, Yuncong Ma, Tengfei Li

TL;DR

This work tackles the scalability problem of foundation models for medical image segmentation by proposing a classifier-free segmentation framework that supports a variable number of classes within a single network and can learn incrementally from non-stationary data. It combines contrastive learning with class prototypes derived from CLIP embeddings, which are orthogonalized with the Gram-Schmidt process for interpretability, and integrates knowledge distillation to enable continual learning without revisiting old data. The method outperforms the DoDNet, CLIP-driven, and Versatile baselines on incompletely annotated, multi-modal datasets and shows robust class- and domain-incremental learning in rehearsal-free settings when unlabeled data from prior domains are available. The approach addresses three key scalability bottlenecks (annotation cost, architectural inflexibility, and dynamic real-world data), bringing practical foundation-model capabilities to automated medical image segmentation. The results indicate strong potential for deploying scalable, continual segmentation systems across diverse clinical datasets and modalities.
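The prototype construction mentioned above can be illustrated with a minimal sketch. It assumes one embedding vector per class; random Gaussian vectors stand in for CLIP text embeddings, and the dimension 16 is arbitrary. Modified Gram-Schmidt turns these embeddings into mutually orthogonal, unit-length prototypes. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v, eps=1e-12):
    """Scale v to unit length; fail if it has (almost) no remaining component."""
    norm = math.sqrt(dot(v, v))
    if norm < eps:
        raise ValueError("embedding is linearly dependent on earlier prototypes")
    return [x / norm for x in v]


def gram_schmidt(embeddings):
    """Orthonormalize class embeddings in the given order (modified Gram-Schmidt)."""
    prototypes = []
    for emb in embeddings:
        v = list(emb)
        for p in prototypes:
            # Subtract the projection of the current residual onto each earlier prototype.
            c = dot(v, p)
            v = [vi - c * pi for vi, pi in zip(v, p)]
        prototypes.append(normalize(v))
    return prototypes


if __name__ == "__main__":
    random.seed(0)
    # Stand-ins for CLIP text embeddings of four class names (assumption).
    class_embeddings = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
    protos = gram_schmidt(class_embeddings)
    off_diag = max(abs(dot(protos[i], protos[j]))
                   for i in range(4) for j in range(4) if i != j)
    print(f"largest off-diagonal inner product: {off_diag:.2e}")
```

Gram-Schmidt never modifies prototypes that were already computed. Appending a new class embedding therefore only adds one new prototype orthogonal to the existing ones, which plausibly suits class-incremental learning. The result does depend on class order, and the authors' exact procedure may differ.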

Abstract

Current methods for developing foundation models in medical image segmentation rely on two primary assumptions: a fixed set of classes and the immediate availability of a large, diverse training dataset. However, these assumptions can be impractical because imaging technology and patient demographics evolve and data curation is labor-intensive, which limits the practical applicability and scalability of such models. To address these challenges, we introduce a novel segmentation paradigm that segments a variable number of classes within a single classifier-free network whose architecture is independent of the number of classes. This network is trained using contrastive learning and produces discriminative feature representations that are straightforward to interpret. Additionally, we integrate this strategy into a knowledge distillation-based incremental learning framework, which enables the gradual assimilation of new information from non-stationary data streams while avoiding catastrophic forgetting. Our approach provides a unified solution for both class- and domain-incremental learning scenarios. We demonstrate the flexibility of our method in handling varying numbers of classes within a unified network, as well as its capacity for incremental learning. Experimental results on an incompletely annotated, multi-modal, multi-source dataset for medical image segmentation underscore its superiority over state-of-the-art alternatives.
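The network outputs a feature vector per voxel rather than per-class logits. As a result, the number of classes is set by the prototypes, not by the architecture. The following minimal sketch illustrates this idea. It assumes cosine-similarity matching against fixed prototypes and an InfoNCE-style prototype contrastive loss with a hypothetical temperature `tau`; the paper's exact loss is not reproduced here. Segmentation is an argmax over prototype similarities, and adding a class means appending one prototype.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def segment(features, prototypes):
    """Label each voxel feature with the index of its most similar prototype.

    No classifier layer is involved, so the class count is just len(prototypes).
    """
    return [max(range(len(prototypes)), key=lambda k: cosine(f, prototypes[k]))
            for f in features]


def prototype_contrastive_loss(features, labels, prototypes, tau=0.1):
    """InfoNCE-style loss: pull each feature toward its class prototype, push it from the rest."""
    total = 0.0
    for f, y in zip(features, labels):
        logits = [cosine(f, p) / tau for p in prototypes]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[y]  # negative log-softmax of the true class
    return total / len(features)


if __name__ == "__main__":
    # Toy 4-D feature space with three orthonormal prototypes (illustrative values).
    protos = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    feats = [[0.9, 0.1, 0.0, 0.0], [0.1, 0.0, 0.95, 0.0]]
    print(segment(feats, protos))  # → [0, 2]
    print(prototype_contrastive_loss(feats, [0, 2], protos))
    # A new class only needs a new prototype; the network itself is unchanged.
    protos.append([0.0, 0.0, 0.0, 1.0])
    print(segment(feats + [[0.0, 0.0, 0.1, 1.0]], protos))  # → [0, 2, 3]
```

For incompletely annotated data, one plausible choice is to compute the loss only over voxels that carry a label. This is an assumption about how such data could be handled, not a description of the paper.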

Paper Structure (24 sections, 10 equations, 3 figures, 11 tables)

Figures (3)

  • Figure 1: Overview of our approach. It incorporates a classifier-free teacher-student architecture for learning incrementally from data streams. Only the teacher model is involved and trained in the initial stage. In the incremental stage, both models are involved, but the teacher model is frozen.
  • Figure 2: Visual comparison between the ground truth and the predictions of DoDNet, the CLIP-driven model, the Versatile model, and the proposed method on three subjects from different datasets.
  • Figure 3: Similarity map calculated as the cosine similarity between the model's predicted feature representations and the right kidney's class prototype.
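Figure 1 describes a teacher-student setup in which the teacher is frozen during the incremental stage. One common way to realize this is to penalize the student for drifting away from the frozen teacher's features. The sketch below assumes that distillation acts on the per-voxel feature representations; the function names and the weight `lam` are hypothetical. This term needs no labels, so it can be computed on unlabeled images from earlier domains, which fits the rehearsal-free setting described in the TL;DR.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def feature_distillation_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) between student and teacher voxel features.

    The teacher features are treated as constants (a frozen model), and no labels
    are needed, so unlabeled images from earlier domains suffice.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher must describe the same voxels")
    return sum(1.0 - cosine(s, t)
               for s, t in zip(student_feats, teacher_feats)) / len(student_feats)


def incremental_objective(task_loss, student_feats, teacher_feats, lam=1.0):
    """Hypothetical incremental-stage objective: new-data loss + lam * distillation."""
    return task_loss + lam * feature_distillation_loss(student_feats, teacher_feats)


if __name__ == "__main__":
    teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # frozen teacher features
    same = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0]]      # same directions, different scale
    drifted = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]   # first voxel has drifted
    print(feature_distillation_loss(same, teacher))     # → 0.0
    print(feature_distillation_loss(drifted, teacher))  # → 0.5
    print(incremental_objective(0.3, drifted, teacher, lam=2.0))
```

A cosine-based penalty ignores feature magnitude. That suits predictions that depend only on the direction of features relative to the class prototypes, as in the Figure 3 similarity map. A mean-squared-error penalty would be an equally plausible alternative.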