Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning

Jin Yang, Daniel S. Marcus, Aristeidis Sotiras

TL;DR

This work addresses adapting medical vision foundation models to target volumetric segmentation tasks under source-free constraints. It introduces ASSFT, combining an Active Test Time Sample Query with DKD and ASD metrics and a selective semi-supervised fine-tuning strategy to maximize performance with minimal labeled data. Extensive experiments across five abdominal segmentation domains demonstrate consistent, significant gains over state-of-the-art AL and ADA approaches, often approaching upper-bound performance with modest annotation budgets. The approach offers a practical, privacy-conscious pathway for deploying foundation-model–based segmentation in diverse clinical settings, with clear ablation evidence for the value of each component.

Abstract

Medical Vision Foundation Models (Med-VFMs) have superior capabilities for interpreting medical images thanks to the knowledge learned from self-supervised pre-training on extensive unannotated images. To improve their performance on downstream tasks, especially segmentation, a few samples from the target domains are typically selected at random for fine-tuning. However, few works have explored how to adapt Med-VFMs efficiently so that they reach optimal performance on target domains. An efficient fine-tuning approach that selects informative samples to maximize adaptation performance on target domains is therefore needed. To this end, we propose an Active Source-Free Domain Adaptation (ASFDA) method to efficiently adapt Med-VFMs to target domains for volumetric medical image segmentation. ASFDA employs a novel Active Learning (AL) method that selects the most informative samples from the target domains for fine-tuning Med-VFMs without access to the source pre-training samples, thus maximizing performance with a minimal selection budget. Within this AL method, we design an Active Test Time Sample Query strategy that selects samples from the target domains via two query metrics: Diversified Knowledge Divergence (DKD) and Anatomical Segmentation Difficulty (ASD). DKD measures the source-target knowledge gap and intra-domain diversity. It uses the knowledge acquired during pre-training to guide the querying of source-dissimilar and semantically diverse samples from the target domains. ASD evaluates how difficult anatomical structures are to segment by adaptively measuring predictive entropy over foreground regions. Additionally, ASFDA employs Selective Semi-supervised Fine-tuning, which improves the performance and efficiency of fine-tuning by identifying highly reliable samples among the unqueried ones.
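
To make the idea behind ASD more concrete, the following is a minimal sketch of an entropy-based difficulty score restricted to foreground voxels. It is not the authors' formulation: the abstract does not specify how the foreground is identified or how the measurement is made adaptive. Here we assume, purely for illustration, that class 0 is background and that a voxel counts as foreground when its most probable class is not background. The function names `voxel_entropy` and `foreground_entropy_score` are hypothetical.

```python
import math
from typing import Sequence


def voxel_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one voxel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def foreground_entropy_score(
    prob_map: Sequence[Sequence[float]],
    background_class: int = 0,  # assumption: class 0 is background
) -> float:
    """Mean predictive entropy over voxels predicted as foreground.

    `prob_map` is a flattened volume: one class-probability vector per voxel.
    Returns 0.0 when no voxel is predicted as foreground.
    """
    entropies = []
    for probs in prob_map:
        # Assumption: foreground = voxels whose argmax class is not background.
        predicted = max(range(len(probs)), key=lambda c: probs[c])
        if predicted != background_class:
            entropies.append(voxel_entropy(probs))
    return sum(entropies) / len(entropies) if entropies else 0.0
```

Under these assumptions, a volume whose foreground predictions are confident receives a low score, while a volume with uncertain organ boundaries receives a higher one. Restricting the average to foreground voxels keeps the large, easy background from diluting the score.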

Paper Structure

This paper contains 29 sections, 20 equations, 5 figures, 8 tables, 1 algorithm.

Figures (5)

  • Figure 1: Active Selective Semi-supervised Fine-tuning (ASSFT) of medical vision foundation models for volumetric medical image segmentation. The segmentation network is pre-trained on the source data $\mathbb{S}=\{\boldsymbol{X}_s\}$ and adapted to the target domain $\mathbb{T}$ for downstream evaluation. ASSFT employs an Active Test Time Sample Query strategy to evaluate how informative each target sample is. This strategy uses two metrics: Diversified Knowledge Divergence (DKD) and Anatomical Segmentation Difficulty (ASD). The scores of these two metrics are combined to form the sample query criterion $Q(x)$ for the unlabeled target data $\mathbb{T}_u=\{\boldsymbol{X}_u\}$. The top samples $\boldsymbol{X}_l$ with the largest query scores are selected for annotation $\boldsymbol{Y}_l$ by experts. These queried samples and their annotations are used to fine-tune the network via Selective Semi-supervised Fine-tuning. After being fine-tuned on the labeled data $(\boldsymbol{X}_l,\boldsymbol{Y}_l)$, the network generates probability maps for the unlabeled target data $\boldsymbol{X}_u$. Unlabeled data $\boldsymbol{X}_{t,u}^r$ are then selected and their pseudo labels $\boldsymbol{Y}_{t,u}^r$ are generated. These data $(\boldsymbol{X}_{t,u}^r,\boldsymbol{Y}_{t,u}^r)$ are combined with the labeled data $(\boldsymbol{X}_l,\boldsymbol{Y}_l)$ to fine-tune the network.
  • Figure 2: Qualitative comparison of results from the medical vision foundation models fine-tuned with (A) $5\%$ and (B) $25\%$ of the samples from the AMOS2022-CT domain, queried by our method and by other SOTA methods. Red boxes mark the regions where our method yields better segmentation results than the SOTA methods.
  • Figure 3: Qualitative comparison of results from the medical vision foundation models fine-tuned with (A) $5\%$ and (B) $30\%$ of the samples from the AMOS2022-MRI domain, queried by our method and by other SOTA methods. Red boxes mark the regions where our method yields better segmentation results than the SOTA methods.
  • Figure 4: Qualitative comparison of results from the medical vision foundation models fine-tuned with (A) $5\%$ of the samples from the FLARE2021 domain and (B) 3 shots from the Abdominal MRI domain, queried by our method and by other SOTA methods. Red boxes mark the regions where our method yields better segmentation results than the SOTA methods.
  • Figure 5: Comparison of distributions of Dice scores from selected (s) unlabeled samples for the Selective Semi-supervised Fine-tuning and unselected (u) unlabeled samples when adapting Med-VFMs to the AMOS2022-CT domain for five rounds (r1, r2, r3, r4, and r5).
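
The query step described in the Figure 1 caption (combine the DKD and ASD scores into $Q(x)$, then send the top-scoring samples for annotation) can be sketched as follows. The caption does not say how the two scores are combined, so this sketch assumes min-max normalization followed by a weighted sum. That combination rule, the `weight` parameter, and the function names are illustrative assumptions, not the paper's definition.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; all-equal inputs map to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def select_queries(
    dkd: Dict[str, float],
    asd: Dict[str, float],
    budget: int,
    weight: float = 0.5,  # assumption: equal weighting of the two metrics
) -> List[str]:
    """Return the IDs of the `budget` samples with the largest combined score."""
    if dkd.keys() != asd.keys():
        raise ValueError("DKD and ASD must score the same set of samples")
    dkd_n, asd_n = min_max_normalize(dkd), min_max_normalize(asd)
    # Assumed combination rule: Q(x) = w * DKD(x) + (1 - w) * ASD(x).
    q = {k: weight * dkd_n[k] + (1.0 - weight) * asd_n[k] for k in dkd}
    ranked = sorted(q, key=lambda k: q[k], reverse=True)
    return ranked[:budget]


# Example: three unlabeled target volumes with hypothetical scores.
queried = select_queries(
    dkd={"vol_a": 0.2, "vol_b": 0.8, "vol_c": 0.5},
    asd={"vol_a": 1.0, "vol_b": 3.0, "vol_c": 4.0},
    budget=2,
)
```

Normalizing each metric first keeps one score from dominating simply because it lives on a larger numeric scale. The selected IDs would then go to experts for annotation before the fine-tuning stage.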