
Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation

Ziyang Chen, Yongsheng Pan, Yiwen Ye, Mengkang Lu, Yong Xia

TL;DR

This paper tackles distribution shift in 2D medical image segmentation under continual test-time adaptation (CTTA) by freezing the pre-trained model and learning a separate visual prompt for each test image. The proposed Visual Prompt-based Test-Time Adaptation (VPTTA) uses a lightweight low-frequency prompt, initializes it from a memory bank of previous prompts, and trains it in a single iteration with a batch normalization (BN) statistics alignment loss, aided by a warm-up mechanism that mixes source and target statistics. The key contributions are the low-frequency prompt design, the memory-based initialization, and the warm-up statistics strategy, which together allow adaptation without updating any model parameters. Experiments on optic disc/optic cup (OD/OC) and polyp segmentation across multiple centers show that VPTTA outperforms state-of-the-art CTTA methods and remains robust over long sequences of domain shifts. Code and pre-trained source model weights are available at https://github.com/Chen-Ziyang/VPTTA.

Abstract

Distribution shift widely exists in medical images acquired from different medical centres and poses a significant obstacle to deploying the pre-trained semantic segmentation model in real-world applications. Test-time adaptation has proven its effectiveness in tackling the cross-domain distribution shift during inference. However, most existing methods achieve adaptation by updating the pre-trained models, rendering them susceptible to error accumulation and catastrophic forgetting when encountering a series of distribution shifts (i.e., under the continual test-time adaptation setup). To overcome these challenges caused by updating the models, in this paper, we freeze the pre-trained model and propose the Visual Prompt-based Test-Time Adaptation (VPTTA) method to train a specific prompt for each test image to align the statistics in the batch normalization layers. Specifically, we present the low-frequency prompt, which is lightweight with only a few parameters and can be effectively trained in a single iteration. To enhance prompt initialization, we equip VPTTA with a memory bank to benefit the current prompt from previous ones. Additionally, we design a warm-up mechanism, which mixes source and target statistics to construct warm-up statistics, thereby facilitating the training process. Extensive experiments demonstrate the superiority of our VPTTA over other state-of-the-art methods on two medical image segmentation benchmark tasks. The code and weights of pre-trained source models are available at https://github.com/Chen-Ziyang/VPTTA.
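
The warm-up mechanism and the BN alignment objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of standard deviations instead of stored variances, and the mixing weight `lam` are all assumptions made for illustration, and the schedule by which the mixing weight changes over time is not given in this excerpt. The sketch only evaluates the loss; in practice an autograd framework would propagate it back to the prompt.

```python
import numpy as np


def batch_statistics(features):
    """Per-channel mean and standard deviation of (N, C, H, W) features."""
    return features.mean(axis=(0, 2, 3)), features.std(axis=(0, 2, 3))


def warmup_statistics(source, target, lam):
    """Convex mix of source and target (mean, std) pairs.

    `lam` is a hypothetical mixing weight in [0, 1] placed on the target.
    """
    return tuple((1.0 - lam) * s + lam * t for s, t in zip(source, target))


def alignment_loss(reference, target):
    """Mean absolute distance between two (mean, std) pairs."""
    return sum(np.abs(r - t).mean() for r, t in zip(reference, target))


# Example with three channels: the source and target means differ by 2.
source = (np.zeros(3), np.ones(3))
target = (np.full(3, 2.0), np.ones(3))
warm = warmup_statistics(source, target, lam=0.5)
loss = alignment_loss(warm, target)
```

With `lam = 0` the reference is the pure source statistics, and with `lam = 1` it coincides with the target statistics, so the loss vanishes. Intermediate values shrink the gap that the prompt has to close in its single training step.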

Paper Structure (20 sections, 5 equations, 6 figures, 9 tables, 1 algorithm)

Figures (6)

  • Figure 1: Comparison between our VPTTA and existing solutions under the CTTA setup. Our VPTTA avoids both error accumulation (EA) and catastrophic forgetting (CF) by freezing model parameters and achieves adaptation by training a prompt for each test image. The commonly used self-supervised loss and the optimized one are denoted by $\mathcal{L}_{self}$ and $\mathcal{L}_{self}^{'}$, respectively.
  • Figure 2: Overview of our VPTTA. For each test image, (1) the Fast Fourier Transform (FFT) is first applied to transform it into the frequency domain, where the low-frequency component of amplitude is used to query in a memory bank to initialize the current prompt, and then the amplitude is multiplied with the prompt at the low-frequency component and transformed back to the spatial domain using the Inverse Fast Fourier Transform (IFFT). (2) The memory bank is built on the previous low-frequency components and their corresponding prompts and is updated using the First In First Out (FIFO) strategy. (3) We convert the source statistics stored in BN layers into the warm-up statistics and calculate the absolute distance between warm-up and target statistics as the loss to fine-tune the prompt. (4) Finally, we feed the image corrected by its fine-tuned prompt to the pre-trained model to produce the output. 'Stat': Abbreviation of 'Statistics'.
  • Figure 3: Visualization of the original images, estimated prompts, and adapted images on the OD/OC segmentation task. We normalize the prompts to [0, 1] for better visualization. The DSC of applying the frozen source model on the original and adapted images is displayed below each image. We also show an example of each source domain on the left side of this diagram. 'Ori': Abbreviation of 'Original'.
  • Figure 4: Performance of our VPTTA with various $\alpha$ on the OD/OC segmentation task.
  • Figure 5: Performance of our VPTTA with various $S$, $K$, and $\tau$ on the OD/OC segmentation task.
  • ...and 1 more figure
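
Steps (1) and (2) of the Figure 2 caption can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the released code: the names `apply_prompt` and `PromptMemoryBank` are hypothetical, the prompt is assumed to cover a centred block of the shifted amplitude spectrum, and the query is assumed to average the prompts of the `k` most similar stored keys under cosine similarity. The caption itself only states that the low-frequency amplitude queries the bank and that the bank is updated first in, first out.

```python
from collections import deque

import numpy as np


def low_freq_slices(shape, prompt_shape):
    """Slices selecting the centred low-frequency block of a shifted spectrum."""
    (H, W), (h, w) = shape, prompt_shape
    top, left = H // 2 - h // 2, W // 2 - w // 2
    return slice(top, top + h), slice(left, left + w)


def apply_prompt(image, prompt):
    """Multiply the low-frequency amplitude of a 2D image by `prompt`."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # zero frequency at the centre
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    rows, cols = low_freq_slices(image.shape, prompt.shape)
    key = amplitude[rows, cols].copy()  # low-frequency component, used as the query
    amplitude[rows, cols] *= prompt
    adapted = np.fft.ifft2(np.fft.ifftshift(amplitude * np.exp(1j * phase)))
    return adapted.real, key  # keeping only the real part is a simplification


class PromptMemoryBank:
    """FIFO store of (low-frequency key, prompt) pairs."""

    def __init__(self, size):
        self.pairs = deque(maxlen=size)  # appending when full evicts the oldest pair

    def push(self, key, prompt):
        self.pairs.append((key.ravel().copy(), prompt.copy()))

    def init_prompt(self, key, k, prompt_shape):
        """Average the prompts of the k stored keys most similar to `key`."""
        if not self.pairs:
            return np.ones(prompt_shape)  # identity prompt when nothing is stored
        keys = np.stack([stored for stored, _ in self.pairs])
        query = key.ravel()
        norms = np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-12
        sims = keys @ query / norms  # cosine similarity (an assumed choice)
        nearest = np.argsort(sims)[-k:]
        return np.mean([self.pairs[int(i)][1] for i in nearest], axis=0)
```

A prompt of all ones leaves the image unchanged, which is why it serves as the fallback initialization when the bank is empty. After the prompt for an image has been fine-tuned, `push(key, prompt)` would store it for later queries.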