Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation
Ziyang Chen, Yongsheng Pan, Yiwen Ye, Mengkang Lu, Yong Xia
TL;DR
This paper tackles distribution shift in 2D medical image segmentation under continual test-time adaptation by freezing the pre-trained model and learning a visual prompt for each test image. The proposed Visual Prompt-based Test-Time Adaptation (VPTTA) uses a lightweight low-frequency prompt, initializes it from a memory bank of previous prompts, and trains it with a batch normalization (BN) statistics alignment loss; a warm-up mechanism that mixes source and target statistics helps the prompt train in a single iteration. Because no model parameters are updated, the method sidesteps the error accumulation and catastrophic forgetting that model-updating approaches are prone to. Experiments on optic disc/cup (OD/OC) and polyp segmentation across multiple centers show that VPTTA outperforms state-of-the-art CTTA methods and stays robust over long sequences of domain shifts. Code is available for reproduction.
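To make the idea of a low-frequency prompt concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the prompt is a small array that rescales the central (low-frequency) band of the image's amplitude spectrum while the phase is left untouched; the function name `apply_low_freq_prompt`, the multiplicative form, and the window placement are all illustrative assumptions.

```python
import numpy as np


def apply_low_freq_prompt(image, prompt):
    """Rescale the low-frequency amplitude band of `image` by `prompt`.

    image:  float array of shape (C, H, W)
    prompt: float array of shape (C, h, w) with h <= H and w <= W
    (Hypothetical helper; the multiplicative form is an assumption.)
    """
    _, H, W = image.shape
    _, h, w = prompt.shape

    # Shift the spectrum so the zero frequency sits at (H // 2, W // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(-2, -1)), axes=(-2, -1))
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)

    # Window of size (h, w) that contains the zero-frequency bin.
    top, left = H // 2 - h // 2, W // 2 - w // 2
    amplitude[:, top:top + h, left:left + w] *= prompt

    # Recombine amplitude and phase, then return to the image domain.
    adapted = amplitude * np.exp(1j * phase)
    adapted = np.fft.ifft2(np.fft.ifftshift(adapted, axes=(-2, -1)), axes=(-2, -1))
    return adapted.real


# A prompt of ones leaves the image unchanged, so it is a natural starting point.
rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))
identity_prompt = np.ones((3, 3, 3))
adapted_image = apply_low_freq_prompt(image, identity_prompt)
```

In this sketch the prompt has only `C * h * w` parameters, which is consistent with the description of the prompt as lightweight; the frozen segmentation model would then receive `adapted_image` instead of `image`.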
Abstract
Distribution shift widely exists in medical images acquired from different medical centres and poses a significant obstacle to deploying pre-trained semantic segmentation models in real-world applications. Test-time adaptation has proven effective in tackling the cross-domain distribution shift during inference. However, most existing methods achieve adaptation by updating the pre-trained models, rendering them susceptible to error accumulation and catastrophic forgetting when encountering a series of distribution shifts (i.e., under the continual test-time adaptation setup). To overcome these challenges caused by updating the models, in this paper, we freeze the pre-trained model and propose the Visual Prompt-based Test-Time Adaptation (VPTTA) method to train a specific prompt for each test image to align the statistics in the batch normalization layers. Specifically, we present the low-frequency prompt, which is lightweight with only a few parameters and can be effectively trained in a single iteration. To enhance prompt initialization, we equip VPTTA with a memory bank so that the current prompt can benefit from previous ones. Additionally, we design a warm-up mechanism, which mixes source and target statistics to construct warm-up statistics, thereby facilitating the training process. Extensive experiments demonstrate the superiority of our VPTTA over other state-of-the-art methods on two medical image segmentation benchmark tasks. The code and weights of pre-trained source models are available at https://github.com/Chen-Ziyang/VPTTA.
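The warm-up mechanism and the statistics alignment objective described above can be sketched as follows. This is an illustration under stated assumptions rather than the released code: the mixing is taken to be a convex combination with a weight `lam`, the alignment loss is taken to be an L1 gap between means and standard deviations, and all function names are hypothetical. The abstract does not specify the mixing schedule or the distance used.

```python
import numpy as np


def batch_stats(features):
    """Per-channel mean and variance of features shaped (N, C, H, W)."""
    mean = features.mean(axis=(0, 2, 3))
    var = features.var(axis=(0, 2, 3))
    return mean, var


def warmup_stats(source_mean, source_var, target_mean, target_var, lam):
    """Convex mix of source and target statistics (assumed form).

    lam = 1.0 keeps the stored source statistics; lam = 0.0 uses the
    statistics of the current test image only.
    """
    mean = lam * source_mean + (1.0 - lam) * target_mean
    var = lam * source_var + (1.0 - lam) * target_var
    return mean, var


def bn_alignment_loss(source_mean, source_var, target_mean, target_var):
    """L1 gap between source and target means and standard deviations (assumed form)."""
    mean_gap = np.abs(source_mean - target_mean).mean()
    std_gap = np.abs(np.sqrt(source_var) - np.sqrt(target_var)).mean()
    return mean_gap + std_gap


# Toy features for one BN layer: one image, two channels.
features = np.arange(16, dtype=float).reshape(1, 2, 2, 4)
target_mean, target_var = batch_stats(features)

# Stand-ins for the running statistics stored in the frozen source model.
source_mean, source_var = np.zeros(2), np.ones(2)

mixed_mean, mixed_var = warmup_stats(
    source_mean, source_var, target_mean, target_var, lam=0.5
)
loss = bn_alignment_loss(source_mean, source_var, target_mean, target_var)
```

In a full pipeline the loss would be summed over the BN layers and minimized with respect to the prompt only, since the model weights stay frozen.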
