
Multi-Scale Global-Instance Prompt Tuning for Continual Test-time Adaptation in Medical Image Segmentation

Lingrui Li, Yanfeng Zhou, Nan Pu, Xin Chen, Zhun Zhong

TL;DR

This work tackles distribution shifts in medical image segmentation under Continual Test-Time Adaptation (CTTA) by introducing MGIPT, a dual-prompt framework that combines Adaptive-scale Instance Prompts (AIP) and Multi-scale Global Prompts (MGP). AIP progressively adapts instance-specific features through scale-aware prompts with an early stopping criterion, while MGP captures domain-level knowledge across multiple frequencies using a memory-free teacher-student scheme and EMA updates. The approach is evaluated on optic disc/cup and polyp segmentation across multiple target domains and long-term CTTA rounds, consistently outperforming state-of-the-art methods and demonstrating improved robustness and privacy by avoiding memory banks. A limitation is the computational overhead due to per-sample prompt scaling, suggesting avenues for faster scale selection and efficiency improvements in future work.

Abstract

Distribution shift is a common challenge in medical images obtained from different clinical centers, significantly hindering the deployment of pre-trained semantic segmentation models in real-world applications across multiple domains. Continual Test-Time Adaptation (CTTA) has emerged as a promising approach to address cross-domain shifts across continually evolving target domains. Most existing CTTA methods rely on incrementally updating model parameters, which inevitably suffer from error accumulation and catastrophic forgetting, especially in long-term adaptation. Recent prompt-tuning-based works have shown potential to mitigate these two issues by updating only visual prompts. While these approaches have demonstrated promising performance, several limitations remain: 1) a lack of multi-scale prompt diversity, 2) inadequate incorporation of instance-specific knowledge, and 3) a risk of privacy leakage. To overcome these limitations, we propose Multi-scale Global-Instance Prompt Tuning (MGIPT) to enhance the scale diversity of prompts and capture both global- and instance-level knowledge for robust CTTA. Specifically, MGIPT consists of an Adaptive-scale Instance Prompt (AIP) and a Multi-scale Global-level Prompt (MGP). AIP dynamically learns lightweight, instance-specific prompts to mitigate error accumulation, using an adaptive optimal-scale selection mechanism. MGP captures domain-level knowledge across different scales to ensure robust adaptation with anti-forgetting capabilities. These complementary components are combined through a weighted ensemble approach, enabling effective dual-level adaptation that integrates both global and local information. Extensive experiments on medical image segmentation benchmarks demonstrate that our MGIPT outperforms state-of-the-art methods, achieving robust adaptation across continually changing target domains.

Paper Structure (17 sections, 7 equations, 5 figures, 4 tables)


Figures (5)

  • Figure 1: (a) The goal of continual test-time adaptation (CTTA) is to adapt a pre-trained source model to handle data in dynamically changing target domains. (b) Overview of tuning strategies in CTTA. Fully-Tuning often causes catastrophic forgetting, due to frequent parameter updates, and error accumulation, due to noisy pseudo labels; Efficient-Tuning (ET) on Batch-Normalization (BN) layers and ET with adapters struggle to balance inadequate adaptation against error accumulation; ET with single-scale global prompts leads to suboptimal adaptation of individual instances and to privacy leakage. Our method addresses error accumulation and forgetting by leveraging dual-level prompts with multiple scales and robust parameter updating strategies while reducing the risk of privacy leakage. (c) Performance comparison on OD/OC datasets over 3 continual rounds. Our approach demonstrates superior performance over previous methods.
  • Figure 2: The framework of MGIPT. This framework leverages instance-specific and global knowledge to tackle the error accumulation and forgetting challenges during CTTA. MGIPT comprises: (1) Low-Frequency Prompts: trainable pixels initialized as ones, optimized using Batch Normalization (BN) alignment loss. (2) Adaptive-scale Instance Prompt: input images are transformed to the frequency domain via Fast Fourier Transform (FFT), combined with instance-level prompts (followed by inverse FFT), and optimized using progressive prompt tuning (PPT) with an early stopping (ES) strategy for optimal scale selection. (3) Multi-scale Global Prompt: captures domain-level knowledge across varying receptive fields using multi-scale prompts, updated with a teacher-student mechanism and Exponential Moving Average (EMA). (4) Inference: fine-tuned instance and global prompts are combined to adapt the input image, producing the final output.
  • Figure 3: Qualitative comparison of segmentation results on target domains for the OD/OC benchmark. "Ori": abbreviation of "Original".
  • Figure 4: Visualization of the original images, fine-tuned prompts, and adapted images on the OD/OC segmentation task. We normalize the prompts to [0, 1] for better visualization. Some source-domain examples are shown at the top of the figure. "Ori": abbreviation of "Original".
  • Figure 5: Performance with various $e$ and $\lambda$ on OD/OC task.
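
The Figure 2 caption names three mechanisms: an EMA update of the global prompt in a teacher-student scheme, early stopping to select a prompt scale, and a combination of instance and global prompts at inference. The Python sketch below illustrates one plausible reading of each; it is not the authors' implementation. The function names, the nested-list prompt representation, the default `momentum`, `weight`, and `tol` values, and the stopping rule (stop once the loss no longer improves by more than `tol`) are all assumptions made for illustration. The FFT step and the BN alignment loss are left out, and `loss_fn` is a hypothetical stand-in for whatever objective is evaluated at a given scale.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a prompt is a 2-D grid of floats.
Prompt = List[List[float]]


def init_prompt(size: int) -> Prompt:
    """Create a size x size prompt initialized to ones (as in the caption)."""
    return [[1.0] * size for _ in range(size)]


def ema_update(teacher: Prompt, student: Prompt, momentum: float = 0.99) -> Prompt:
    """Element-wise EMA: momentum * teacher + (1 - momentum) * student."""
    return [
        [momentum * t + (1.0 - momentum) * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher, student)
    ]


def combine_prompts(instance: Prompt, global_: Prompt, weight: float = 0.5) -> Prompt:
    """Weighted ensemble of two same-sized prompts (assumed combination rule)."""
    return [
        [weight * i + (1.0 - weight) * g for i, g in zip(i_row, g_row)]
        for i_row, g_row in zip(instance, global_)
    ]


def select_scale(
    scales: Sequence[int],
    loss_fn: Callable[[int], float],
    tol: float = 1e-3,
) -> int:
    """Try scales in order; stop once the loss improves by no more than tol.

    Returns the last scale that gave a meaningful improvement. Assumes
    `scales` is non-empty and ordered from smallest to largest.
    """
    best_scale = scales[0]
    best_loss = loss_fn(best_scale)
    for scale in scales[1:]:
        loss = loss_fn(scale)
        if best_loss - loss <= tol:  # early stopping: no meaningful gain
            break
        best_scale, best_loss = scale, loss
    return best_scale


if __name__ == "__main__":
    teacher = init_prompt(2)
    student = [[0.0, 0.0], [0.0, 0.0]]
    print(ema_update(teacher, student, momentum=0.9))

    # Toy per-scale losses, invented purely for this demonstration.
    toy_losses = {4: 1.0, 8: 0.5, 16: 0.49999, 32: 0.1}
    print(select_scale([4, 8, 16, 32], toy_losses.__getitem__))
```

In this sketch the early-stopping loop never evaluates scales beyond the first one that fails to improve, which is one way to limit the cost of per-sample scale selection; whether MGIPT stops in exactly this way is not stated in the text above.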