ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation

Jiaming Liu, Senqiao Yang, Peidong Jia, Renrui Zhang, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang

TL;DR

This work addresses continual test-time adaptation (CTTA) under non-stationary target domains by introducing Visual Domain Adapters (ViDA) with high-rank and low-rank embedding spaces. A Homeostatic Knowledge Allotment (HKA) strategy dynamically fuses domain-specific and domain-shared knowledge, guided by sample uncertainty, enabling robust adaptation without increasing model parameters. Using a teacher–student framework with pseudo-label consistency, ViDA achieves state-of-the-art results on classification and segmentation CTTA benchmarks, adapts foundation models effectively, and improves generalization to unseen domains. Overall, ViDA offers a scalable, transfer-friendly paradigm for handling continual distribution shift in large-scale vision models.

Abstract

Because real-world machine systems run in non-stationary environments, the Continual Test-Time Adaptation (CTTA) task has been proposed to adapt a pre-trained model to continually changing target domains. Existing methods mainly focus on model-based adaptation, which uses self-training to extract target-domain knowledge. However, pseudo labels can be noisy and the updated model parameters are unreliable under dynamic data distributions, leading to error accumulation and catastrophic forgetting in the continual adaptation process. To tackle these challenges and maintain model plasticity, we design a Visual Domain Adapter (ViDA) for CTTA, explicitly handling both domain-specific and domain-shared knowledge. Specifically, we first comprehensively explore the different domain representations of the adapters with trainable high-rank or low-rank embedding spaces. Then we inject ViDAs into the pre-trained model, which leverages high-rank and low-rank features to adapt to the current domain distribution and maintain the continual domain-shared knowledge, respectively. To exploit the low-rank and high-rank ViDAs more effectively, we further propose a Homeostatic Knowledge Allotment (HKA) strategy, which adaptively combines different knowledge from each ViDA. Extensive experiments conducted on four widely used benchmarks demonstrate that our proposed method achieves state-of-the-art performance in both classification and segmentation CTTA tasks. Notably, our method can be regarded as a novel transfer paradigm for large-scale models, delivering promising results in adaptation to continually changing distributions. Project page: https://sites.google.com/view/iclr2024-vida/home.
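To make the adapter idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each ViDA is a purely linear down-projection followed by an up-projection added in parallel to a frozen layer, that "low-rank" means a bottleneck narrower than the input and "high-rank" a middle dimension wider than the input, and that the two branches are scaled by fusion weights `lam_low` and `lam_high`. All names and sizes (`d_in`, `r_low`, `r_high`, `make_adapter`, `merged_weight`) are hypothetical. Under these assumptions, the adapters can be folded back into the frozen weight, which is one way such a design could avoid extra parameters at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 8, 8       # hypothetical layer size
r_low, r_high = 2, 32    # hypothetical middle dimensions: r_low < d_in < r_high

W = rng.normal(size=(d_out, d_in))  # frozen pre-trained weight (stand-in values)


def make_adapter(d_mid):
    """Return (down, up) projection matrices for one linear adapter branch."""
    down = rng.normal(scale=0.1, size=(d_mid, d_in))
    up = rng.normal(scale=0.1, size=(d_out, d_mid))
    return down, up


low_down, low_up = make_adapter(r_low)      # assumed domain-shared branch
high_down, high_up = make_adapter(r_high)   # assumed domain-specific branch


def forward(x, lam_low=1.0, lam_high=1.0):
    """Frozen layer output plus the two scaled adapter branches."""
    base = W @ x
    low = low_up @ (low_down @ x)
    high = high_up @ (high_down @ x)
    return base + lam_low * low + lam_high * high


def merged_weight(lam_low=1.0, lam_high=1.0):
    """Fold both linear adapters into a single weight of the original shape."""
    return W + lam_low * (low_up @ low_down) + lam_high * (high_up @ high_down)


x = rng.normal(size=d_in)
# The merged weight reproduces the three-branch forward pass.
print(np.allclose(forward(x, 0.7, 1.3), merged_weight(0.7, 1.3) @ x))
```

The low-rank update `low_up @ low_down` has rank at most `r_low`, while the high-rank update can reach the full rank of the layer; only the adapter matrices would be trained, with `W` kept frozen.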

Paper Structure (29 sections, 12 equations, 7 figures, 16 tables)

Figures (7)

  • Figure 1: The problem and motivation. (a) Our goal is to effectively adapt the source pre-trained model to continually changing target domains. We propose Visual Domain Adapters with high-rank and low-rank embedding spaces to tackle the error accumulation and catastrophic forgetting challenges during the continual adaptation process. (b) We conduct a t-SNE (van der Maaten and Hinton, 2008) distribution analysis for the different adapter representations across four target domains (ACDC). The low-rank branch exhibits a consistent distribution across the target domains, suggesting that it can effectively disregard the impact of dynamic distribution shifts. The high-rank branch demonstrates noticeable distribution discrepancies between the various target domains, suggesting that it primarily focuses on extracting domain-specific knowledge.
  • Figure 2: The framework of Visual Domain Adapter (ViDA). (a) We inject low-rank and high-rank ViDAs into either linear or Conv layers of the pre-trained source model. The student model processes the original image, while the teacher model processes an augmented version of the same image. To update the ViDAs, we construct a teacher-student framework and use a consistency loss (Eq. \ref{eq:loss}) as the optimization objective. In addition, the teacher model calculates an uncertainty value (Eq. \ref{eq:mc}), reflecting the distribution shift of each sample in target domains. (b) Based on the degree of distribution shift, we introduce the Homeostatic Knowledge Allotment (HKA) strategy, which aims to dynamically fuse the knowledge from each ViDA with different domain representation.
  • Figure 3: c1 to c15 represent the 15 corruption domains in CIFAR10C, listed in sequential order. (a) The low-rank-adapter-based model mitigates inter-domain divergence more effectively than the source model across all 14 domain shifts. (b) The high-rank-adapter-based model significantly enhances intra-class feature aggregation, yielding results that closely approximate those achieved by our ViDA method.
  • Figure 4: The qualitative analysis of the CAM. We adopt CAM to compare the attention of the low-rank branch, high-rank branch, and the original model during the continual adaptation process.
  • Figure 5: (a) Additional t-SNE results for the low-rank adapter and high-rank adapter on the ACDC dataset. The first to third columns illustrate the feature distributions of transformer blocks 1, 2, and 4, respectively. (b) The 10-round CTTA experiment on ImageNet-to-ImageNet-C, in which the sequence of 15 corruptions is repeated for 10 rounds.
  • ...and 2 more figures
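
The teacher-student update and uncertainty-guided fusion described in the Figure 2 caption can be sketched roughly as below. The excerpt does not give the loss, the uncertainty estimate, or the HKA rule, so everything here is an assumption made for illustration: the consistency loss is taken to be a cross-entropy between teacher soft pseudo-labels and student predictions, the uncertainty is taken to be the variance over several stochastic teacher passes, and `fusion_weights` with its `threshold` is a hypothetical rule that gives the high-rank (domain-specific) branch more weight when uncertainty is high.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def consistency_loss(teacher_probs, student_logits):
    """Assumed form: cross-entropy of student predictions against teacher pseudo-labels."""
    log_p = np.log(softmax(student_logits) + 1e-12)
    return float(-(teacher_probs * log_p).sum(axis=-1).mean())


def uncertainty(prob_samples):
    """Assumed form: variance across stochastic teacher passes, averaged over classes.

    prob_samples has shape (n_passes, n_classes) for a single input sample.
    """
    return float(prob_samples.var(axis=0).mean())


def fusion_weights(u, threshold):
    """Hypothetical allotment rule returning (lam_low, lam_high).

    High uncertainty (large shift) favours the high-rank branch;
    low uncertainty favours the low-rank branch.
    """
    if u >= threshold:
        return 1.0 - u, 1.0 + u
    return 1.0 + u, 1.0 - u


# Toy usage: three stochastic teacher passes over one two-class sample.
passes = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])
u = uncertainty(passes)
lam_low, lam_high = fusion_weights(u, threshold=0.05)
loss = consistency_loss(passes.mean(axis=0, keepdims=True), np.array([[2.0, -1.0]]))
print(lam_low < lam_high, loss > 0.0)
```

In a full pipeline the returned weights would scale the low-rank and high-rank adapter outputs for each sample before the student is updated with the consistency loss; how the teacher itself is updated is not specified in this excerpt and is left out of the sketch.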