Variational Continual Test-Time Adaptation

Fan Lyu; Kaile Du; Yuyang Li; Hanyu Zhao; Fuyuan Hu; Zhang Zhang; Guangcan Liu; Liang Wang

TL;DR

VCoTTA tackles continual test-time adaptation by embedding uncertainty into a pretrained model via a variational warm-up that turns it into a Bayesian neural network, and by using a mean-teacher framework at test time to supervise online adaptation. The method introduces an adaptive prior mixture that combines the source prior and a teacher prior, with the ELBO expressed as the cross-entropy between student and teacher plus the KL divergence to the mixed prior. This uncertainty-aware approach mitigates error accumulation under persistent domain shifts and yields improved calibration and robustness across CTTA benchmarks. The work demonstrates that dynamically weighting priors based on uncertainty can outperform existing CTTA strategies, especially in long-horizon, unlabeled settings.

Abstract

The Continual Test-Time Adaptation (CTTA) task investigates effective domain adaptation under continuous domain shifts at test time. Because only unlabeled samples are available, model updates carry significant uncertainty, which causes severe error accumulation in CTTA. In this paper, we introduce VCoTTA, a variational Bayesian approach to measuring uncertainty in CTTA. At the source stage, we transform a pretrained deterministic model into a Bayesian Neural Network (BNN) via a variational warm-up strategy, injecting uncertainty into the model. At test time, we employ a mean-teacher update strategy, using variational inference for the student model and an exponential moving average for the teacher model. Our approach updates the student model by combining priors from both the source and teacher models. The evidence lower bound is formulated as the cross-entropy between the student and teacher models, along with the Kullback-Leibler (KL) divergence to the prior mixture. Experimental results on three datasets demonstrate the method's effectiveness in mitigating error accumulation within the CTTA framework.
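The two update rules described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scalar Gaussian posteriors and priors, a fixed mixture weight `alpha`, and it replaces the KL divergence to the prior mixture with a weighted sum of per-component KL terms. All function names and the `decay` value are hypothetical.

```python
import math

def ema_update(teacher, student, decay=0.99):
    """Teacher update: exponential moving average of student parameters."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

def kl_gauss(mu_q, sd_q, mu_p, sd_p):
    """KL(N(mu_q, sd_q^2) || N(mu_p, sd_p^2)) for scalar Gaussians."""
    return math.log(sd_p / sd_q) + (sd_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sd_p ** 2) - 0.5

def cross_entropy(teacher_probs, student_probs):
    """Cross-entropy of student predictions against teacher pseudo-labels."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

def student_loss(teacher_probs, student_probs, q, p_src, p_tea, alpha):
    """Cross-entropy term plus a weighted KL surrogate for the prior mixture.

    q, p_src, p_tea are (mean, std) pairs; alpha weights the source prior.
    Assumption: the mixture KL is approximated by alpha * KL(q || p_src)
    + (1 - alpha) * KL(q || p_tea).
    """
    kl_mix = alpha * kl_gauss(*q, *p_src) + (1.0 - alpha) * kl_gauss(*q, *p_tea)
    return cross_entropy(teacher_probs, student_probs) + kl_mix

# Example: one student step's loss, then one teacher EMA step.
loss = student_loss([0.5, 0.5], [0.5, 0.5], (0.0, 1.0), (0.0, 1.0), (0.0, 1.0), 0.5)
new_teacher = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9)
```

When the student posterior matches both priors, the KL surrogate vanishes and only the cross-entropy term remains, which is a convenient sanity check for the sketch.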

Paper Structure (44 sections, 1 theorem, 29 equations, 10 figures, 10 tables, 2 algorithms)

This paper contains 44 sections, 1 theorem, 29 equations, 10 figures, 10 tables, 2 algorithms.

Introduction
Related Work
Continual Test-Time Adaptation
Bayesian Neural Network
Variational Inference in CTTA
BI in traditional CL and in CTTA
VI in CTTA
Adaptation and Inference in VCoTTA
Entropy term: Integrating Mean Teacher into VI
KL term: Mixture-of-Gaussian Prior
Adaptation and Inference
Variational Warm-up
Student update via VI
Teacher update via EMA
Model inference
...and 29 more sections

Key Result

Lemma 1

The KL divergence between mixture distributions $p = \sum_{i=1}^{k}\alpha_i p_i$ and $p' = \sum_{i=1}^{k}\alpha'_i p'_i$ is upper-bounded by $$\mathrm{KL}(p \,\|\, p') \le \mathrm{KL}(\boldsymbol{\alpha} \,\|\, \boldsymbol{\alpha}') + \sum_{i=1}^{k}\alpha_i\,\mathrm{KL}(p_i \,\|\, p'_i),$$ where $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \cdots, \alpha_k)$ and $\boldsymbol{\alpha}' = (\alpha'_1, \alpha'_2, \cdots, \alpha'_k)$ are the weights of the mixture components. The equality holds if and only if ${\alpha_i p_i}/{\sum_{j=1}^{k}\alpha_j p_
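A quick numerical check can make the lemma concrete. The sketch below assumes the bound takes the standard form for mixtures, $\mathrm{KL}(p \,\|\, p') \le \mathrm{KL}(\boldsymbol{\alpha} \,\|\, \boldsymbol{\alpha}') + \sum_i \alpha_i \mathrm{KL}(p_i \,\|\, p'_i)$, and uses small discrete distributions in place of the Gaussian components used in the paper. The weights and component values are arbitrary illustrative numbers.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions given as lists."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def mix(weights, comps):
    """Mixture distribution sum_i weights[i] * comps[i]."""
    n = len(comps[0])
    return [sum(w * c[i] for w, c in zip(weights, comps)) for i in range(n)]

# Illustrative mixture weights and components (not taken from the paper).
alpha = [0.3, 0.7]
alpha_prime = [0.6, 0.4]
comps = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]]
comps_prime = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]

# Left-hand side: KL between the two mixtures.
lhs = kl(mix(alpha, comps), mix(alpha_prime, comps_prime))

# Right-hand side: KL between the weights plus weighted component-wise KLs.
rhs = kl(alpha, alpha_prime) + sum(
    a * kl(p, q) for a, p, q in zip(alpha, comps, comps_prime)
)

print(f"KL(p || p') = {lhs:.4f}, upper bound = {rhs:.4f}")
```

When both mixtures share the same weights, the first term on the right vanishes and the bound reduces to a weighted sum of component-wise KL divergences, which is tractable even when the mixture KL itself has no closed form.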

Figures (10)

Figure 1: In the CTTA task, a BNN model is first trained on a source dataset and is then adapted to unlabeled test data, where it is updated with unreliable priors and may suffer from noisy updates. In this paper, we use a Bayesian approach to measure the uncertainty and reduce the effect of unreliable priors, achieving better adaptation.
Figure 2: BI in continual learning versus CTTA. We find the traditional prior transmission is infeasible in CTTA because of the unreliable prior from unlabeled data. In our method, we place CTTA in a mean-teacher structure, and design BI in CTTA using a mixture of teacher prior and source prior. The next teacher prior is updated by the exponential moving average.
Figure 3: Sum of error rate (%) on ImageNet-to-ImageNetC.
Figure 4: Comparison of variational warm-up and direct BNN pretraining.
Figure 5: Comparison of different warm-up settings on CIFAR10C.
...and 5 more figures

Theorems & Definitions (2)

Lemma 1
Proof 1
