Persistent Test-time Adaptation in Recurring Testing Scenarios

Trung-Hieu Hoang; Duc Minh Vo; Minh N. Do

Abstract

Current test-time adaptation (TTA) approaches aim to adapt a machine learning model to environments that change continuously. Yet it is unclear whether TTA methods can maintain their adaptability over prolonged periods. To answer this question, we introduce a diagnostic setting, recurring TTA, in which environments not only change but also recur over time, creating an extensive data stream. This setting allows us to examine the error accumulation of TTA models in the most basic scenario, when they are regularly exposed to previous testing environments. Furthermore, we simulate a TTA process on a simple yet representative $\epsilon$-perturbed Gaussian Mixture Model Classifier, deriving theoretical insights into the dataset- and algorithm-dependent factors contributing to gradual performance degradation. Our investigation leads us to propose persistent TTA (PeTTA), which senses when the model is diverging towards collapse and adjusts the adaptation strategy, striking a balance between the dual objectives of adaptation and model collapse prevention. The superior stability of PeTTA over existing approaches in lifelong TTA scenarios is demonstrated through comprehensive experiments on various benchmarks. Our project page is available at https://hthieu166.github.io/petta.

Paper Structure (42 sections, 8 theorems, 36 equations, 11 figures, 22 tables, 1 algorithm)

Introduction
Background
Recurring TTA and Theoretical Analysis
Recurring TTA and Model Collapse
Simulation of Failure and Theoretical Analysis
Connection to Existing Solutions
Persistent Test-time Adaptation (PeTTA)
Experimental Results
GMMC Simulation Result
Setup - Benchmark Datasets
Result - Benchmark Datasets
Ablation Study
Discussions and Conclusion
Related Work
Proof of Lemmas and Theorems
...and 27 more sections

Key Result

Lemma 1

Under Assumption \ref{as:static_stream}, a binary $\epsilon$-GMMC collapses (Definition 1) with $\underset{t \to \tau}{\lim} \hat{p}_{1,t} = 0$ (or, equivalently, $\underset{t \to \tau}{\lim} \hat{p}_{0,t} = 1$) if and only if $\underset{t \to \tau}{\lim} \epsilon_{t} = p_1$.
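The intuition behind this lemma can be illustrated with a toy simulation. The sketch below is not the paper's $\epsilon$-GMMC: it tracks only the class-1 prior estimate, corrupts the true labels directly instead of running a likelihood-based pseudo-label predictor, and uses a plain exponential moving average in place of the mean-teacher update. The function name `simulate_prior_estimate`, its parameters, and the reading of $\epsilon$ as the joint probability of a class-1 sample being pseudo-labeled 0 (so that $0 \le \epsilon \le p_1$) are all assumptions made for illustration.

```python
import random


def simulate_prior_estimate(eps, p1=0.5, alpha=0.05, steps=500, batch=64, seed=0):
    """Track an EMA estimate of the class-1 prior under a fixed false-negative rate.

    eps is treated here (an assumption) as the joint probability
    Pr(Y = 1, pseudo-label = 0), so it must satisfy 0 <= eps <= p1.
    """
    rng = random.Random(seed)
    flip = eps / p1   # conditional chance that a true class-1 sample is labeled 0
    p1_hat = p1       # start from the correct prior
    for _ in range(steps):
        # Draw true labels, then corrupt class-1 labels to imitate a perturbed predictor.
        labels = [1 if rng.random() < p1 else 0 for _ in range(batch)]
        pseudo = [0 if (y == 1 and rng.random() < flip) else y for y in labels]
        batch_p1 = sum(pseudo) / batch
        # Exponential moving average, standing in for a mean-teacher style update.
        p1_hat = (1 - alpha) * p1_hat + alpha * batch_p1
    return p1_hat


collapsed = simulate_prior_estimate(eps=0.5)  # eps equals p1: every class-1 sample is mislabeled
healthy = simulate_prior_estimate(eps=0.0)    # no perturbation
print(f"eps = p1: p1_hat = {collapsed:.2e}")
print(f"eps = 0 : p1_hat = {healthy:.3f}")
```

In this simplified setting, holding `eps` at `p1` makes every pseudo-label 0, so the estimate decays geometrically as `p1 * (1 - alpha) ** t`, whereas `eps = 0` keeps it fluctuating around `p1`. The lemma itself concerns the limit of a time-varying $\epsilon_t$; the constant value here is only the simplest case.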

Figures (11)

Figure 1: Recurring Test-time Adaptation (TTA). (left) Testing environments may recur, and preserving adaptability when revisiting the same testing condition is not guaranteed. (right) The testing error of RoTTA \cite{yuan2023robust} progressively rises (performance degradation) and exceeds the error of the source model (no TTA), while our PeTTA remains stable when adapting to the test set of CIFAR-10-C \cite{hendrycks2019robustness} 20 times. The bold lines denote the running mean, and the shaded lines in the background represent the testing error on each domain (excluding the source model, for clarity).
Figure 2: $\epsilon$-perturbed binary Gaussian Mixture Model Classifier, imitating a continual TTA algorithm for theoretical analysis. Its two main components are a pseudo-label predictor (Eq. \ref{eq:pseudo_label}) and a mean-teacher update (Eqs. \ref{eq:general_opti_step}, \ref{eq:general_teacher_update}). The predictor is perturbed to maintain a false-negative rate of $\epsilon_t$, simulating an undesirable TTA testing stream.
Figure 3: Simulation results on the $\epsilon$-perturbed Gaussian Mixture Model Classifier ($\epsilon$-GMMC) and the perturbation-free GMMC. (a) Histogram of model predictions over time. A similar prediction frequency pattern is observed on CIFAR-10-C (Fig. \ref{fig:cifar10-c-result}a-left). (b) The probability density function of the two clusters after convergence versus the true data distribution. The two initial clusters of $\epsilon$-GMMC collapse into a single cluster with the parameters stated in Lemma \ref{lmm:collapsed}. In the perturbation-free case, GMMC converges to the true data distribution. (c) The distance toward $\mu_1$ ($\left|\mathbb{E}_{P_t}\left[\hat{\mu}_{0,t}\right] - \mu_1 \right|$) and the false-negative rate ($\epsilon_t$) in simulation coincide with the result in Thm. \ref{thm:cvg} (with $\epsilon_t$ following Corollary \ref{corollary:condition}).
Figure 4: Classification error of TRIBE \cite{Su_Xu_Jia_2024} and PeTTA (ours) on the CIFAR-10$\rightarrow$CIFAR-10-C task in recurring TTA with 40 visits.
Figure 5: Recurring TTA (20 visits) on the CIFAR-10$\rightarrow$CIFAR-10-C task. (a) Histogram of model predictions (the 10 labels are color-coded). PeTTA maintains its performance while RoTTA \cite{yuan2023robust} degrades. (b) Confusion matrix at the last visit: RoTTA classifies all samples into a few categories (e.g., 0: airplane, 4: deer). (c) Force-directed graphs showing (left) the category pairs most prone to misclassification (arrows indicate the portion and point from the true to the misclassified category); (right) similar categories tend to collapse easily. Edges denote the average cosine similarity of feature vectors (source model); only the most similar pairs are shown. Best viewed in color.
...and 6 more figures

Theorems & Definitions (16)

Definition 1: Model Collapse
Lemma 1: Increasing FNR
Lemma 2: $\epsilon$-GMMC After Collapsing
Theorem 1: Convergence of $\epsilon$-GMMC
Corollary 1: A Condition for $\epsilon$-GMMC Collapse
Definition 1: Model Collapse
Lemma 1: Increasing FNR
proof
Lemma 2: $\epsilon$-GMMC After Collapsing
proof
...and 6 more
