Two Simple Principles for Diffusion-Based Test-Time Adaptation

Kaiyu Song; Hanjiang Lai; Yan Pan; Kun Yue; Jian Yin

TL;DR

This paper proposes a semantic keeper, a method that preserves feature similarity and filters out corruption introduced by the test domain so that semantics are better preserved, and introduces a gradient-based view to unify the directions generated by the two principles.

Abstract

Recently, diffusion-based test-time adaptation (TTA) methods, which leverage a diffusion model to map images from an unknown test domain to the training domain, have shown great advances. Because test domains are unseen and diverse, diffusion-based TTA is an ill-posed problem. In this paper, we identify two simple principles underlying the design tricks of diffusion-based methods. \textit{Principle 1} is semantic similarity preservation: the semantic similarity between the original and generated test images should be preserved. \textit{Principle 2} is minimal modification: the diffusion model should map test images to the training domain while modifying them as little as possible. Following the two principles, we propose a simple yet effective principle-guided diffusion-based test-time adaptation method (PDDA). Concretely, following Principle 1, we propose a semantic keeper, a method that preserves feature similarity and filters out corruption introduced by the test domain, thus better preserving the semantics. Following Principle 2, we propose a modification keeper, which introduces a regularization constraint into the generative process to minimize modifications to the test image. However, there is a hidden conflict between the two principles, so we further introduce a gradient-based view to unify the directions generated by the two principles. Extensive experiments on CIFAR-10C, CIFAR-100C, ImageNet-W, and ImageNet-C with WideResNet-28-10, ResNet-50, Swin-T, and ConvNext-T demonstrate that PDDA significantly outperforms more complex state-of-the-art baselines. Specifically, PDDA improves average accuracy on ImageNet-C by 2.4\% without any training process.
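To make the idea of unifying two conflicting guidance directions more concrete, the following is a minimal sketch and not the paper's implementation. It assumes the two principles each yield a gradient with respect to the current image estimate (here called `g_sem` and `g_mod`, hypothetical names), and it assumes a PCGrad-style projection: when the two gradients point against each other, each one has its component along the other removed before they are summed. The exact projection rule and similarity measure used by PDDA are not given in the abstract.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two gradients of any (matching) shape."""
    return float(np.vdot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def project_if_conflicting(g, ref, eps=1e-12):
    """Return g unchanged if it agrees with ref (non-negative inner product).

    Otherwise remove the component of g that lies along ref, so the result
    is (numerically) orthogonal to ref. This is an assumed, PCGrad-style rule.
    """
    dot = float(np.vdot(g, ref))
    if dot >= 0.0:
        return g.copy()
    return g - dot / (float(np.vdot(ref, ref)) + eps) * ref


def unify_directions(g_sem, g_mod):
    """Combine the two guidance gradients after mutual projection."""
    return project_if_conflicting(g_sem, g_mod) + project_if_conflicting(g_mod, g_sem)


# Toy 2-D example with conflicting directions (hypothetical values).
g_sem = np.array([1.0, 0.0])   # stand-in for the semantic-similarity gradient
g_mod = np.array([-1.0, 1.0])  # stand-in for the minimal-modification gradient

unified = unify_directions(g_sem, g_mod)
print("cosine before projection:", cosine_similarity(g_sem, g_mod))
print("unified direction:", unified)
```

In this toy case the two inputs have a negative inner product, and the unified direction has a non-negative inner product with both of them, which is the property a projection of this kind is meant to provide. In a diffusion sampler the same functions would be applied to image-shaped gradients, since `np.vdot` flattens its inputs.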

Paper Structure (10 sections, 14 equations, 6 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Preliminary
Method
Experiment
Experiment Settings
Experimental Results
Ablation Study
Visualization
Conclusion

Figures (6)

Figure 1: An illustration of the different information carried by features from different UNet layers. For each layer, we visualize the result obtained when only that layer's features are used to calculate $f^1$. The details of the generated images differ slightly, which indicates that the layers carry different information. This is most obvious in the noise test domain, since deep layers contain extra noise along with more details.
Figure 2: An overview of the proposed PDDA, where "denoise" estimates $\hat{x}_{0|t}$. The conditional term is applied in the interval $t \in [s,0)$, and the interval $t\in [T,s)$ is the reverse process without the conditional term. $x_{t}\rightarrow x_{t-1}$ denotes one step of the reverse process. We first map the image from the test domain to the training domain to generate $x_{0}$ (a), and then use both $x_{0}$ and $x^{test}$ for classification (b).
Figure 3: An illustration of the ill-posed problem caused by different domains. Circles of different colors represent different domains, and the red circle represents the training domain. Diffusion-based methods need to narrow the distance between $x^{test}$ and $x^{src}$, which requires bringing random samples from different unknown domains close to the training domain.
Figure 4: An illustration of the conflict between the two guidance terms. "w/" means using gradient projection, and "w/o" means not using it. We visualize the gradient magnitude similarity in Eq. \ref{eq:conflict} for samples from different test domains. Without gradient projection, shown in (a), the gradient magnitude similarity tends to zero. With gradient projection, our method significantly increases the similarity.
Figure 5: Visualization of the ablation study on the sampling strategy under different corruptions. PDDA (w/) uses the proposed sampling strategy, PDDA (w/o) does not, and the ground truth is the image in the training domain, i.e., without corruption.
...and 1 more figure
