NC-TTT: A Noise Contrastive Approach for Test-Time Training

David Osowiechi; Gustavo A. Vargas Hakim; Mehrdad Noori; Milad Cheraghalikhani; Ali Bahri; Moslem Yazdanpanah; Ismail Ben Ayed; Christian Desrosiers

TL;DR

NC-TTT addresses test-time robustness under domain shift with a noise-contrastive auxiliary task that estimates the source feature distribution and guides unsupervised adaptation at test time. The method pairs a standard classification head with a density-estimation-inspired auxiliary branch, in which a linear projector and a discriminator learn to distinguish noisy in-distribution views of projected features from noisier out-of-distribution views. Empirical results on CIFAR-10-C, CIFAR-100-C, and VisDA-C show substantial improvements over state-of-the-art TTT/TTA methods, with average gains of 30.61% (CIFAR-10-C) and 26.31% (CIFAR-100-C) and a 16.19-point boost on VisDA-C, often with adaptation focused on the first encoder layers. The approach is lightweight and broadly applicable: it offers a principled, unsupervised mechanism for aligning test-time representations with the source distribution, which improves practical robustness to domain shift.

Abstract

Despite their exceptional performance in vision tasks, deep learning models often struggle when faced with domain shifts during testing. Test-Time Training (TTT) methods have recently gained popularity for their ability to enhance the robustness of models by adding an auxiliary objective that is jointly optimized with the main task. Being strictly unsupervised, this auxiliary objective is used at test time to adapt the model without any access to labels. In this work, we propose Noise-Contrastive Test-Time Training (NC-TTT), a novel unsupervised TTT technique based on the discrimination of noisy feature maps. By learning to classify noisy views of projected feature maps, and then adapting the model accordingly on new domains, classification performance can be recovered by a substantial margin. Experiments on several popular test-time adaptation baselines demonstrate the advantages of our method compared to recent approaches for this task. The code can be found at: https://github.com/GustavoVargasHakim/NCTTT.git
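As a rough illustration of the auxiliary task described in the abstract, the sketch below builds two noisy views of a projected feature vector and scores them with a binary cross-entropy loss. It is not taken from the authors' repository: the names `make_noisy_views`, `toy_discriminator`, and `nc_loss`, the hand-set distance-based discriminator (which peeks at the clean feature, unlike a learned classifier), and the values `sigma_s = 0.05` and `sigma_o = 1.0` are all assumptions chosen for illustration.

```python
import math
import random


def make_noisy_views(z, sigma_s, sigma_o, rng):
    """Perturb feature z with small (in-distribution) and large (out-of-distribution) Gaussian noise."""
    in_view = [v + rng.gauss(0.0, sigma_s) for v in z]
    out_view = [v + rng.gauss(0.0, sigma_o) for v in z]
    return in_view, out_view


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def toy_discriminator(view, clean, scale=5.0, bias=2.0):
    """Hypothetical stand-in for the learned classifier: the score decays
    with squared distance from the clean feature (scale and bias are made up)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(view, clean))
    return sigmoid(bias - scale * sq_dist)


def nc_loss(p_in, p_out, eps=1e-12):
    """Binary cross-entropy with label 1 for the in view and label 0 for the out view."""
    return -0.5 * (math.log(p_in + eps) + math.log(1.0 - p_out + eps))


rng = random.Random(0)
z = [0.2, -0.1, 0.4, 0.0]  # made-up projected feature, for illustration only
in_view, out_view = make_noisy_views(z, sigma_s=0.05, sigma_o=1.0, rng=rng)
loss = nc_loss(toy_discriminator(in_view, z), toy_discriminator(out_view, z))
```

In a real training loop the discriminator would be a trainable module and this loss would be minimized jointly with the classification loss; at test time only the unsupervised part would be available.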

Paper Structure (14 sections, 16 equations, 8 figures, 7 tables)

Introduction
Related work
Methodology
The proposed method
Noise-contrastive Test-time Training
Selecting the distribution variances
Experimental Settings
Results
Image classification on common corruptions
Image classification on sim-to-real domain shift
Conclusions
Deriving the posterior of Equation (6)
Results on different levels of CIFAR-10-C corruptions
Hyperparameter search on VisDA-C

Figures (8)

Figure 1: Overview of our Noise-Contrastive Test-Time Training (NC-TTT) method. The auxiliary module comprises a linear projector $p_{\varphi}$ that reduces the scale of features, and a classifier $q_{\varphi}$ to discriminate between two different noisy views of the reduced features.
Figure 2: Posterior probability $p(y_s = 1|\mathbf{z})$ of 2D points with different pairs $(\sigma_s, \sigma_o)$. The in-domain region of influence expands as $\sigma_o$ increases for a fixed $\sigma_s$ (see difference row-wise). Furthermore, this region becomes more regular as $\sigma_s$ increases for a fixed $\sigma_o$ (see difference column-wise).
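The behavior described in this caption can be reproduced with a small sketch. It assumes, and this is an assumption rather than the paper's stated Equation (6), that both classes are equal-weight mixtures of isotropic Gaussians placed on the same centers, with standard deviation $\sigma_s$ for the in-domain class and $\sigma_o$ for the out-of-domain class, and that the two classes have equal priors. The function names are hypothetical.

```python
import math


def log_gauss_mixture(z, centers, sigma):
    """Log-density of an equal-weight mixture of isotropic Gaussians N(c, sigma^2 I)."""
    d = len(z)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * sigma ** 2)
    exps = [
        -sum((a - b) ** 2 for a, b in zip(z, c)) / (2.0 * sigma ** 2)
        for c in centers
    ]
    m = max(exps)  # log-sum-exp trick to avoid underflow
    return log_norm + m + math.log(sum(math.exp(e - m) for e in exps)) - math.log(len(centers))


def posterior_in_domain(z, centers, sigma_s, sigma_o):
    """p(y_s = 1 | z) under equal class priors (an assumption of this sketch)."""
    log_odds = log_gauss_mixture(z, centers, sigma_s) - log_gauss_mixture(z, centers, sigma_o)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


centers = [[0.0, 0.0]]  # a single made-up 2D point
query = [0.18, 0.0]
p_narrow = posterior_in_domain(query, centers, sigma_s=0.05, sigma_o=1.0)
p_wide = posterior_in_domain(query, centers, sigma_s=0.05, sigma_o=2.0)
```

With these made-up values, the query point falls outside the 0.5 decision boundary for $\sigma_o = 1$ and inside it for $\sigma_o = 2$, which matches the row-wise trend the caption describes.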
Figure 3: Noise 2D vectors sampled with $\sigma_{s} = 0.05$ and $\sigma_{o} = 1$ (left). The overlap between the two distributions can be overcome by assigning a probability to each point based on our threshold method.
Figure 4: Heatmap of in-distribution probabilities, i.e., $p(y_s\!=\!1 \, | \, \mathbf{z})$ approximated by $q_{\varphi}(\mathbf{z})$ in our model, and spatial gradient of the log-likelihood function, i.e., $\nabla \log q_{\varphi}(\mathbf{z})$, which is used as the test-time adaptation objective. The arrow shows how an OOD test sample (white point) is adapted toward the source distribution.
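To make the arrow in this figure concrete, the toy sketch below performs gradient ascent on the log-posterior for a single source center. It assumes two isotropic Gaussians on the same center with equal priors, for which $\nabla \log p(y_s{=}1|\mathbf{z}) = -(1 - p)\,(1/\sigma_s^2 - 1/\sigma_o^2)\,(\mathbf{z} - \mathbf{c})$. The sketch moves the feature point itself, whereas the method described here adapts model parameters; the function names, step size, and starting point are all illustrative assumptions.

```python
import math


def posterior(z, center, sigma_s, sigma_o):
    """p(y_s = 1 | z) for two isotropic Gaussians on one center, equal priors (assumed)."""
    d = len(z)
    sq = sum((a - b) ** 2 for a, b in zip(z, center))
    log_odds = d * math.log(sigma_o / sigma_s) - 0.5 * sq * (1.0 / sigma_s ** 2 - 1.0 / sigma_o ** 2)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def grad_log_posterior(z, center, sigma_s, sigma_o):
    """Analytic gradient of log p(y_s = 1 | z) with respect to z."""
    p = posterior(z, center, sigma_s, sigma_o)
    k = (1.0 - p) * (1.0 / sigma_s ** 2 - 1.0 / sigma_o ** 2)
    return [-k * (a - b) for a, b in zip(z, center)]


def adapt(z, center, sigma_s, sigma_o, lr=0.001, steps=10):
    """Toy adaptation: gradient ascent on the log-posterior, moving z directly."""
    z = list(z)
    for _ in range(steps):
        g = grad_log_posterior(z, center, sigma_s, sigma_o)
        z = [a + lr * gi for a, gi in zip(z, g)]
    return z


center = [0.0, 0.0]
z_ood = [1.0, 0.8]  # made-up out-of-distribution point
z_adapted = adapt(z_ood, center, sigma_s=0.05, sigma_o=1.0)
```

Because the gradient always points toward the center and is scaled by $(1 - p)$, updates are large far from the source region and shrink as the point enters it.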
Figure 5: Expected in-distribution label as a function of noise ratio $\beta = \sigma_{o}/\sigma_{s}$.
...and 3 more figures
