ReC-TTT: Contrastive Feature Reconstruction for Test-Time Training

Marco Colussi, Sergio Mascetti, Jose Dolz, Christian Desrosiers

TL;DR

ReC-TTT is a test-time training technique that adapts a DL model to new unseen domains by generating discriminative views of the input data. It does so through cross-reconstruction between a frozen encoder and two trainable encoders that share a single decoder.

Abstract

The remarkable progress in deep learning (DL) has produced outstanding results in various computer vision tasks. However, adapting to real-time variations in data distributions remains an important challenge. Test-Time Training (TTT) was proposed as an effective solution to this issue: it increases the generalization ability of trained models by adding an auxiliary task at train time and then using its loss at test time to adapt the model. Inspired by the recent achievements of contrastive representation learning in unsupervised tasks, we propose ReC-TTT, a test-time training technique that can adapt a DL model to new unseen domains by generating discriminative views of the input data. ReC-TTT uses cross-reconstruction as an auxiliary task between a frozen encoder and two trainable encoders, taking advantage of a single shared decoder. At test time, this makes it possible to adapt the encoders so that they extract features that will be correctly reconstructed by the decoder, which in this phase is frozen on the source domain. Experimental results show that ReC-TTT achieves better results than other state-of-the-art techniques in most domain shift classification challenges.

Paper Structure

This paper contains 22 sections, 4 equations, 5 figures, 8 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of our ReC-TTT framework. The directional flow of gradients is denoted by the symbol $\to$. $\mathcal{L}_{\mathit{aux}}$ is our cross reconstruction loss, which computes the global similarity between the features of the encoders and the features reconstructed by the decoder, $\mathcal{L}_{\mathit{CE}}$ is the cross-entropy between the predicted classes and the true labels, and $\mathcal{L}_{\mathit{KL}}$ is the Kullback–Leibler divergence between the two predicted distributions. The trainable components of our architecture are depicted in green, whereas the frozen components are represented in blue. (a) illustrates the training phase, where both the encoders and the decoder are trainable. At test-time training (b), the decoder is frozen. Finally, (c) shows the inference time when the entire network is frozen; modules represented in gray are not needed in this phase.
  • Figure 2: Quantitative results, compared to the state-of-the-art, on the CIFAR, TinyImageNet-C, and VisDA datasets (%). A detailed report for CIFAR-100, TinyImageNet, and VisDA is provided in the supplementary material.
  • Figure 3: t-SNE plot of features after different numbers of adaptation iterations (0, 10, 20, 50) for the Brightness (top row) and Contrast (bottom row) corruptions of CIFAR-10C. The adaptation at test time helps separate the features of examples by class (represented by color).
  • Figure 4: How many iterations are needed for adaptation? Performance (AUROC) obtained by our method with different numbers of adaptation iterations on CIFAR-10C. For most corruption types, our method provides a significant boost within a few iterations and remains stable when the number of iterations is increased.
  • Figure 5: Performance (AUROC) reached by our method with different numbers of adaptation iterations on CIFAR-100C.
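
The three loss terms named in the Figure 1 caption can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes that the "global similarity" in $\mathcal{L}_{\mathit{aux}}$ is measured as a cosine distance between flattened feature vectors, averaged over layers, and it picks one direction for the KL term arbitrarily. The function names (`aux_loss`, `cosine_distance`, and so on) and the toy inputs are hypothetical, and how the terms are weighted and combined is not stated in this summary, so the sketch leaves that out.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """L_CE: negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """L_KL: KL(p || q) between two predicted class distributions.

    The direction is an assumption made for this sketch.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def cosine_distance(u, v, eps=1e-12):
    """1 - cosine similarity between two flattened feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def aux_loss(encoder_feats, decoder_feats):
    """Hypothetical L_aux: mean cosine distance over layers between
    encoder features and the features reconstructed by the decoder."""
    pairs = list(zip(encoder_feats, decoder_feats))
    return sum(cosine_distance(e, d) for e, d in pairs) / len(pairs)


# Toy usage: class scores from two prediction heads, one true label.
logits_a = [2.0, 0.5, -1.0]
logits_b = [1.5, 0.7, -0.8]
p, q = softmax(logits_a), softmax(logits_b)

print("L_CE :", cross_entropy(logits_a, 0))
print("L_KL :", kl_divergence(p, q))
# Two layers of toy features; the second reconstruction is imperfect.
print("L_aux:", aux_loss([[1.0, 0.0], [0.5, 0.5]],
                         [[1.0, 0.0], [0.5, 0.1]]))
```

Of these terms, only the cross-entropy needs ground-truth labels. This is consistent with the abstract's description of test-time training as reusing the auxiliary task's loss to adapt the model when labels are unavailable.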