AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation

Damian Sójka, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński

TL;DR

AR-TTA enhances the well-established self-training framework with a small memory buffer that increases model stability, and adapts dynamically according to the intensity of domain shift. It outperforms existing approaches on both synthetic and more real-world benchmarks and is robust across a variety of TTA scenarios.

Abstract

Test-time adaptation is a promising research direction that allows the source model to adapt itself to changes in data distribution without any supervision. Yet current methods are usually evaluated on benchmarks that only simplify real-world scenarios. Hence, we propose to validate test-time adaptation methods on the recently introduced autonomous-driving datasets CLAD-C and SHIFT. We observe that current test-time adaptation methods struggle to handle varying degrees of domain shift, often degrading to performance below that of the source model. We find that the root of the problem lies in the inability to preserve the knowledge of the source model while adapting to dynamically changing, temporally correlated data streams. Therefore, we enhance the well-established self-training framework with a small memory buffer that increases model stability, and at the same time perform dynamic adaptation based on the intensity of domain shift. The proposed method, named AR-TTA, outperforms existing approaches on both synthetic and more real-world benchmarks and shows robustness across a variety of TTA scenarios. The code is available at https://github.com/dmn-sjk/AR-TTA.

Paper Structure (30 sections, 8 equations, 11 figures, 11 tables)

Figures (11)

  • Figure 1: Continual test-time adaptation methods evaluated on artificial (CIFAR10C, ImageNet-C [hendrycks2019robustness]) and natural (CIFAR10.1 [cifar10_1], SHIFT [shift2022], CLAD-C [verwimp2023clad]) domain shifts. Our method is the only one that consistently improves over the naive strategy of using the (frozen) source model.
  • Figure 2: Our method, AR-TTA, combines a replay strategy with the mean teacher framework. Each image is paired with an exemplar sampled from memory, and the image pairs are mixed up. Similarly, pseudo-labels from the teacher model are mixed up with the labels of the sampled exemplars. The student model is updated using the cross-entropy loss between its predictions on the augmented samples and the augmented pseudo-labels. The teacher model is updated as an exponential moving average of the student's weights. Predictions for each image are taken from the teacher model.
  • Figure 3: Batch-wise classification accuracy (%), averaged over a window of 400 batches, on the CLAD-C benchmark for the chosen methods continually adapted to the sequence of data. Major ticks on the x-axis mark the beginning of a different domain; minor ticks indicate the image number. Best viewed in color.
  • Figure A.1: The influence of replay memory size on the resulting accuracy on CIFAR10C and CLAD-C benchmarks.
  • Figure A.2: The relationship between mean classification accuracy (%) and the value of parameter $\gamma$ for CIFAR10C and CLAD-C benchmarks.
  • ...and 6 more figures
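
The training step described in the Figure 2 caption can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which is available in the repository linked in the abstract): the function names, the toy two-dimensional "images", the Beta-distributed mixing coefficient, and the EMA momentum value are all hypothetical choices made here, and the dynamic adaptation to domain-shift intensity mentioned in the abstract is not modeled.

```python
import math
import random


def mixup(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def soft_cross_entropy(probs, soft_target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, soft_target))


def ema_update(teacher_w, student_w, alpha):
    """Move teacher weights toward the student by an exponential moving average."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher_w, student_w)]


def build_training_pair(image, teacher_probs, memory, num_classes, rng, beta_a=0.4):
    """Pair a test image with a replayed exemplar and mix inputs and targets.

    Assumption: the mixing coefficient is drawn from Beta(beta_a, beta_a),
    as in standard mixup; the caption does not specify how it is chosen.
    """
    mem_image, mem_label = rng.choice(memory)
    lam = rng.betavariate(beta_a, beta_a)
    mixed_image = mixup(image, mem_image, lam)
    # The teacher's pseudo-label is mixed with the exemplar's true label.
    mixed_target = mixup(teacher_probs, one_hot(mem_label, num_classes), lam)
    return mixed_image, mixed_target, lam


# Toy usage with hypothetical values.
rng = random.Random(0)
memory = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]   # (exemplar image, label) pairs
image = [0.5, 0.2]                            # incoming test image
teacher_probs = [0.7, 0.3]                    # teacher pseudo-label for the image

mixed_image, mixed_target, lam = build_training_pair(
    image, teacher_probs, memory, num_classes=2, rng=rng
)
student_probs = [0.6, 0.4]                    # stand-in for the student's prediction
loss = soft_cross_entropy(student_probs, mixed_target)

# After a (not shown) gradient step on the student, the teacher follows by EMA.
teacher_w = ema_update([1.0, 2.0], [0.0, 0.0], alpha=0.999)
```

In a real model the student prediction would come from a forward pass on `mixed_image`, and the loss would be backpropagated through the student only; the teacher is never updated by gradients, which is what makes its predictions comparatively stable.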