
Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization

Yongyi Su, Xun Xu, Kui Jia

TL;DR

The paper tackles real-world test-time adaptation (TTA) under non-i.i.d. test streams with global and local class imbalance and continual domain shift. It introduces Balanced Batch Normalization, which estimates normalization statistics without bias towards majority classes, and a Tri-Net Self-Training framework that couples a teacher, a student, and an anchored (source) network so that an anchored loss regularizes adaptation. The combined approach, TRIBE, achieves state-of-the-art results under multiple TTA evaluation protocols (including GLI-TTA-F and GLI-TTA-V) and remains robust across varying degrees of imbalance at reasonable computational cost. This work offers practical mechanisms for deploying robust TTA in realistic, evolving environments, with potential impact on safety-critical vision systems and long-tailed data regimes.
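
To make the tri-net idea concrete, the sketch below combines a self-training term (student fitted to the teacher's pseudo-label) with an anchored term (student kept close to the frozen source network). It is an illustration only, not the authors' implementation: the use of cross-entropy and mean squared error, the function names, and the `anchor_weight` parameter are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_training_loss(student_logits, teacher_logits):
    """Cross-entropy of the student against the teacher's hard pseudo-label (assumed form)."""
    teacher_probs = softmax(teacher_logits)
    pseudo_label = max(range(len(teacher_probs)), key=teacher_probs.__getitem__)
    return -math.log(softmax(student_logits)[pseudo_label])


def anchored_loss(student_logits, anchor_logits):
    """Mean squared error between student and frozen anchor predictions (assumed form)."""
    p_student = softmax(student_logits)
    p_anchor = softmax(anchor_logits)
    return sum((s - a) ** 2 for s, a in zip(p_student, p_anchor)) / len(p_student)


def tri_net_loss(student_logits, teacher_logits, anchor_logits, anchor_weight=0.5):
    """Self-training loss plus a weighted anchored regularizer.

    anchor_weight is a hypothetical trade-off parameter, not a value from the paper.
    """
    return (self_training_loss(student_logits, teacher_logits)
            + anchor_weight * anchored_loss(student_logits, anchor_logits))


# A student that agrees with the anchor pays no anchored penalty;
# one that has drifted away from the source predictions pays more.
agree = tri_net_loss([2.0, 0.0], [2.0, 0.0], [2.0, 0.0])
drift = tri_net_loss([2.0, 0.0], [2.0, 0.0], [0.0, 2.0])
print(agree < drift)
```

The anchored term only penalizes disagreement with the source model, so it acts as a brake on drift without requiring any labels at test time.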

Abstract

Test-Time Adaptation (TTA) aims to adapt a source-domain model to testing data at the inference stage, with success demonstrated in adapting to unseen corruptions. However, these attempts may fail under more challenging real-world scenarios. Existing works mainly consider real-world test-time adaptation under non-i.i.d. data streams and continual domain shift. In this work, we first complement the existing real-world TTA protocol with a globally class-imbalanced testing set. We demonstrate that combining all settings together poses new challenges to existing methods. We argue that the failure of state-of-the-art methods is first caused by indiscriminately adapting normalization layers to imbalanced testing data. To remedy this shortcoming, we propose a balanced batchnorm layer to swap out the regular batchnorm at the inference stage. The new batchnorm layer is capable of adapting without biasing towards majority classes. We are further inspired by the success of self-training (ST) in learning from unlabeled data and adapt ST for test-time adaptation. However, ST alone is prone to over-adaptation, which is responsible for the poor performance under continual domain shift. Hence, we propose to improve self-training under continual domain shift by regularizing model updates with an anchored loss. The final TTA model, termed TRIBE, is built upon a tri-net architecture with balanced batchnorm layers. We evaluate TRIBE on four datasets representing real-world TTA settings. TRIBE consistently achieves state-of-the-art performance across multiple evaluation protocols. The code is available at https://github.com/Gorilla-Lab-SCUT/TRIBE.
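
The bias that the abstract attributes to regular batchnorm can be seen on a toy example. The sketch below compares plain statistics with class-balanced statistics computed from pseudo-labels. It is a simplified illustration under stated assumptions, not the paper's layer: it uses a single scalar feature, gives every observed class equal weight, combines per-class variances with the law of total variance, and omits any running (moving-average) update. The function names are hypothetical.

```python
from collections import defaultdict


def plain_stats(features):
    """Ordinary mean and (biased) variance, as regular batchnorm would estimate them."""
    n = len(features)
    mean = sum(features) / n
    var = sum((x - mean) ** 2 for x in features) / n
    return mean, var


def balanced_stats(features, pseudo_labels):
    """Class-balanced mean and variance: every observed class counts equally.

    Assumed form: average the per-class means, then add the between-class
    spread to the average within-class variance (law of total variance).
    """
    by_class = defaultdict(list)
    for x, k in zip(features, pseudo_labels):
        by_class[k].append(x)
    class_stats = [plain_stats(xs) for xs in by_class.values()]
    num_classes = len(class_stats)
    mean = sum(m for m, _ in class_stats) / num_classes
    var = sum(v + (m - mean) ** 2 for m, v in class_stats) / num_classes
    return mean, var


# Nine samples of a majority class at 0.0 and one minority sample at 10.0.
features = [0.0] * 9 + [10.0]
pseudo_labels = [0] * 9 + [1]

print(plain_stats(features))                    # (1.0, 9.0): pulled towards the majority class
print(balanced_stats(features, pseudo_labels))  # (5.0, 25.0): midway between the two classes
```

With plain statistics the minority sample is normalized as an extreme outlier, while the balanced estimate places both classes symmetrically around the mean.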

Paper Structure (26 sections, 10 equations, 7 figures, 22 tables, 1 algorithm)

Figures (7)

  • Figure 1: Illustration of two challenging real-world TTA scenarios. Different colors indicate the proportions of semantic classes; the horizontal axis indicates that the testing data domain (e.g. different corruptions) may shift over time; and the imbalance factor ($I.F.$) controls the degree of global imbalance. We expect the testing data stream to exhibit both local and global class imbalance. The class distribution may be fixed over time, termed GLI-TTA-F, or may evolve over time, termed GLI-TTA-V.
  • Figure 2: An illustration of the proposed real-world TTA simulation protocol with a hierarchical probabilistic model. A non-uniform $\alpha$ results in a globally imbalanced testing data distribution.
  • Figure 3: Illustration of the proposed method. We replace the Batchnorm layer of the source model with our proposed Balanced Batchnorm to handle the imbalanced testing set. During test-time adaptation, we optimize the combination of the self-training loss $\mathcal{L}_{st}$ and the anchor loss $\mathcal{L}_{anc}$.
  • Figure 4: Performance on each individual domain (corruption) under the GLI-TTA-F (I.F.=100) protocol on the CIFAR10-C dataset.
  • Figure 5: We evaluate state-of-the-art TTA methods under different learning rates. The learning rates of NOTE fall in [5e-4, 1e-6] and those of TTAC fall in [5e-5, 1e-7], in order to align the best LR with other methods.
  • ...and 2 more figures
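
As a rough illustration of how an imbalance factor can control global imbalance (Figures 1 and 2), the sketch below builds a long-tailed class prior in which the most frequent class is `imbalance_factor` times as likely as the rarest. The exponential decay between the two extremes is the common long-tailed benchmark convention and is assumed here; the captions above do not specify the exact form used in the paper, and the function name is hypothetical.

```python
def long_tailed_prior(num_classes, imbalance_factor):
    """Class prior whose head-to-tail probability ratio equals imbalance_factor.

    Assumption: probabilities decay exponentially from the first class to the last.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if imbalance_factor <= 0:
        raise ValueError("imbalance_factor must be positive")
    weights = [imbalance_factor ** (-k / (num_classes - 1)) for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


# I.F. = 1 gives a uniform (globally balanced) prior;
# I.F. = 100 makes the first class 100 times as likely as the last.
uniform = long_tailed_prior(4, 1.0)
skewed = long_tailed_prior(10, 100.0)
print(uniform)
print(skewed[0] / skewed[-1])
```

A prior of this kind could play the role of the non-uniform $\alpha$ in Figure 2, with local (within-stream) imbalance layered on top of it.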