Unraveling Batch Normalization for Realistic Test-Time Adaptation

Zixian Su, Jingwei Guo, Kai Yao, Xi Yang, Qiufeng Wang, Kaizhu Huang

TL;DR

The paper investigates why target statistics estimated at test time degrade under realistic mini-batch conditions and identifies reduced class diversity as the principal cause. It introduces Test-time Exponential Moving Average (TEMA) with adaptive momentum to broaden the class information behind each estimate, plus a layer-wise rectification strategy that balances source and target statistics according to inter-domain divergence, all in a training-free framework. In experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C, the approach achieves state-of-the-art robustness across varying batch sizes and domain shifts, with strong stability in small-batch regimes. The work offers practical, training-free improvements for real-world test-time adaptation and highlights the importance of class diversity in accurate target-statistic estimation.

Abstract

While recent test-time adaptation methods are effective at narrowing domain disparities by adjusting batch normalization, their effectiveness diminishes with realistic mini-batches due to inaccurate target estimation. As previous attempts merely introduce source statistics to mitigate this issue, the fundamental problem of inaccurate target estimation persists, leaving the intrinsic test-time domain shifts unresolved. This paper delves into the problem of mini-batch degradation. By unraveling batch normalization, we discover that the inexact target statistics largely stem from the substantially reduced class diversity within a batch. Drawing upon this insight, we introduce a straightforward tool, Test-time Exponential Moving Average (TEMA), to bridge the class diversity gap between training and testing batches. Importantly, TEMA adaptively extends the scope of typical methods beyond the current batch to incorporate a diverse set of class information, which in turn yields more accurate target estimation. Built upon this foundation, we further design a novel layer-wise rectification strategy to consistently promote test-time performance. Our proposed method requires neither training nor parameter tuning, offering a truly hassle-free solution. It significantly enhances model robustness against shifted domains and maintains resilience in diverse real-world scenarios with various batch sizes, achieving state-of-the-art performance on several major benchmarks. Code is available at https://github.com/kiwi12138/RealisticTTA.
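To make the idea of extending target statistics beyond the current batch concrete, here is a minimal sketch of an exponential moving average over per-channel batch statistics. It is an illustration under stated assumptions, not the authors' implementation: the class name `RunningTargetStats`, the helper `batch_mean_var`, the use of a fixed momentum (the paper describes an adaptive one), and the choice to initialise from the first test batch are all hypothetical. The only convention taken from the source is that the momentum weights the current batch, so that m = 1.0 reduces to using the current batch alone, which the Figure 2 caption equates with TBN.

```python
from typing import Optional, Sequence, Tuple


def batch_mean_var(batch: Sequence[float]) -> Tuple[float, float]:
    """Mean and biased variance of one channel over the current test batch."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return mean, var


class RunningTargetStats:
    """Hypothetical helper: EMA of one channel's target statistics."""

    def __init__(self, momentum: float) -> None:
        if not 0.0 < momentum <= 1.0:
            raise ValueError("momentum must lie in (0, 1]")
        self.momentum = momentum
        self.mean: Optional[float] = None
        self.var: Optional[float] = None

    def update(self, batch: Sequence[float]) -> Tuple[float, float]:
        """Blend the current batch statistics into the running estimate."""
        mean, var = batch_mean_var(batch)
        if self.mean is None or self.var is None:
            # Assumption: start from the first test batch.
            self.mean, self.var = mean, var
        else:
            m = self.momentum
            # Momentum weights the current batch; m = 1.0 ignores history.
            self.mean = (1.0 - m) * self.mean + m * mean
            self.var = (1.0 - m) * self.var + m * var
        return self.mean, self.var


def normalize(x: float, mean: float, var: float, eps: float = 1e-5) -> float:
    """Normalize one activation with the running target statistics."""
    return (x - mean) / (var + eps) ** 0.5
```

With a small momentum the estimate changes slowly and therefore aggregates samples, and hence classes, from many past batches. With m = 1.0 every update discards the history and the statistics depend only on the classes present in the current batch.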

Paper Structure

This paper contains 26 sections, 2 theorems, 6 equations, 4 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1

Consider an infinite sample space in which samples are drawn independently and identically distributed (i.i.d.), with every category equally likely to be selected. Let $M$ denote the number of distinct categories contained within a given batch, and let $K$ be the total number of categories. For a batch of size [...], where $\mathbf{C}$ denotes the combination symbol in combinatorics.
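The setting of the proposition (i.i.d. samples, uniform categories) can be explored numerically. The sketch below does not reproduce the proposition's own formula, which is cut off above. It uses the standard occupancy result that a batch of $B$ uniform i.i.d. draws from $K$ categories contains $K\,(1 - (1 - 1/K)^B)$ distinct categories in expectation, and checks it by simulation. The function names and the symbol $B$ for batch size are assumptions made for this illustration.

```python
import random


def expected_distinct_classes(num_classes: int, batch_size: int) -> float:
    """Expected number of distinct categories in a uniform i.i.d. batch.

    Each category is absent from the batch with probability
    (1 - 1/K)^B, so the expected count of present categories is
    K * (1 - (1 - 1/K)^B).
    """
    miss = (1.0 - 1.0 / num_classes) ** batch_size
    return num_classes * (1.0 - miss)


def simulate_distinct_classes(
    num_classes: int, batch_size: int, trials: int = 2000, seed: int = 0
) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Draw one batch of labels and count the distinct ones.
        labels = {rng.randrange(num_classes) for _ in range(batch_size)}
        total += len(labels)
    return total / trials


if __name__ == "__main__":
    for k in (10, 100, 1000):
        for b in (1, 4, 16, 64):
            exact = expected_distinct_classes(k, b)
            approx = simulate_distinct_classes(k, b)
            print(f"K={k:4d} B={b:2d} expected={exact:7.2f} simulated={approx:7.2f}")
```

Under these assumptions a batch of 64 samples from 1000 equally likely categories covers only about 62 of them on average, which illustrates how strongly a small test batch limits the class diversity behind its statistics.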

Figures (4)

  • Figure 1: (a)(b) Running mean statistics of one specific channel in the last BN layer during inference. (c) Performance comparison between TENT (Wang et al., 2021) and Ours under different batch sizes. (d) Quantified class diversity of TENT vs. Ours under different batch sizes. The same color denotes the same evaluation setting. As can be seen, class diversity plays a vital role in test-time performance.
  • Figure 2: Momentum analysis for TEMA on three benchmarks under the continual setting with different test batch sizes. Red, blue, and grey regions mark where the momentum should be set to $m=1.0$, $0.1$, and $0.01$, respectively, as calculated from the paper's objective equation. Lines plot the experimental performance of TEMA(m=1.0)/TBN, TEMA(m=0.1), and TEMA(m=0.01).
  • Figure 3: Flowchart of Layer-wise Rectification Strategy.
  • Figure 4: Real-time performance on CIFAR-10-C (Gaussian noise). Error rate remains stable for steps after.

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • proof
  • proof