Un-Mixing Test-Time Normalization Statistics: Combatting Label Temporal Correlation

Devavrat Tomar, Guillaume Vray, Jean-Philippe Thiran, Behzad Bozorgtabar

TL;DR

UnMix-TNS tackles the bias that batch normalization (BN) statistics acquire during test-time adaptation (TTA) when the test stream exhibits label temporal correlation (i.e., is non-i.i.d.). It decomposes the stored BN statistics into $K$ components and updates them through online unmixing: each instance's statistics are mixed with the component statistics, which restores i.i.d.-like behavior. The method plugs into existing TTA approaches and BN-equipped architectures with minimal overhead and improves robustness on corruption and natural-shift benchmarks across single, continual, and mixed domain adaptation. The result is more stable and accurate prediction in practical non-i.i.d. streaming settings, supported by extensive ablations on diverse datasets; the code is publicly available.
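
To make the failure mode concrete, here is a toy illustration of our own (not taken from the paper): a hypothetical one-dimensional feature whose mean is -2 for class 0 and +2 for class 1. Test-time BN (TBN) normalizes each batch with that batch's own statistics. When a batch contains a single label, as happens under label temporal correlation, the batch mean absorbs the class offset and the normalized features no longer separate the classes.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical 1-D feature: class 0 centred at -2, class 1 centred at +2, unit noise.
class0 = [random.gauss(-2.0, 1.0) for _ in range(64)]
class1 = [random.gauss(2.0, 1.0) for _ in range(64)]


def tbn_normalize(batch, eps=1e-5):
    """Normalize a batch with its own mean and variance, as test-time BN (TBN) does."""
    mu = statistics.fmean(batch)
    var = statistics.pvariance(batch)
    return [(x - mu) / math.sqrt(var + eps) for x in batch], mu


# i.i.d.-like batch: both labels present in equal proportion.
iid_out, iid_mu = tbn_normalize(class0[:32] + class1[:32])
# Label-correlated batch: only class 0 arrives in this time window.
corr_out, corr_mu = tbn_normalize(class0)

print(f"batch mean, i.i.d.:     {iid_mu:+.2f}")   # near 0
print(f"batch mean, correlated: {corr_mu:+.2f}")  # near -2
# After TBN, class-0 features remain clearly negative in the i.i.d. batch...
print(f"class-0 mean after TBN, i.i.d.:     {statistics.fmean(iid_out[:32]):+.2f}")
# ...but are re-centred to 0 in the correlated batch, erasing the class offset.
print(f"class-0 mean after TBN, correlated: {statistics.fmean(corr_out):+.2f}")
```

In a real network the same effect occurs per channel in every BN layer, which is why the skew compounds through depth.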

Abstract

Recent test-time adaptation methods heavily rely on nuanced adjustments of batch normalization (BN) parameters. However, one critical assumption often goes overlooked: that test batches are independently and identically distributed (i.i.d.) with respect to the unknown labels. This oversight leads to skewed BN statistics and undermines the reliability of the model under non-i.i.d. scenarios. To tackle this challenge, this paper presents a novel method termed 'Un-Mixing Test-Time Normalization Statistics' (UnMix-TNS). Our method re-calibrates the statistics for each instance within a test batch by mixing them with multiple distinct statistics components, thus inherently simulating the i.i.d. scenario. The core of this method hinges on a distinctive online unmixing procedure that continuously updates these statistics components by incorporating the most similar instances from new test batches. Remarkably generic in its design, UnMix-TNS seamlessly integrates with a wide range of leading test-time adaptation methods and pre-trained architectures equipped with BN layers. Empirical evaluations corroborate the robustness of UnMix-TNS under varied scenarios, ranging from single to continual and mixed domain shifts, particularly excelling with temporally correlated test data and corrupted non-i.i.d. real-world streams. This adaptability is maintained even with very small batch sizes or single instances. Our results highlight UnMix-TNS's capacity to markedly enhance stability and performance across various benchmarks. Our code is publicly available at https://github.com/devavratTomar/unmixtns.
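
The unmixing idea in the abstract can be illustrated with a simplified, pure-Python sketch. It is our own reading, not the authors' implementation (that lives in the linked repository): the class name `UnMixSketch`, the softmax over cosine similarities with temperature `tau`, the rule that pulls each component toward an instance in proportion to its assignment probability, the uniform average over the $K$ components, and the momentum update are all assumptions chosen for clarity. Features use a $B \times C \times L$ layout (batch, channels, spatial positions), and the initial component statistics are simply passed in.

```python
import math


def instance_stats(z):
    """Per-instance, per-channel mean and (biased) variance of features z shaped B x C x L."""
    means, variances = [], []
    for sample in z:
        mu = [sum(ch) / len(ch) for ch in sample]
        var = [sum((x - m) ** 2 for x in ch) / len(ch) for ch, m in zip(sample, mu)]
        means.append(mu)
        variances.append(var)
    return means, variances


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two channel vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class UnMixSketch:
    """Hypothetical, simplified UnMix-TNS-style normalization with K statistics components."""

    def __init__(self, comp_means, comp_vars, tau=0.1, momentum=0.1, eps=1e-5):
        self.mu = [list(m) for m in comp_means]  # K x C component means (initialization assumed)
        self.var = [list(v) for v in comp_vars]  # K x C component variances
        self.tau, self.momentum, self.eps = tau, momentum, eps

    def __call__(self, z):
        K, C, B = len(self.mu), len(self.mu[0]), len(z)
        inst_mu, inst_var = instance_stats(z)
        # Assignment probabilities p[b][k]: softmax of cosine similarity / temperature.
        p = [softmax([cosine(mu_b, self.mu[k]) / self.tau for k in range(K)]) for mu_b in inst_mu]
        out = []
        for b in range(B):
            mixed_mu, mixed_var = [], []
            for c in range(C):
                # Pull each component toward instance b in proportion to p[b][k].
                adj_mu = [self.mu[k][c] + p[b][k] * (inst_mu[b][c] - self.mu[k][c]) for k in range(K)]
                adj_var = [self.var[k][c] + p[b][k] * (inst_var[b][c] - self.var[k][c]) for k in range(K)]
                # Uniform mixture over the K components (law of total variance).
                m = sum(adj_mu) / K
                v = sum(av + am ** 2 for av, am in zip(adj_var, adj_mu)) / K - m ** 2
                mixed_mu.append(m)
                mixed_var.append(max(v, 0.0))
            out.append([[(x - mixed_mu[c]) / math.sqrt(mixed_var[c] + self.eps) for x in z[b][c]]
                        for c in range(C)])
        # Online update: momentum step toward the p-weighted mean of the batch's instance stats.
        for k in range(K):
            weight = sum(p[b][k] for b in range(B))
            if weight == 0.0:
                continue
            for c in range(C):
                target_mu = sum(p[b][k] * inst_mu[b][c] for b in range(B)) / weight
                target_var = sum(p[b][k] * inst_var[b][c] for b in range(B)) / weight
                self.mu[k][c] += self.momentum * (target_mu - self.mu[k][c])
                self.var[k][c] += self.momentum * (target_var - self.var[k][c])
        return out


# Usage: two components, one batch of B=2 samples with C=2 channels and L=3 positions.
layer = UnMixSketch(comp_means=[[1.0, 0.0], [0.0, 1.0]], comp_vars=[[1.0, 1.0], [1.0, 1.0]])
batch = [[[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0], [2.0, 2.0, 2.0]]]
normalized = layer(batch)
```

With $K = 1$ the assignment probability is always 1, so the sketch reduces to per-instance normalization; with $K > 1$ every instance is normalized against statistics that still include the other components, which is how this sketch imitates an i.i.d. batch.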

Paper Structure (32 sections, 23 equations, 8 figures, 8 tables, 1 algorithm)

Figures (8)

  • Figure 1: Test-Time BN (TBN) vs. UnMix-TNS. (a) TBN recalibrates its intermediate features when test batches are i.i.d. sampled over time $t$, accommodating distribution shifts. (b) However, TBN fails for non-i.i.d. label-based test batch sampling, leading to skewed batch statistics. (c) UnMix-TNS overcomes this failure by estimating unbiased batch statistics through its $K$ statistics components.
  • Figure 2: An Overview of UnMix-TNS. Given a batch of non-i.i.d. test features $\mathbf{z}^t \in \mathbb{R}^{B\times C \times L}$ at time step $t$, we mix the instance-wise statistics $(\tilde{\mu}^t, \tilde{\sigma}^t) \in \mathbb{R}^{B \times C}$ with $K$ UnMix-TNS components. The alignment of each sample in the batch with the UnMix-TNS components is quantified through similarity-derived assignment probabilities $p_k^t$, which guide both the mixing process and the subsequent component updates for time $t+1$.
  • Figure 3: Ablation study on the impact of (a) Dirichlet parameter, $\delta$, and (b) batch size on CIFAR100-C, comparing several test-time normalization methods including TBN, $\alpha$-BN, RBN, and UnMix-TNS.
  • Figure 4: Effect of applying UnMix-TNS at different depths of the network. (a) Average classification error rate when only the BN layers after the given layer index are replaced by UnMix-TNS layers. (b) Average classification error rate when only the BN layers before the given layer index are replaced by UnMix-TNS layers. A layer index of 0 corresponds to the first layer. All experiments use non-i.i.d. continual test-time domain adaptation on CIFAR10-C.
  • Figure 5: Ablation study on the impact of the (a) concentration parameter $\delta$, and (b) batch size on CIFAR10-C for several test-time normalization methods including TBN, $\alpha$-BN, RBN, and our proposed UnMix-TNS.
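
Figures 3 and 5 vary the Dirichlet (concentration) parameter $\delta$ that controls how strongly labels are correlated over time. The sketch below shows one common way to build such a stream, in the style of Dirichlet-based protocols from prior non-i.i.d. TTA work; it is an assumption, not necessarily the exact protocol used in the paper, and `dirichlet_stream` and `num_slots` are hypothetical names. For each class, a Dirichlet($\delta$) draw decides how that class's samples are spread over consecutive time slots, so a small $\delta$ concentrates each class in a few slots (strong temporal correlation), while a large $\delta$ spreads it evenly (close to i.i.d.).

```python
import random


def dirichlet(alpha, k, rng):
    """Draw a symmetric Dirichlet(alpha) vector of length k via normalized Gamma samples."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    if total == 0.0:
        # Guard: for very small alpha every Gamma draw can underflow to 0.
        g = [0.0] * k
        g[rng.randrange(k)] = 1.0
        total = 1.0
    return [x / total for x in g]


def dirichlet_stream(labels, delta, num_slots, seed=0):
    """Return an ordering of sample indices whose labels are correlated over time."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    slots = [[] for _ in range(num_slots)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        props = dirichlet(delta, num_slots, rng)
        # Turn the class's slot proportions into cut points over its samples.
        cuts, acc = [0], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(round(acc * len(idxs)), len(idxs)))
        cuts.append(len(idxs))
        for s in range(num_slots):
            slots[s].extend(idxs[cuts[s]:cuts[s + 1]])
    for slot in slots:
        rng.shuffle(slot)  # mix classes that happen to share a slot
    return [idx for slot in slots for idx in slot]


labels = [c for c in range(10) for _ in range(50)]  # toy dataset: 10 classes x 50 samples
stream = dirichlet_stream(labels, delta=0.01, num_slots=10)
```

Consecutive chunks of `B` indices from the returned order then form the test batches; the fraction of adjacent pairs sharing a label is a quick check of how correlated a given stream is.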