Test Time Adaptation Using Adaptive Quantile Recalibration

Paria Mehrbod, Pedro Vianna, Geraldin Nanfack, Guy Wolf, Eugene Belilovsky

TL;DR

The paper addresses domain shift in deep learning by proposing Adaptive Quantile Recalibration (AQR), a test-time adaptation (TTA) method that aligns pre-activation distributions through channel-wise, nonparametric quantile transforms based on source statistics. AQR is architecture-agnostic, extending beyond BatchNorm to GroupNorm and LayerNorm, and includes a robust tail-calibration strategy to handle varying batch sizes. The authors prove in a simplified one-hidden-layer model that AQR can perfectly recover the source hidden representations $h^S$ (i.e., $\text{MSE}(T^{\mathrm{AQR}})=0$), while test-time normalization (TTN) incurs nonzero bias when nonlinear corruptions are present. Empirically, AQR consistently outperforms state-of-the-art TTA methods on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across multiple architectures, with notable gains at higher corruption severities and larger batch sizes. The work gives quantile-based, normalization-agnostic test-time adaptation both a theoretical and an empirical grounding, and outlines avenues for online extensions and hybrid approaches.
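The contrast between quantile alignment and moment matching can be illustrated with a small synthetic experiment. The sketch below is not the authors' one-hidden-layer model: it assumes a single channel, a monotone nonlinear corruption (`np.exp`, chosen only for illustration), 101 percentile knots, and a simplified "TTN-like" baseline that matches only the mean and standard deviation. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: one channel of source pre-activations, standard normal.
h_src = rng.normal(size=20_000)
# Independent source sample used only to compute reference statistics.
h_ref = rng.normal(size=20_000)
# Assumption: the corruption is a monotone nonlinear map of the source values.
h_tgt = np.exp(h_src)

# TTN-like baseline (simplified): standardize with target moments,
# then rescale to the source moments. This is an affine map.
ttn = (h_tgt - h_tgt.mean()) / h_tgt.std() * h_ref.std() + h_ref.mean()

# Quantile alignment: map target percentiles onto source percentiles
# with a piecewise-linear transform.
qs = np.linspace(0, 100, 101)
src_q = np.percentile(h_ref, qs)
tgt_q = np.percentile(h_tgt, qs)
aqr_like = np.interp(h_tgt, tgt_q, src_q)

mse_ttn = np.mean((ttn - h_src) ** 2)
mse_q = np.mean((aqr_like - h_src) ** 2)
print(f"moment matching MSE:    {mse_ttn:.4f}")
print(f"quantile alignment MSE: {mse_q:.4f}")
```

In this toy setting the affine baseline cannot undo the nonlinearity, while the quantile map approximately inverts it. The residual error comes from the finite number of knots and from finite-sample noise in the percentiles.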

Abstract

Domain adaptation is a key strategy for enhancing the generalizability of deep learning models in real-world scenarios, where test distributions often diverge significantly from the training domain. However, conventional approaches typically rely on prior knowledge of the target domain or require model retraining, limiting their practicality in dynamic or resource-constrained environments. Recent test-time adaptation methods based on batch normalization statistic updates allow for unsupervised adaptation, but they often fail to capture complex activation distributions and are constrained to specific normalization layers. We propose Adaptive Quantile Recalibration (AQR), a test-time adaptation technique that modifies pre-activation distributions by aligning quantiles on a channel-wise basis. AQR captures the full shape of activation distributions and generalizes across architectures employing BatchNorm, GroupNorm, or LayerNorm. To address the challenge of estimating distribution tails under varying batch sizes, AQR incorporates a robust tail calibration strategy that improves stability and precision. Our method leverages source-domain statistics computed at training time, enabling unsupervised adaptation without retraining models. Experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across multiple architectures demonstrate that AQR achieves robust adaptation across diverse settings, outperforming existing test-time adaptation baselines. These results highlight AQR's potential for deployment in real-world scenarios with dynamic and unpredictable data distributions.

Paper Structure

This paper contains 34 sections, 6 theorems, 36 equations, 6 figures, 9 tables.

Key Result

Theorem 1

Under the regularity conditions (eq:density-bounds) through (eq:smoothness), for any neuron $i$ and any $\delta \in (0,1)$, with probability at least $1-2\delta$: where $\varepsilon_\bullet(\delta, n) := \sqrt{\frac{1}{2n}\log\frac{2}{\delta}}$. In particular, $\text{MSE}_i(T_{i,K,n}^{\text{AQR}}) \to 0$ as $K, n_S, n_T \to \infty$ at rates $O(K^{-4})$, $O(n_S^{-1})$, and $O(n_T^{-1})$, respectively.
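To get a feel for the sample-size term, the quantity $\varepsilon_\bullet(\delta, n)$ can be evaluated directly. The sketch below only implements the formula stated in the theorem; the function name `dkw_epsilon` is hypothetical, and the sample sizes 128 and 10,000 are borrowed from the Figure 3 caption purely as illustrative inputs.

```python
import math

def dkw_epsilon(delta: float, n: int) -> float:
    """Evaluate sqrt(log(2 / delta) / (2 n)), the epsilon term in Theorem 1."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Illustrative values: a small batch versus a large reference sample.
eps_small = dkw_epsilon(0.05, 128)
eps_large = dkw_epsilon(0.05, 10_000)
print(f"n=128:    epsilon = {eps_small:.4f}")
print(f"n=10000:  epsilon = {eps_large:.4f}")
```

Because $\varepsilon$ scales as $n^{-1/2}$, quadrupling the sample size halves it, and its square decays as $n^{-1}$, the same order as the $O(n_S^{-1})$ and $O(n_T^{-1})$ rates stated above.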

Figures (6)

  • Figure 1: Comparing AQR and TTN in preserving complex distribution shapes at test-time using synthetic data.
  • Figure 2: Overview of the Adaptive Quantile Recalibration (AQR) method. During the setup phase (top), source data is processed to extract pre-activations and compute percentiles per channel, which are saved as reference statistics. During inference (bottom), target data pre-activations are similarly processed, and AQR transforms target percentiles to match source percentiles using piecewise linear transformation, enabling distribution alignment without architectural constraints.
  • Figure 3: Distribution of deviations between small-batch (128) and reference (10,000) percentiles across 20 trials.
  • Figure 4: Performance comparison across corruption severity levels. AQR consistently outperforms baseline methods on all datasets, with larger performance gains at higher severities. Results averaged across all corruption types, batch sizes, and architectures. Error bars represent the standard error of the mean for different experimental conditions.
  • Figure 5: Architecture-specific performance comparison at corruption severity level 3. AQR demonstrates consistent improvements across diverse architectures on three datasets (CIFAR-10-C, CIFAR-100-C, ImageNet-C), including ResNets with different normalization schemes (BN, GN) and ViTs (LN). Error bars represent standard error across corruption types and batch sizes.
  • ...and 1 more figure
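The two phases described in the Figure 2 caption can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: pre-activations are assumed to have shape `(N, C, H, W)`, the percentile grid has 101 points, the function names `channel_percentiles` and `quantile_align` are hypothetical, and the paper's tail-calibration strategy is not included.

```python
import numpy as np

def channel_percentiles(x: np.ndarray, qs: np.ndarray) -> np.ndarray:
    """Per-channel percentiles of x with shape (N, C, H, W); returns (C, len(qs))."""
    c = x.shape[1]
    flat = np.moveaxis(x, 1, 0).reshape(c, -1)  # (C, N*H*W)
    return np.percentile(flat, qs, axis=1).T

def quantile_align(x_tgt: np.ndarray, src_q: np.ndarray, qs: np.ndarray) -> np.ndarray:
    """Map each channel of x_tgt so its percentiles follow the stored source ones."""
    tgt_q = channel_percentiles(x_tgt, qs)
    out = np.empty(x_tgt.shape, dtype=float)
    for ch in range(x_tgt.shape[1]):
        vals = x_tgt[:, ch]
        # Piecewise-linear map with knots (target percentile -> source percentile).
        mapped = np.interp(vals.ravel(), tgt_q[ch], src_q[ch])
        out[:, ch] = mapped.reshape(vals.shape)
    return out

rng = np.random.default_rng(0)
qs = np.linspace(0, 100, 101)

# Setup phase: compute and store reference percentiles from source pre-activations.
x_src = rng.normal(size=(256, 3, 4, 4))
src_q = channel_percentiles(x_src, qs)

# Inference phase: a shifted and rescaled target batch is aligned to the source.
x_tgt = 2.0 * rng.normal(size=(256, 3, 4, 4)) + 5.0
x_aligned = quantile_align(x_tgt, src_q, qs)
aligned_q = channel_percentiles(x_aligned, qs)
print(np.max(np.abs(aligned_q - src_q)))
```

Only the `(C, len(qs))` array of source percentiles needs to be stored, which is why such a scheme does not depend on which normalization layer the network uses. Values outside the observed target range are clamped by `np.interp` to the extreme source percentiles, which is one reason a dedicated tail treatment matters for small batches.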

Theorems & Definitions (6)

  • Theorem 1: Finite-Sample AQR Error Bound
  • Lemma 1: Error Decomposition
  • Lemma 2: Concentration via Dvoretzky-Kiefer-Wolfowitz Inequality
  • Lemma 3: CDF-to-Quantile Error Transfer
  • Lemma 4: Finite-Quantile Discretization Error
  • Lemma 5: Knot Stability