
Panda: Test-Time Adaptation with Negative Data Augmentation

Ruxi Deng, Wenxuan Bao, Tianxin Wei, Jingrui He

TL;DR

Panda tackles the vulnerability of pretrained vision-language models to common image corruptions by addressing the prediction bias induced by distribution shift. It introduces negative data augmentation (NDA), which constructs batch-shared, patch-based negatives and offsets the original image embeddings by subtracting the mean NDA embedding, suppressing corruption cues while preserving class information. The method is designed to plug into existing test-time adaptation frameworks, achieving substantial robustness gains with minimal computational overhead. Empirical results across CIFAR-10-C, CIFAR-100-C, and ImageNet-C show that Panda consistently improves a wide range of TTA baselines and surpasses PDA-based approaches in both performance and efficiency, with reduced bias across corruption types.
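The batch-shared, patch-based negatives described in the TL;DR can be illustrated with a small NumPy sketch. The function name `make_negatives`, the fixed square patch size, and sampling patches uniformly with replacement from the pooled batch are assumptions made for illustration; they are not taken from the Panda repository.

```python
import numpy as np

def make_negatives(batch, num_neg, patch, rng=None):
    """Hypothetical sketch of batch-shared, patch-based negative augmentation.

    batch: array of shape (B, H, W, C); H and W must be divisible by `patch`.
    Returns (num_neg, H, W, C) images whose patches are drawn at random from a
    single pool holding every patch of every image in the batch (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    b, h, w, c = batch.shape
    gh, gw = h // patch, w // patch
    # Cut every image into a gh x gw grid and flatten all patches into one shared pool.
    pool = (batch.reshape(b, gh, patch, gw, patch, c)
                 .transpose(0, 1, 3, 2, 4, 5)
                 .reshape(b * gh * gw, patch, patch, c))
    # For each negative, sample gh * gw patch indices from the pool (with replacement).
    idx = rng.integers(0, pool.shape[0], size=(num_neg, gh * gw))
    picked = pool[idx]  # shape: (num_neg, gh * gw, patch, patch, c)
    # Reassemble the sampled patches into full-size images.
    return (picked.reshape(num_neg, gh, gw, patch, patch, c)
                  .transpose(0, 1, 3, 2, 4, 5)
                  .reshape(num_neg, h, w, c))
```

Because the pool is shared, the `M` negatives are generated once per batch of `B` images rather than once per image, which is where the low overhead comes from; shuffling patches across images is meant to destroy object layout while keeping local corruption statistics such as noise or blur.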

Abstract

Pretrained vision-language models (VLMs) exhibit strong zero-shot classification capabilities, but their predictions degrade significantly under common image corruptions. To improve robustness, many test-time adaptation (TTA) methods adopt positive data augmentation (PDA), which generates multiple views of each test sample to reduce prediction variance. However, these methods suffer from two key limitations. First, PDA introduces considerable computational overhead because many augmentations are required per image. Second, it fails to mitigate prediction bias, the tendency of the model to predict certain classes disproportionately under corruption, because PDA operates on corrupted inputs and typically does not remove the corruption itself. To address these challenges, we propose Panda, a novel TTA method based on negative data augmentation (NDA). Unlike positive augmentations, which preserve object semantics, Panda generates negative augmentations by disrupting semantic content: it divides images into patches and randomly reassembles them from a shared patch pool. These negatively augmented images retain corruption-specific features while discarding object-relevant signals. We then subtract the mean feature of these negative samples from the original image feature, suppressing corruption-related components while preserving class-relevant information. This mitigates prediction bias under distribution shifts. Because Panda shares augmentations across the samples in a batch, its computational overhead is minimal. Panda can be seamlessly integrated into existing test-time adaptation frameworks and substantially improves their robustness. Our experiments indicate that Panda outperforms PDA methods, and that a wide range of TTA methods achieve significantly better performance when integrated with Panda. Our code is available at https://github.com/ruxideng/Panda.
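The offsetting step in the abstract (subtract the mean negative feature, then classify against text embeddings) might look roughly like the sketch below. The function names, the L2 normalization before and after subtraction, and the CLIP-style logit `scale=100.0` are assumptions for illustration, not confirmed details of Panda.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def debiased_logits(img_emb, neg_emb, text_emb, scale=100.0):
    """Hypothetical sketch: offset image embeddings by the mean negative embedding.

    img_emb:  (B, D) embeddings of the original test images.
    neg_emb:  (M, D) embeddings of the negative images, shared by the whole batch.
    text_emb: (K, D) class text embeddings.
    """
    img = l2_normalize(img_emb)
    # The mean negative embedding serves as a batch-level corruption prototype.
    prototype = l2_normalize(neg_emb).mean(axis=0, keepdims=True)
    # Remove the prototype, renormalize, and score against every class prompt.
    debiased = l2_normalize(img - prototype)
    return scale * debiased @ l2_normalize(text_emb).T
```

As a toy usage case, if every image embedding is shifted by a shared corruption vector that leans toward class 0, plain cosine scoring can assign all images to class 0, while subtracting a prototype built from corruption-only negatives restores the correct class for each image.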

Paper Structure

This paper contains 33 sections, 3 theorems, 26 equations, 10 figures, 6 tables, 1 algorithm.

Key Result

Theorem 4.1

Consider a binary classification problem where the input feature $v$ can be decomposed into two independent components, $v = v_{\text{cls}} + v_{\text{corr}}$, where $v_{\text{cls}} \sim \mathcal{N}(0, 1)$ denotes the class-relevant component and $v_{\text{corr}} \sim \mathcal{N}(0, s^2)$ denotes the corruption-related component. Now consider a negatively augmented feature $n \sim \mathcal{N}(0, s^2)$ such that the correlation …
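Since the theorem statement above is truncated, the following Monte Carlo check only illustrates its intuition under assumed details: the label is $y = \operatorname{sign}(v_{\text{cls}})$, the classifier predicts the sign of its input, and $n$ has correlation $\rho$ with $v_{\text{corr}}$ while being independent of $v_{\text{cls}}$. Then $v - n = v_{\text{cls}} + (v_{\text{corr}} - n)$ with $\operatorname{Var}(v_{\text{corr}} - n) = 2s^2(1 - \rho)$, and for a standard-normal signal plus independent $\mathcal{N}(0, \sigma^2)$ noise the sign accuracy is $\tfrac{1}{2} + \arctan(1/\sigma)/\pi$. Under these assumptions, offsetting helps exactly when $\rho > 1/2$; this threshold is a property of the sketch, not necessarily the paper's condition.

```python
import numpy as np

def sign_accuracy(noise_var):
    """Closed form P(sign(x + z) == sign(x)) for x ~ N(0, 1), z ~ N(0, noise_var)."""
    return 0.5 + np.arctan(1.0 / np.sqrt(noise_var)) / np.pi

def simulate(s=1.0, rho=0.8, n_samples=200_000, seed=0):
    """Compare sign accuracy with and without offsetting (assumed toy setting)."""
    rng = np.random.default_rng(seed)
    v_cls = rng.standard_normal(n_samples)
    v_corr = s * rng.standard_normal(n_samples)
    # Assumed construction: n ~ N(0, s^2), corr(n, v_corr) = rho, independent of v_cls.
    n = rho * v_corr + s * np.sqrt(1.0 - rho**2) * rng.standard_normal(n_samples)
    y = np.sign(v_cls)
    v = v_cls + v_corr
    acc_plain = np.mean(np.sign(v) == y)       # predict from the raw feature
    acc_offset = np.mean(np.sign(v - n) == y)  # predict from the offset feature
    return acc_plain, acc_offset
```

With $s = 1$ and $\rho = 0.8$, the closed form gives 0.75 without offsetting and about 0.82 with it; with $\rho = 0.3$ the offset feature is noisier and accuracy drops.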

Figures (10)

  • Figure 1: Comparison of positive (left) and negative (right) data augmentation. Left: PDA used in previous TTA algorithms generates $K$ class-preserving views per image, resulting in high computational cost. Right: NDA used in Panda generates $M$ class-agnostic corrupted views shared across a batch of $B$ images, incurring minimal overhead.
  • Figure 2: Distribution distance between ground-truth and soft prediction distributions under four corruption categories. Original denotes the uncorrupted CIFAR-10 dataset. Larger distribution distance indicates greater prediction bias. Corruptions introduce significant bias that positive data augmentation often fails to mitigate. In contrast, Panda effectively reduces this bias. Results for all 15 corruption types are shown in Appendix A.
  • Figure 3: Overview of Panda. Given a batch of $B$ original images, $M$ negatively augmented images are generated by cutting the originals into patches, shuffling, and recomposing. Both original and negative augmented images are encoded by the image encoder. The average of the negative embeddings serves as a corruption prototype and is subtracted from the original embeddings to suppress corruption-related features. Final predictions are obtained by comparing the debiased features with text embeddings.
  • Figure 4: Prediction bias and accuracy (%) measured across the test stream, divided into 10 consecutive chunks (each of 1,000 samples).
  • Figure 5: Sensitivity analysis of Panda. Left: accuracy under different ratios of $M:B$, where $M$ is the number of negative augmentations per batch and $B$ is the batch size. Middle: accuracy across a range of offset ratios $\beta$ used in patch translation. Right: accuracy under varying batch sizes, comparing Tent and Tent+Panda.
  • ...and 5 more figures
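The bias measurements in Figures 2 and 4 compare the ground-truth class distribution with the averaged soft predictions, optionally per chunk of the test stream. The captions only say "distribution distance", so the L1 distance used below is an assumption, as are the helper names; the default of 10 consecutive chunks follows Figure 4.

```python
import numpy as np

def prediction_bias(labels, probs, num_classes):
    """Assumed metric: L1 distance between the label marginal and the mean soft prediction."""
    true_dist = np.bincount(labels, minlength=num_classes) / len(labels)
    pred_dist = probs.mean(axis=0)
    return float(np.abs(true_dist - pred_dist).sum())

def chunked_metrics(labels, probs, num_chunks=10):
    """Split a test stream into consecutive chunks; return (bias, accuracy %) per chunk."""
    num_classes = probs.shape[1]
    results = []
    for lab, pr in zip(np.array_split(labels, num_chunks), np.array_split(probs, num_chunks)):
        acc = 100.0 * np.mean(pr.argmax(axis=1) == lab)
        results.append((prediction_bias(lab, pr, num_classes), acc))
    return results
```

For a balanced 10-class stream, a perfect classifier scores zero bias, while a model that always predicts one class scores 1.8, the maximum possible under this metric for uniform labels.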

Theorems & Definitions (6)

  • Theorem 4.1: Offsetting leads to accuracy gain
  • Theorem B.1: Offsetting leads to accuracy gain
  • Proof
  • Corollary B.1: Generalization to higher dimension
  • Proof
  • Remark B.2