Understanding the Detrimental Class-level Effects of Data Augmentation
Polina Kirichenko, Mark Ibrahim, Randall Balestriero, Diane Bouchacourt, Ramakrishna Vedantam, Hamed Firooz, Andrew Gordon Wilson
TL;DR
This work investigates how strong data augmentation, notably Random Resized Crop, can degrade per-class accuracy due to interactions between class-conditional distributions. By leveraging ReaL multi-label annotations, the authors show that many of the reported class-level drops are inflated by label ambiguity, while still identifying non-trivial confusions that label noise does not explain (especially among fine-grained classes). They categorize class confusions into ambiguous, co-occurring, fine-grained, and unrelated types and quantify distribution overlaps using ReaL co-occurrence and semantic similarity measures. A simple class-conditional augmentation policy, which tunes augmentation strength for a small set of affected classes, substantially improves performance on the degraded classes (≈2.5% on average for the affected set) without sacrificing overall accuracy, with consistent gains across ResNet-50, EfficientNet, and ViT. The findings advocate evaluating beyond average accuracy and adopting targeted augmentation strategies to mitigate augmentation-induced biases in real-world deployments.
Abstract
Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, our analysis of class confusions reveals other sources of accuracy degradation. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.
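To make the idea of a class-conditional augmentation strategy concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that augmentation strength is controlled by the lower bound of the Random Resized Crop area fraction, and that a set of negatively affected classes has already been identified. The names `min_scale_for`, `sample_crop_scale`, and `WEAK_MIN_SCALE`, as well as the value 0.6, are hypothetical and chosen only for illustration; 0.08 is the default lower bound in torchvision's `RandomResizedCrop`.

```python
import random

# Lower bound on the fraction of image area kept by Random Resized Crop.
# 0.08 matches the torchvision default; 0.6 is an illustrative weaker
# setting (an assumption, not a value reported in the paper).
DEFAULT_MIN_SCALE = 0.08
WEAK_MIN_SCALE = 0.6


def min_scale_for(label, affected_classes):
    """Return the crop-scale lower bound to use for a given class label.

    Classes in `affected_classes` get a weaker augmentation (larger crops);
    all other classes keep the default, more aggressive setting.
    """
    return WEAK_MIN_SCALE if label in affected_classes else DEFAULT_MIN_SCALE


def sample_crop_scale(label, affected_classes, rng=random):
    """Sample the crop area fraction for one training image, given its label."""
    low = min_scale_for(label, affected_classes)
    return rng.uniform(low, 1.0)


if __name__ == "__main__":
    affected = {3, 7}  # hypothetical indices of negatively affected classes
    rng = random.Random(0)
    for label in (0, 3):
        print(label, round(sample_crop_scale(label, affected, rng), 3))
```

In a training pipeline, the sampled fraction would determine the size of the crop taken before resizing, so that images from the affected classes are cropped less aggressively while all other classes keep the default policy.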
