
Understanding the Detrimental Class-level Effects of Data Augmentation

Polina Kirichenko, Mark Ibrahim, Randall Balestriero, Diane Bouchacourt, Ramakrishna Vedantam, Hamed Firooz, Andrew Gordon Wilson

TL;DR

This work investigates how strong data augmentation, notably Random Resized Crop, can degrade per-class accuracy through interactions between class-conditional distributions. Using ReaL multi-label annotations, the authors show that many of the reported class-level drops are inflated by label ambiguity, while still identifying non-trivial confusions that are not driven by label noise (especially among fine-grained classes). They categorize class confusions into ambiguous, co-occurring, fine-grained, and semantically unrelated types and quantify distribution overlap using ReaL co-occurrence and semantic similarity measures. A simple class-conditional augmentation policy, which tunes augmentation strength for a small set of affected classes, substantially improves performance on the degraded classes (≈2.5% on average for the affected set) without sacrificing overall accuracy, with consistent gains across ResNet-50, EfficientNet, and ViT. The findings advocate evaluating models beyond average accuracy and adopting targeted augmentation strategies to mitigate augmentation-induced biases in real-world deployments.
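
One way to make the ReaL co-occurrence idea concrete is a conditional co-occurrence frequency over the multi-label sets. The sketch below assumes the measure is the fraction of validation images labeled $k$ in ReaL that are also labeled $l$; the paper's exact statistic is not given here, and `real_cooccurrence` and the toy class ids are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple


def real_cooccurrence(label_sets: Iterable[Set[int]]) -> Dict[Tuple[int, int], float]:
    """Assumed co-occurrence measure over ReaL multi-label sets.

    For every ordered pair of classes (k, l), returns the fraction of images
    whose ReaL label set contains k that also contains l. Pairs that never
    co-occur are omitted; empty label sets contribute nothing.
    """
    class_counts = Counter()  # number of images whose ReaL set contains class k
    pair_counts = Counter()   # number of images whose ReaL set contains both k and l
    for labels in label_sets:
        class_counts.update(labels)
        # Ordered pairs keep the measure asymmetric: P(l | k) != P(k | l) in general.
        pair_counts.update(permutations(labels, 2))
    return {(k, l): n / class_counts[k] for (k, l), n in pair_counts.items()}


if __name__ == "__main__":
    # Toy ReaL label sets with hypothetical class ids: 0 = "car", 1 = "wheel", 2 = "dog".
    toy = [{0, 1}, {0}, {0, 1}, {1}, {1}, {1}, {2}, set()]
    co = real_cooccurrence(toy)
    print(co[(0, 1)])  # 2 of the 3 "car" images also carry "wheel"
    print(co[(1, 0)])  # 2 of the 5 "wheel" images also carry "car"
```

The asymmetry matters for this kind of analysis: a class that often appears inside images of another class (a wheel in a car photo) can have a high $P(l \mid k)$ while the reverse stays low.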

Abstract

Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, our analysis of class confusions reveals other sources of accuracy degradation. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.
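
The class-conditional augmentation strategy can be illustrated with Random Resized Crop (RRC), whose strength is controlled by the scale lower bound $s$ (default $s = 8\%$). A minimal sketch, assuming the policy simply assigns a larger $s$ (milder cropping) to a chosen set of class labels and leaves every other class at the default: the crop sampler re-implements the standard RRC sampling recipe (as used by torchvision's `RandomResizedCrop`) in plain Python, while `scale_by_class`, the function names, and the class ids are hypothetical. The authors' actual class selection and $s$ values are not stated here.

```python
import math
import random
from typing import Dict, Optional, Tuple

DEFAULT_SCALE_LOWER = 0.08  # default RRC scale lower bound, s = 8%
Box = Tuple[int, int, int, int]  # (top, left, height, width)


def sample_rrc_box(
    height: int,
    width: int,
    scale_lower: float,
    ratio: Tuple[float, float] = (3 / 4, 4 / 3),
    rng: Optional[random.Random] = None,
    attempts: int = 10,
) -> Box:
    """Sample a Random Resized Crop box following the usual RRC recipe.

    The crop covers roughly a uniformly sampled fraction in [scale_lower, 1]
    of the image area with a log-uniform aspect ratio; after `attempts`
    failures it falls back to a central crop clamped to the ratio range.
    """
    rng = rng if rng is not None else random.Random()
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        target_area = area * rng.uniform(scale_lower, 1.0)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            return rng.randint(0, height - h), rng.randint(0, width - w), h, w
    # Fallback: central crop whose aspect ratio is clamped to `ratio`.
    in_ratio = width / height
    if in_ratio < ratio[0]:
        w, h = width, round(width / ratio[0])
    elif in_ratio > ratio[1]:
        h, w = height, round(height * ratio[1])
    else:
        h, w = height, width
    return (height - h) // 2, (width - w) // 2, h, w


def class_conditional_box(
    label: int,
    height: int,
    width: int,
    scale_by_class: Dict[int, float],
    rng: Optional[random.Random] = None,
) -> Box:
    """Crop with a per-class scale lower bound; unlisted classes use the default s."""
    s = scale_by_class.get(label, DEFAULT_SCALE_LOWER)
    return sample_rrc_box(height, width, s, rng=rng)


if __name__ == "__main__":
    # Hypothetical policy: milder cropping (s = 0.5) for class id 7 only.
    policy = {7: 0.5}
    rng = random.Random(0)
    print(class_conditional_box(7, 224, 224, policy, rng))  # covers about half the image or more
    print(class_conditional_box(3, 224, 224, policy, rng))  # default s = 0.08
```

Following the Figure 1 caption, the classes listed in `scale_by_class` would be the ones whose heavily cropped samples resemble a degraded class (e.g. "car" rather than "wheel"), because those samples are what push the model toward predicting the wrong class.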

Paper Structure

This paper contains 17 sections, 14 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: We show that the classes negatively affected by data augmentation are often ambiguous, co-occurring or fine-grained categories and analyze how data augmentation exacerbates class confusions. Left: Average accuracy of ResNet-50 on ImageNet against Random Resized Crop (RRC) data augmentation strength: average over all classes (blue), average over the $50$ classes on which stronger RRC hurts accuracy the most (red), and average over the remaining $950$ classes (green). The yellow line indicates the default RRC setting used to train most computer vision models. Middle: We systematically categorize the types of class confusions exacerbated by strong data augmentation: while some involve ambiguous or correlated classes, a number of them are fine-grained and non-trivial. Right: Class-level accuracy often drops because a class overlaps with other classes once augmentation is applied; e.g. heavily augmented samples from the "car" class can look like typical images from the "wheel" class. As a result, the model learns to predict "car" on "wheel" images, and accuracy on the "wheel" class drops. To resolve the negative effect of strong augmentation on classes like "wheel", we should modify the augmentation strength of classes like "car".
  • Figure 2: We find that for many classes the negative effects of strong data augmentation are muted if we use high-quality multi-label annotations. Left: Average and per-class accuracy of ResNet-50 trained on ImageNet, evaluated with original and ReaL labels, as a function of Random Resized Crop augmentation strength ($s=8\%$ corresponds to the strongest augmentation, which is also the default). The top row shows the average accuracy over all ImageNet classes, over the 50 classes with the largest original-label accuracy degradation, and over the remaining 950 classes. The bottom row shows the accuracy of the 3 individual classes whose original-label accuracy is most affected by strong augmentation. Right: Distribution of per-class accuracy drops $\Delta a_k$ for original and ReaL labels. The distribution of $\Delta a^{or}_k$ has a heavier tail than the one computed with ReaL labels.
  • Figure 3: Types of class confusions affected by data augmentation, with varying semantic similarity and data distribution overlap. Each panel shows a pair of confused classes, which we categorize as ambiguous, co-occurring, fine-grained or semantically unrelated depending on the inherent class overlap and semantic similarity. For each confused class pair, the left subplot corresponds to the class $k$ whose accuracy decreases with strong data augmentation (DA), e.g. "sunglass" in the top left panel: the fraction of validation samples from that class that are classified correctly decreases with stronger DA, while the rate of confusion with another class $l$ (e.g. "sunglasses" in the top left panel) increases. The right subplot shows the percentage of examples from class $l$ that are classified as $k$ or $l$ against DA strength.
  • Figure 4: Per-class validation accuracies of ResNet-50 trained on ImageNet, computed with original and ReaL labels, as a function of the Random Resized Crop scale lower bound $s$. We show the accuracy trends for the classes with the largest difference between the maximum accuracy on that class across augmentation levels, $\max_{s} a^{or}_k (s)$, and the accuracy of the model trained with $s=8\%$. Below the class name on each subplot we show the accuracy drops with respect to original and ReaL labels, $\Delta a^{or}_k$ and $\Delta a^{ReaL}_k$. We report the mean and standard error over $10$ independent training runs.
  • Figure 5: Per-class validation accuracies of ResNet-50 trained on ImageNet, computed with original and ReaL labels, as a function of the Random Resized Crop scale lower bound $s$. We show the accuracy trends for the classes with the largest difference between the maximum ReaL accuracy on that class across augmentation levels, $\max_{s} a^{ReaL}_k (s)$, and the ReaL accuracy of the model trained with $s=8\%$. Below the class name on each subplot we show the accuracy drops with respect to original and ReaL labels, $\Delta a^{or}_k$ and $\Delta a^{ReaL}_k$. We report the mean and standard error over $10$ independent training runs.
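
The captions of Figures 4 and 5 define the per-class accuracy drop as $\Delta a_k = \max_s a_k(s) - a_k(s = 8\%)$, computed with either original or ReaL labels, and Figure 3 tracks how often images of class $k$ are predicted as class $l$ as augmentation strength varies. A minimal sketch of these three quantities in plain Python follows. It assumes that per-class ReaL accuracy groups images by their original label, counts a prediction as correct when it lies in the image's ReaL label set, and skips images whose ReaL set is empty; the function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Set

AccByClass = Dict[int, float]


def per_class_accuracy(
    labels: Sequence[int],
    preds: Sequence[int],
    real_sets: Optional[Sequence[Set[int]]] = None,
) -> AccByClass:
    """Accuracy a_k for each original class k.

    Original labels: a prediction is correct iff it equals the original label.
    ReaL labels (real_sets given): correct iff it is in the image's ReaL set;
    images with an empty ReaL set are skipped (assumed convention).
    """
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for i, (y, p) in enumerate(zip(labels, preds)):
        if real_sets is None:
            hit = p == y
        elif real_sets[i]:
            hit = p in real_sets[i]
        else:
            continue  # no valid ReaL label for this image
        total[y] += 1
        correct[y] += int(hit)
    return {k: correct[k] / total[k] for k in total}


def accuracy_drop(acc_by_s: Dict[float, AccByClass], default_s: float = 0.08) -> AccByClass:
    """Delta a_k = max_s a_k(s) - a_k(default_s) for every class scored at default_s."""
    base = acc_by_s[default_s]
    return {
        k: max(acc[k] for acc in acc_by_s.values() if k in acc) - base[k]
        for k in base
    }


def confusion_rate(labels: Sequence[int], preds: Sequence[int], k: int, l: int) -> float:
    """Fraction of validation images of class k that are predicted as class l."""
    preds_k = [p for y, p in zip(labels, preds) if y == k]
    if not preds_k:
        raise ValueError(f"no validation images with label {k}")
    return sum(p == l for p in preds_k) / len(preds_k)


if __name__ == "__main__":
    # Toy data (hypothetical): class 0 = "wheel", class 1 = "car"; some wheel
    # images also show a car, so their ReaL sets contain both labels.
    labels = [0, 0, 0, 0, 1, 1]
    real = [{0}, {0, 1}, {0, 1}, {0}, {1}, {1}]
    preds = {0.08: [1, 1, 1, 0, 1, 1], 0.5: [0, 0, 1, 0, 1, 1]}  # keyed by s
    orig = {s: per_class_accuracy(labels, p) for s, p in preds.items()}
    real_acc = {s: per_class_accuracy(labels, p, real) for s, p in preds.items()}
    print(accuracy_drop(orig))      # {0: 0.5, 1: 0.0}
    print(accuracy_drop(real_acc))  # {0: 0.25, 1: 0.0}
    print(confusion_rate(labels, preds[0.08], k=0, l=1))  # 0.75
```

In the toy data, two of the three "wheel" images misclassified as "car" at $s = 8\%$ also contain a car, so the ReaL drop is half the original-label drop, a small-scale version of the muting effect described in the Figure 2 caption.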