A data-centric approach to class-specific bias in image data augmentation

Athanasios Angelakis, Andrey Rass

TL;DR

This paper investigates how data augmentation can induce class-specific bias that depends on dataset characteristics and model architecture. It evaluates Random Crop and Random Horizontal Flip across Fashion-MNIST, CIFAR-10, and CIFAR-100 using ResNet50, EfficientNetV2S, and SWIN Transformer, and introduces a Data Augmentation Robustness Scouting protocol that probes augmentation intensity $\alpha$ to quantify per-class and overall performance dynamics. The results reveal dataset- and architecture-dependent bias, with Vision Transformers showing delayed or altered bias dynamics relative to residual CNNs, and demonstrate a substantial reduction in computational cost (training 112 models vs 1860) while preserving bias-trend capture. These insights guide practical bias mitigation and model selection for deployments with aggressive data augmentation, and point to future work expanding architectures and datasets to further understand DA-induced biases.

Abstract

Data augmentation (DA) enhances model generalization in computer vision but may introduce biases, impacting class accuracy unevenly. Our study extends this inquiry, examining DA's class-specific bias across various datasets, including those distinct from ImageNet, through random cropping. We evaluated this phenomenon with ResNet50, EfficientNetV2S, and SWIN ViT, discovering that while residual models showed similar bias effects, Vision Transformers exhibited greater robustness or altered dynamics. This suggests a nuanced approach to model selection, emphasizing bias mitigation. We also refined a "data augmentation robustness scouting" method to manage DA-induced biases more efficiently, reducing computational demands significantly (training 112 models instead of 1860; a reduction of factor 16.2) while still capturing essential bias trends.
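
The phrase "impacting class accuracy unevenly" can be made concrete by computing accuracy per class alongside the overall figure. The following is a minimal sketch, not the authors' evaluation code: the function name `per_class_accuracy` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return (overall_accuracy, {class_label: accuracy}) for paired label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")

    correct = defaultdict(int)  # correct predictions per true class
    total = defaultdict(int)    # number of examples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    per_class = {label: correct[label] / total[label] for label in total}
    overall = sum(correct.values()) / len(y_true)
    return overall, per_class


# Toy labels (invented for illustration, not data from the paper).
y_true = ["horse", "horse", "horse", "ship", "ship", "ship"]
y_pred = ["horse", "ship", "ship", "ship", "ship", "ship"]
overall, per_class = per_class_accuracy(y_true, y_pred)
print(overall, per_class)
```

In this toy case the overall accuracy looks moderate while one class is classified perfectly and the other mostly fails, which is the kind of gap an average alone would hide.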

Paper Structure (7 sections, 1 equation, 21 figures)


Figures (21)

  • Figure 1: This figure shows a collection of four 3x3 grids, each demonstrating the effect of a combined Random Crop + Random Horizontal Flip data augmentation on an image belonging to the "horse" class of the CIFAR-10 dataset. The Random Crop $\alpha$ is 0%, 30%, 60% and 90% for each grid, respectively. Label loss progression can be intuited to a degree, as the proportion of images that lack the distinctive features of the "horse" class increases with $\alpha$.
  • Figure 1: The results in all figures employ official ResNet50 models from TensorFlow trained from scratch on the CIFAR-100 dataset with random crop data augmentation applied. All results in this figure are averaged over 4 runs. During training, the proportion of the original image obscured by the augmentation varies from 100% to 10%. The vertical dotted lines denote the best test accuracy for every class.
  • Figure 2: The results in this figure employ official ResNet50 models from TensorFlow trained from scratch on the Fashion-MNIST, CIFAR-10 & CIFAR-100 datasets respectively, with the Random Crop and Random Horizontal Flip DA applied. All results in this figure are averaged over 4 runs. During training, the proportion of the original image obscured by the augmentation varies from 100% to 10%. We observe that, in line with expectations, training the same architecture with a varying random crop percentage can provide greater average test accuracy (blue) up to a certain point, but that average accuracy eventually drops due to the acute negative impact on some per-class accuracies past certain crop $\alpha$ values. The vertical dotted lines denote the best test accuracy for every class. Only a subset of classes is shown for CIFAR-100 for legibility purposes.
  • Figure 2: The results in all figures employ official ResNet50 models from TensorFlow trained from scratch on the CIFAR-100 dataset with random crop and random horizontal flip data augmentations applied. All results in this figure are averaged over 4 runs. During training, the proportion of the original image obscured by the augmentation varies from 100% to 10%. The vertical dotted lines denote the best test accuracy for every class.
  • Figure 3: The results in this figure employ official ResNet50 models from TensorFlow trained from scratch on the Fashion-MNIST, CIFAR-10 & CIFAR-100 datasets respectively, with the Random Crop but no Random Horizontal Flip DA applied. All results in this figure are averaged over 4 runs. During training, the proportion of the original image obscured by the augmentation varies from 100% to 10%. We observe that while the trends from Figure 2 are generally maintained, the removal of Random Horizontal Flip seems to decrease the speed at which class-specific bias manifests as $\alpha$ is increased. Only a subset of classes is shown for CIFAR-100 for legibility purposes.
  • ...and 16 more figures
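
Several captions above mention vertical dotted lines marking the best test accuracy for every class. One way to read that from a sweep over $\alpha$ is sketched below. This is an illustrative assumption rather than the authors' scouting protocol: the function name `best_alpha_summary`, the `results` layout, and every accuracy value are hypothetical, and the mean is an unweighted average over classes, which matches overall accuracy only for a class-balanced test set.

```python
def best_alpha_summary(results):
    """Locate the alpha that maximizes mean accuracy and each class's accuracy.

    results: {alpha: {class_label: test_accuracy}} with the same classes
    at every alpha. Ties go to the smallest alpha.
    Returns (best_alpha_for_mean, {class_label: best_alpha}).
    """
    if not results:
        raise ValueError("results must contain at least one alpha")

    alphas = sorted(results)
    classes = sorted(results[alphas[0]])

    # Unweighted mean over classes at each alpha.
    mean_acc = {
        a: sum(results[a][c] for c in classes) / len(classes) for a in alphas
    }
    best_for_mean = max(alphas, key=lambda a: mean_acc[a])
    best_per_class = {
        c: max(alphas, key=lambda a: results[a][c]) for c in classes
    }
    return best_for_mean, best_per_class


# Invented numbers for illustration only; these are not results from the paper.
results = {
    0.0: {"horse": 0.80, "ship": 0.70},
    0.3: {"horse": 0.85, "ship": 0.80},
    0.6: {"horse": 0.60, "ship": 0.90},
    0.9: {"horse": 0.20, "ship": 0.75},
}
best_mean, best_per_class = best_alpha_summary(results)
print(best_mean, best_per_class)
```

When the per-class optima sit at different $\alpha$ values, as in this toy sweep, pushing $\alpha$ toward one class's optimum costs accuracy on another, which is the class-specific bias the figures are meant to expose.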