The Effects of Mixed Sample Data Augmentation are Class Dependent

Haeil Lee; Hansang Lee; Junmo Kim

TL;DR

This work shows that Mixed Sample Data Augmentation (MSDA) methods like Mixup, CutMix, and PuzzleMix induce class-dependent effects, improving some classes while degrading others despite overall gains. The authors define class-level metrics, including $R(m)$, $\Delta R_{MSDA}(m)$, $N_{DC}$, and $\overline{\Delta R_{DC}}$, to quantify degradation and then introduce DropMix, a simple strategy that randomly excludes MSDA samples in a controlled way to blend MSDA with non-MSDA data during training. Across CIFAR-100 and ImageNet, DropMix reduces the number of degraded classes and mitigates average recall degradation, while often improving overall accuracy; results are demonstrated for multiple MSDA methods and network architectures. The work highlights the non-uniform impact of data augmentation on class performance and offers a practical, low-cost mitigation with potential implications for fairness and reliability in AI systems; it also opens avenues for deeper analysis of when and why MSDA harms certain classes and how to tailor augmentation to minimize bias.
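The class-level metrics named above can be illustrated with a minimal sketch. It assumes, following the figure captions, that $\Delta R_{MSDA}(m)$ is the recall of class $m$ under MSDA minus its recall under the vanilla model, that a class counts as degraded when this change is strictly negative, and that $\overline{\Delta R_{DC}}$ is the mean change over degraded classes only. The function names are hypothetical and are not taken from the paper.

```python
def class_recall(y_true, y_pred, num_classes):
    """R(m): fraction of samples with true class m that are predicted as m."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Classes with no samples get recall 0.0 (an assumption of this sketch).
    return [c / n if n else 0.0 for c, n in zip(correct, total)]


def class_dependency_metrics(recall_vanilla, recall_msda):
    """Return (delta, n_dc, mean_delta_dc) from two per-class recall lists.

    delta[m]      : recall change of class m (MSDA minus vanilla)
    n_dc          : number of degraded classes (delta < 0, assumed threshold)
    mean_delta_dc : average recall change over degraded classes only
    """
    delta = [m - v for v, m in zip(recall_vanilla, recall_msda)]
    degraded = [d for d in delta if d < 0]
    n_dc = len(degraded)
    mean_delta_dc = sum(degraded) / n_dc if n_dc else 0.0
    return delta, n_dc, mean_delta_dc


# Toy usage: three classes, two samples each.
y_true = [0, 0, 1, 1, 2, 2]
pred_vanilla = [0, 0, 1, 1, 2, 0]
pred_msda = [0, 1, 1, 1, 2, 2]
r_vanilla = class_recall(y_true, pred_vanilla, 3)
r_msda = class_recall(y_true, pred_msda, 3)
delta, n_dc, mean_delta_dc = class_dependency_metrics(r_vanilla, r_msda)
```

In the toy usage, class 0 loses recall and class 2 gains recall, so overall accuracy is unchanged while one class is degraded; this is the kind of effect the per-class view is meant to expose.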

Abstract

Mixed Sample Data Augmentation (MSDA) techniques, such as Mixup, CutMix, and PuzzleMix, have been widely acknowledged for enhancing performance in a variety of tasks. A previous study reported the class dependency of traditional data augmentation (DA), where certain classes benefit disproportionately compared to others. This paper reveals a class-dependent effect of MSDA, where some classes experience improved performance while others experience degraded performance. This research addresses the issue of class dependency in MSDA and proposes an algorithm to mitigate it. The approach involves training on a mixture of MSDA and non-MSDA data, which not only mitigates the negative impact on the affected classes but also improves overall accuracy. Furthermore, we provide in-depth analysis and discussion of why MSDA introduces class dependency and which classes are most likely to be affected.
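As a rough illustration of training on a mixture of MSDA and non-MSDA data, the sketch below applies Mixup to a batch but leaves each sample unmixed with probability `drop_rate`. This is only one plausible reading of the description above, not the authors' algorithm: the per-sample Bernoulli choice, the function name `mixup_with_dropmix`, and all parameter names are assumptions. The mixing weight `lam` is passed in directly; in Mixup it is usually drawn from a Beta distribution.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer label as a one-hot list of floats."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def mixup_with_dropmix(images, labels, num_classes, lam, drop_rate, rng):
    """Mixup on a batch, skipping mixing per sample with probability drop_rate.

    images    : list of flat float lists (one per sample)
    labels    : list of integer class labels
    lam       : Mixup weight in [0, 1] (assumed given by the caller)
    drop_rate : assumed probability that a sample stays unmixed
    rng       : random.Random instance, for reproducibility
    """
    n = len(images)
    perm = list(range(n))
    rng.shuffle(perm)  # random partner for each sample
    out_x, out_y = [], []
    for i in range(n):
        y_i = one_hot(labels[i], num_classes)
        if rng.random() < drop_rate:
            # Non-MSDA branch: keep the original image and hard label.
            out_x.append(list(images[i]))
            out_y.append(y_i)
            continue
        # MSDA branch: convex combination of the sample and its partner.
        j = perm[i]
        y_j = one_hot(labels[j], num_classes)
        out_x.append([lam * a + (1 - lam) * b
                      for a, b in zip(images[i], images[j])])
        out_y.append([lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)])
    return out_x, out_y


# Toy usage: four two-pixel "images" from three classes.
images = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
labels = [0, 1, 2, 1]
mixed_x, mixed_y = mixup_with_dropmix(
    images, labels, num_classes=3, lam=0.7, drop_rate=0.3,
    rng=random.Random(0))
```

With `drop_rate=0.0` the sketch reduces to plain Mixup, and with `drop_rate=1.0` it reduces to training on unmixed data, so the rate interpolates between the two regimes.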

Paper Structure (18 sections, 4 equations, 18 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Mixed Sample Data Augmentation (MSDA)
Class Dependency in Deep Learning
Effect of MSDA is Class Dependent
Experiments
Evaluation metrics
Results: MSDA Creates Class Dependency
DropMix: Dropping Random Samples from MSDA Mitigates Class Dependency
Methods
Experimental Results
Discussions
Why does MSDA create class dependency?
DropMix
Open Problems
...and 3 more sections

Figures (18)

Figure 1: Overview. The effect of MSDA is class dependent. Five PreActResNet50 models are trained on the CIFAR-100 dataset, and their results averaged, for each of three settings: no MSDA, Mixup, and Mixup with the proposed DropMix. The left graph shows that Mixup improves the average accuracy and the recall for the "Lamp" class but reduces recall for the "Dolphin" class. This observation led us to address the issue of measuring and reducing class dependency in MSDA effects. Our goal is to minimize the performance degradation in classes such as "Dolphin" even if it sacrifices some improvement in classes like "Lamp." The right graph shows that, compared to Mixup, DropMix gives a slightly smaller gain on "Lamp" but a much smaller decrease on "Dolphin." Additionally, DropMix achieves a better overall average accuracy than Mixup.
Figure 2: Comparison of class recall changes $\Delta R(m)$ between (a) Mixup and (b) Mixup with DropMix on the CIFAR-100 dataset. Blue indicates improved classes, whose recall is higher than with the vanilla model, while red indicates degraded classes, whose recall is lower than with the vanilla model. Classes are arranged in descending order of recall change.
Figure 3: The performance of Mixup (green), CutMix (orange), and PuzzleMix (brown), each combined with DropMix, is analyzed in terms of class-dependent metrics as a function of the DropMix rate. This analysis was conducted on WideResNet28-2 trained on the CIFAR-100 dataset. In (a), the average accuracy of the DropMix method surpasses that of the respective MSDA methods (where the DropMix rate is 0) for relatively small DropMix rates. In (b), the number of degraded classes $N_{DC}$ is reduced for a small DropMix rate across all MSDA methods. In (c), the average recall change of degraded classes $\overline{\Delta R_{DC}}$ is mitigated across the entire range of DropMix rates. The shaded region in (a) illustrates the standard deviation among five distinct models, while the shaded region in (b) depicts the standard deviation across degraded classes.
Figure 4: The impact of Mixup strength on the label information. Each line represents a different class as counterparts in Mixup. The value before each class name is the change in that class's recall caused by Mixup, as observed in our experiment. As the strength of Mixup increases, the label information for the class is progressively lost. This loss occurs at a noticeably faster rate for the degraded class than for the improved class. These findings were obtained on the ImageNet validation set using the official PyTorch ResNet50 model pre-trained on ImageNet.
Figure 5: Correlation between class confidence and recall. Each dot represents an individual class. This analysis explores the correlation between class confidence and recall, utilizing individual class data from WideResNet28-2 models trained on the CIFAR-100 dataset. In (a), Mixup (represented in green) reduces confidence levels in comparison to the vanilla model (represented in orange). However, the integration of DropMix (represented in brown) offsets this confidence shift and helps underperforming classes. In (b), there is a significant correlation in which classes with severe recall and confidence degradation problems cluster in the lower left quadrant. The results of five models, each trained with different random seeds, were averaged to reach this figure.
...and 13 more figures
