Soft Augmentation for Image Classification

Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan

TL;DR

Soft Augmentation generalizes data augmentation by softening learning targets non-linearly as a function of transform degree, inspired by human vision under occlusion. It uses a visibility-based curve $p = 1 - \alpha(\phi) = 1 - (1 - p_{\min})(1 - v_{\phi})^{k}$, where $v_{\phi}$ is the visibility of the image under transform $\phi$, $p_{\min}$ is the lowest target confidence (reached at zero visibility), and $k$ sets the degree of non-linearity. This enables larger augmentation ranges, and the method can be deployed as Target, Weight, or Target & Weight variants. Empirically, it yields consistent top-1 improvements across CIFAR and ImageNet, improves occlusion robustness by up to $4\times$, and halves expected calibration error, while also improving self-supervised learning (SimSiam). The approach often outperforms or complements strong augmentation policies such as RandAugment and TrivialAugment. Overall, Soft Augmentation provides a principled framework for modeling information loss during augmentation and applies beyond supervised classification.
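To make the softening curve concrete, here is a minimal Python sketch that evaluates $p = 1 - (1 - p_{\min})(1 - v_{\phi})^{k}$ for a few visibility values. The function name `soft_confidence` and the defaults `k=2.0` and `p_min=0.1` (chance level for a 10-class problem) are illustrative assumptions, not values taken from the paper's code.

```python
def soft_confidence(visibility: float, k: float = 2.0, p_min: float = 0.1) -> float:
    """Target confidence p = 1 - (1 - p_min) * (1 - v)^k for visibility v in [0, 1].

    k and p_min defaults are assumptions chosen for illustration only.
    """
    if not 0.0 <= visibility <= 1.0:
        raise ValueError("visibility must lie in [0, 1]")
    return 1.0 - (1.0 - p_min) * (1.0 - visibility) ** k


if __name__ == "__main__":
    # Fully visible images keep a confidence of 1; fully occluded ones fall to p_min.
    for v in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(f"v={v:.2f} -> p={soft_confidence(v):.4f}")
```

With $k > 1$ the confidence stays close to 1 under mild occlusion and drops faster as visibility approaches zero, which is the non-linear behaviour the summary describes.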

Abstract

Modern neural networks are over-parameterized and thus rely on strong regularization such as data augmentation and weight decay to reduce overfitting and improve generalization. The dominant form of data augmentation applies invariant transforms, where the learning target of a sample is invariant to the transform applied to that sample. We draw inspiration from human visual classification studies and propose generalizing augmentation with invariant transforms to soft augmentation where the learning target softens non-linearly as a function of the degree of the transform applied to the sample: e.g., more aggressive image crop augmentations produce less confident learning targets. We demonstrate that soft targets allow for more aggressive data augmentation, offer more robust performance boosts, work with other augmentation policies, and interestingly, produce better calibrated models (since they are trained to be less confident on aggressively cropped/occluded examples). Combined with existing aggressive augmentation strategies, soft target 1) doubles the top-1 accuracy boost across Cifar-10, Cifar-100, ImageNet-1K, and ImageNet-V2, 2) improves model occlusion performance by up to $4\times$, and 3) halves the expected calibration error (ECE). Finally, we show that soft augmentation generalizes to self-supervised classification tasks. Code available at https://github.com/youngleox/soft_augmentation
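The expected calibration error (ECE) mentioned above is commonly estimated by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin. The sketch below shows that common estimator in plain Python; the function name, the 10-bin default, and the equal-width binning are assumptions, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width binned ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for c, ok in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Bins are [i/n_bins, (i+1)/n_bins); a confidence of exactly 1.0 goes to the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        accuracy = hit_sum[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece
```

A model that reports 90% confidence but is right only 75% of the time has a gap of 0.15 in that bin; training on softened targets for heavily occluded crops is what the abstract credits for shrinking such gaps.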

Paper Structure (25 sections, 18 equations, 6 figures, 9 tables)

This paper contains 25 sections, 18 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 2: Variants of Soft Augmentation as prescribed by the Soft Target, Soft Weight, and Soft Target & Weight equations, with example target confidence $p=0.6$ (left). Soft Augmentation applies non-linear ($k=2,3,4,\ldots$) softening to learning targets based on the specific degree of occlusion of a cropped image (the softening-curve equation), which qualitatively captures the degradation of human visual recognition under occlusion [tang2018recurrent]. Label Smoothing applies a fixed softening factor $\alpha$ to the one-hot classification target.
  • Figure 3: Soft Augmentation reduces the top-1 validation error of ResNet-18 on Cifar-100 by up to $2.5\%$ by combining target and weight softening (the Soft Target & Weight equation). Applying target softening alone (the Soft Target equation) can reduce error by $\sim 2\%$. Crop parameters $t_x, t_y$ are independently drawn from $\mathcal{N}(0, \sigma L)$ ($L=32$). Higher error reductions indicate better performance over baseline. All results are the means and standard errors across 3 independent runs.
  • Figure 4: Examples of occluded ImageNet validation images and the corresponding ResNet-50 predictions. The $224 \times 224$ ImageNet validation images are occluded with randomly placed square patches that cover a fraction $\lambda$ of the image area. $\lambda$ is set to $\{0\%,20\%,40\%,60\%,80\% \}$ to create a range of occlusion levels.
  • Figure 5: Soft Augmentation improves occlusion robustness of ResNet-50 on ImageNet. Both RandAugment (RA) and Soft Augmentation (SA) improve occlusion robustness independently. Combining RA with SA reduces top-1 error by up to 17%. At the 80% occlusion level, SA+RA achieves more than $4\times$ the baseline accuracy (18.98% vs. 3.42%).
  • Figure 6: Example images from the Cifar-100 validation set and predictions of WideResNet-28. Predicted classes and confidence levels are reported for models trained with Soft Augmentation + TrivialAugment (SA+TA) and with baseline (BL) augmentation. In many cases, SA+TA not only corrects the class prediction but also improves the model confidence. For instance, BL mistakes "seal" for "beaver" (top-left; both classes belong to the same "aquatic mammal" superclass), whereas SA+TA makes the correct class prediction with higher confidence.
  • ...and 1 more figure
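The three variants named in the Figure 2 caption can be sketched for a single sample as follows, using the caption's example confidence $p=0.6$. This is one plausible reading, not the authors' implementation: spreading the remaining mass $1-p$ uniformly over the other classes, scaling the loss by $p$ for the Weight variant, and all function names are assumptions.

```python
import math
from typing import List


def soft_target(label: int, num_classes: int, p: float) -> List[float]:
    """Put confidence p on the true class; spread 1 - p uniformly over the rest (assumption)."""
    off = (1.0 - p) / (num_classes - 1)
    return [p if c == label else off for c in range(num_classes)]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(target: List[float], logits: List[float]) -> float:
    """Cross-entropy between a target distribution and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))


def soft_augmentation_loss(logits: List[float], label: int, p: float,
                           variant: str = "target_weight") -> float:
    """Per-sample loss for the Target, Weight, and Target & Weight variants (assumed forms)."""
    n = len(logits)
    one_hot = soft_target(label, n, 1.0)
    if variant == "target":         # soften the target only
        return cross_entropy(soft_target(label, n, p), logits)
    if variant == "weight":         # keep the one-hot target, down-weight the loss
        return p * cross_entropy(one_hot, logits)
    if variant == "target_weight":  # soften the target and down-weight the loss
        return p * cross_entropy(soft_target(label, n, p), logits)
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    logits = [0.5, -1.0, 2.0, 0.0, 0.3]  # hypothetical model outputs for 5 classes
    for name in ("target", "weight", "target_weight"):
        print(name, round(soft_augmentation_loss(logits, label=2, p=0.6, variant=name), 4))
```

Label Smoothing corresponds to calling `soft_target` with the same fixed confidence for every sample, whereas Soft Augmentation recomputes $p$ per sample from the degree of occlusion of the crop.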