Stylized Synthetic Augmentation further improves Corruption Robustness

Georg Siedel, Rojan Regmi, Abhirami Anand, Weijia Shao, Silvia Vock, Andrey Morozov

TL;DR

Conventional robustification relies on rule-based augmentations. The authors propose a pipeline that combines synthetic data with neural style transfer to improve corruption robustness. They show that stylized synthetic images, despite their higher FID, improve training outcomes and achieve state-of-the-art robustness on CIFAR-10-C, CIFAR-100-C and TinyImageNet-C. The work highlights that FID is not a reliable predictor of augmentation usefulness and discusses practical considerations, including hyperparameter tuning and training overhead. Overall, the method demonstrates strong robustness gains across multiple architectures and datasets, with implications for scalable, texture-agnostic learning in the presence of common corruptions.

Abstract

This paper proposes a training data augmentation pipeline that combines synthetic image data with neural style transfer in order to address the vulnerability of deep vision models to common corruptions. We show that although applying style transfer on synthetic images degrades their quality with respect to the common FID metric, these images are surprisingly beneficial for model training. We conduct a systematic empirical analysis of the effects of both augmentations and their key hyperparameters on the performance of image classifiers. Our results demonstrate that stylization and synthetic data complement each other well and can be combined with popular rule-based data augmentation techniques such as TrivialAugment, although they do not work with others. Our method achieves state-of-the-art corruption robustness on several small-scale image classification benchmarks, reaching 93.54%, 74.9% and 50.86% robust accuracy on CIFAR-10-C, CIFAR-100-C and TinyImageNet-C, respectively.
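The sampling logic of such a pipeline can be illustrated with a minimal sketch. It assumes that the synthetic data ratio and the two stylization probabilities (named $\lambda$, $\lambda_{o}$ and $\lambda_{s}$ in the caption of Figure 1) act as independent per-sample probabilities; the paper may instead apply them per batch. The function name `choose_sample` and the `stylize` callable are hypothetical placeholders, not the authors' code, and the additional rule-based augmentations are omitted.

```python
import random


def choose_sample(original, synthetic, stylize, lam, lam_o, lam_s, rng=random):
    """Pick one training image and decide whether to stylize it.

    lam   : probability of drawing from the synthetic pool (assumed per sample)
    lam_o : probability of stylizing an original image
    lam_s : probability of stylizing a synthetic image
    stylize is a placeholder for any style transfer function.
    """
    # Step 1: choose the data source according to the synthetic ratio.
    use_synthetic = rng.random() < lam
    pool = synthetic if use_synthetic else original
    image = rng.choice(pool)

    # Step 2: stylize with the probability that belongs to that source.
    p_style = lam_s if use_synthetic else lam_o
    stylized = rng.random() < p_style
    if stylized:
        image = stylize(image)
    return image, use_synthetic, stylized
```

With `lam=0` the sketch only ever returns original images, and with `lam_o=0` and `lam_s=0` no image is stylized, which corresponds to a baseline without either augmentation.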

Paper Structure

This paper contains 20 sections, 6 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Our proposed data augmentation pipeline primarily comprises synthetic data and style transfer (yellow). $\lambda$, $\lambda_{o}$ and $\lambda_{s}$ are hyperparameters for the synthetic data ratio and the probabilities of applying stylization on original and synthetic data. The pipeline includes various additional rule-based image augmentations.
  • Figure 2: Test accuracy, robustness and FID of stylized synthetic images plotted over varying synthetic stylization ratios $\lambda_s$. The accuracy and robustness maxima are not aligned with the minimum FID of the synthetic training data. Hence, FID is not a good predictor for the usefulness of synthetic images for data augmentation.
  • Figure 3: Validation accuracy (top) and robustness (bottom) on TinyImageNet with variations to $\alpha_o$ (left, with $\lambda=0$, $\lambda_o=0.5$) and $\alpha_s$ (right, with $\lambda=0.7, \lambda_s=0.5, \lambda_o=0$). Green horizontal lines illustrate results where $\alpha$ is randomly drawn from the interval indicated by the horizontal extension of the line. The black baseline displays the result with no stylization. $\alpha_o=1.0$ is clearly best for original data, while $\alpha_s \sim [0.1,1.0]$ is a balanced choice for synthetic data.
  • Figure 4: Applying both TrivialAugment and Stylization (TA and NST) vs. either one of them (TA or NST) on CIFAR-100 (C100). Validation accuracy (top) and robustness (bottom) are reported over various stylization probabilities. Applying both (TA and NST) performs better on original data (left), whereas applying only one of them (TA or NST) is better on synthetic data (right).
  • Figure 5: Robustness and accuracy maps depending on the ratio of stylization on original ($\lambda_o$, x-axis) and synthetic ($\lambda_s$, y-axis) images. Values are averaged across 5 runs, and the synthetic image ratio $\lambda$ is 0.5. The points mark the lambda combination that is optimal for that dataset with respect to the mean of accuracy and robustness. On CIFAR in general, but also for optimal robustness on TinyImageNet (TIN), $\lambda_o<\lambda_s$.
  • ...and 2 more figures
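
The selection rule described in the caption of Figure 5 (average over runs, then pick the combination of $\lambda_o$ and $\lambda_s$ with the best mean of accuracy and robustness) can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: the function name `best_lambda_pair` and the input layout are hypothetical, and the numbers in the usage example are made-up placeholders rather than results from the paper.

```python
from statistics import mean


def best_lambda_pair(results):
    """Return the (lam_o, lam_s) pair with the highest mean of accuracy and robustness.

    results maps (lam_o, lam_s) -> list of (accuracy, robustness) tuples,
    one tuple per training run.
    """
    def score(runs):
        acc = mean(a for a, _ in runs)   # accuracy averaged across runs
        rob = mean(r for _, r in runs)   # robustness averaged across runs
        return (acc + rob) / 2           # mean of the two averaged metrics

    return max(results, key=lambda pair: score(results[pair]))


# Placeholder values for illustration only (not taken from the paper).
example = {
    (0.0, 0.0): [(80.0, 50.0), (82.0, 52.0)],
    (0.2, 0.6): [(81.0, 60.0), (81.0, 62.0)],
}
print(best_lambda_pair(example))
```

Weighting accuracy and robustness equally mirrors the caption; a different trade-off would only require changing the `score` helper.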