Optimizing Data Augmentation through Bayesian Model Selection

Madi Matymov, Ba-Hien Tran, Michael Kampffmeyer, Markus Heinonen, Maurizio Filippone

TL;DR

This work tackles the challenge of selecting effective data augmentation (DA) strategies by reframing augmentation as a Bayesian model selection problem. It introduces OPTIMA, a framework that treats augmentation parameters as latent hyper-parameters and derives an augmented evidence lower bound (ELBO) to jointly optimize model and augmentation parameters, thereby avoiding costly cross-validation. Theoretical contributions include a Jensen-gap bound for the variational augmentation, PAC-Bayes generalization guarantees, and an analysis of higher-order invariance that regularizes the model’s sensitivity to transformations; an empirical Bayes perspective links optimization to data-driven augmentation selection. Empirically, OPTIMA improves calibration, generalization, and robustness on CIFAR-10 and ImageNet/ImageNet-C benchmarks, often outperforming fixed or naive augmentation schemes while reducing calibration error and improving uncertainty estimates. Overall, the framework provides a principled, scalable approach to learning augmentation policies within a Bayesian paradigm, with strong potential to enhance reliability in safety-critical applications.

Abstract

Data Augmentation (DA) has become an essential tool to improve robustness and generalization of modern machine learning. However, when deciding on DA strategies it is critical to choose parameters carefully, and this can be a daunting task which is traditionally left to trial-and-error or expensive optimization based on validation performance. In this paper, we counter these limitations by proposing a novel framework for optimizing DA. In particular, we take a probabilistic view of DA, which leads to the interpretation of augmentation parameters as model (hyper)-parameters, and the optimization of the marginal likelihood with respect to these parameters as a Bayesian model selection problem. Because this marginal likelihood is intractable, we derive a tractable Evidence Lower BOund (ELBO), which allows us to optimize augmentation parameters jointly with model parameters. We provide extensive theoretical results on variational approximation quality, generalization guarantees, invariance properties, and connections to empirical Bayes. Through experiments on computer vision tasks, we show that our approach improves calibration and yields robust performance over fixed or no augmentation. Our work provides a rigorous foundation for optimizing DA through Bayesian principles with significant potential for robust machine learning.
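
As a rough illustration of why a lower bound becomes tractable here, the step below applies Jensen's inequality to a likelihood averaged over transformations. The notation is an assumption made for this sketch and may differ from the paper's: $q_{\boldsymbol{\phi}}(\boldsymbol{\gamma})$ denotes a distribution over augmentation parameters $\boldsymbol{\gamma}$ with its own parameters $\boldsymbol{\phi}$, $T_{\boldsymbol{\gamma}}$ is the transformation, and $\boldsymbol{\theta}$ are the model parameters. The full ELBO in the paper may contain further terms (for example, regularizers on $\boldsymbol{\theta}$ or $\boldsymbol{\phi}$) that are not shown.

$$
\log \mathbb{E}_{q_{\boldsymbol{\phi}}(\boldsymbol{\gamma})}\!\left[ p(\boldsymbol{y} \mid T_{\boldsymbol{\gamma}}(\boldsymbol{x}), \boldsymbol{\theta}) \right]
\;\ge\;
\mathbb{E}_{q_{\boldsymbol{\phi}}(\boldsymbol{\gamma})}\!\left[ \log p(\boldsymbol{y} \mid T_{\boldsymbol{\gamma}}(\boldsymbol{x}), \boldsymbol{\theta}) \right]
$$

The right-hand side can be estimated by sampling $\boldsymbol{\gamma}$, which is what would make gradient-based optimization with respect to both $\boldsymbol{\theta}$ and $\boldsymbol{\phi}$ feasible under these assumptions.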

Paper Structure

This paper contains 79 sections, 18 theorems, 58 equations, 3 figures, 6 tables, 2 algorithms.

Key Result

Proposition 4.1

The augmentation distribution variance and model sensitivity control the Jensen gap introduced by our lower bound approximation. If $f(\boldsymbol{\gamma}) = \log p(\boldsymbol{y} \mid T_{\boldsymbol{\gamma}}(\boldsymbol{x}), \boldsymbol{\theta})$ [...] Also, this bound is tight when $f(\boldsymbol{\gamma})$ is approximately linear in the [...]
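
The dependence of the Jensen gap on the augmentation variance can be checked numerically on a toy case. The sketch below is not the paper's bound or experimental setup: it assumes a scalar additive augmentation $T_\gamma(x) = x + \gamma$ with $\gamma \sim \mathcal{N}(0, \sigma^2)$ and a unit-variance Gaussian likelihood, and the names `log_lik`, `jensen_gap`, and `closed_form_gap` are hypothetical helpers introduced only for illustration. It computes the gap $\log \mathbb{E}[\exp f(\gamma)] - \mathbb{E}[f(\gamma)]$ by quadrature and compares it with the closed form available for this particular toy model.

```python
import math


def log_lik(y, x, gamma):
    """Toy log-likelihood f(gamma) = log N(y | x + gamma, 1) (assumed model)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (y - x - gamma) ** 2


def jensen_gap(y, x, sigma, n=4001, width=8.0):
    """Gap log E_q[exp f] - E_q[f] for q = N(0, sigma^2), via the trapezoidal rule."""
    if sigma == 0.0:
        return 0.0  # a point-mass augmentation leaves no gap
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n - 1)
    marginal = 0.0      # accumulates E_q[exp f(gamma)]
    expected_log = 0.0  # accumulates E_q[f(gamma)]
    for i in range(n):
        g = lo + i * h
        w = h * (0.5 if i in (0, n - 1) else 1.0)  # trapezoid weights
        q = math.exp(-0.5 * (g / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        f = log_lik(y, x, g)
        marginal += w * q * math.exp(f)
        expected_log += w * q * f
    return math.log(marginal) - expected_log


def closed_form_gap(y, x, sigma):
    """Analytic gap for this toy Gaussian model only."""
    r2, s2 = (y - x) ** 2, sigma ** 2
    return 0.5 * (s2 - math.log1p(s2)) + 0.5 * r2 * s2 / (1.0 + s2)


# Gap for increasing augmentation standard deviations.
gaps = [jensen_gap(1.5, 1.0, s) for s in (0.0, 0.1, 0.5, 1.0)]
for s, gap in zip((0.0, 0.1, 0.5, 1.0), gaps):
    print(f"sigma={s:.1f}  gap={gap:.6f}")
```

In this toy model the gap is zero when $\sigma = 0$ and increases with $\sigma$, which is consistent with the qualitative claim that the augmentation variance controls the gap; it says nothing about the constants in the proposition itself.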

Figures (3)

  • Figure 1: OPTIMA obtains the best calibration. Example of ResNet-18 on CIFAR-10.
  • Figure 2: (Left Two) Convergence of training and test accuracy on CIFAR-10. OPTIMA obtains the highest accuracy. (Right Two) Evolution of the data augmentation parameters.
  • Figure 3: Synthetic regression: (Left) Predictions on test data vs. the ground-truth function. (Right Three) Traces of the training loss, test loss, and the evolution of $\sigma$ for OPTIMA; the green dashed line indicates the fixed $\sigma = 0.1$ used in Fixed Aug.

Theorems & Definitions (27)

  • Proposition 4.1: Jensen Gap Bound
  • Corollary 4.2: Optimal Augmentation Variance
  • Definition 4.3: True and Empirical Risks
  • Theorem 4.4: PAC-Bayes with Augmented Likelihood
  • Theorem 4.5: Generalization Advantage of Bayesian-Optimized Augmentation
  • Corollary 4.6: Marginalization Advantage
  • Corollary 4.7: Augmentation-Aware Prior
  • Theorem 4.8: Higher-Order Invariance
  • Corollary 4.9: Input-Space Regularization
  • Corollary 4.10: Optimal Transformation Covariance
  • ...and 17 more