Posterior Uncertainty Quantification in Neural Networks using Data Augmentation

Luhuan Wu; Sinead Williamson

TL;DR

This paper reframes uncertainty about model parameters as uncertainty about unseen future data, using the martingale posterior framework, and shows that deep ensembles correspond to a mis-specified model in which future data are supported only on existing observations. It introduces MixupMP, a Mixup-based martingale posterior that uses augmented data to build a more realistic predictive distribution, producing samples from $\mathbb{P}^{(\text{MMP})}_\infty$ that can be used like ensemble members. Across image classification benchmarks, MixupMP achieves better predictive performance and uncertainty calibration than Bayesian and non-Bayesian baselines, particularly under distribution shift. The approach is scalable and data-driven, suited to high-dimensional structured data, and combines martingale posteriors with domain-aware data augmentation.

Abstract

In this paper, we approach the problem of uncertainty quantification in deep learning through a predictive framework, which captures uncertainty in model parameters by specifying our assumptions about the predictive distribution of unseen future data. Under this view, we show that deep ensembling (Lakshminarayanan et al., 2017) is a fundamentally mis-specified model class, since it assumes that future data are supported on existing observations only, a situation rarely encountered in practice. To address this limitation, we propose MixupMP, a method that constructs a more realistic predictive distribution using popular data augmentation techniques. MixupMP operates as a drop-in replacement for deep ensembles, where each ensemble member is trained on a random simulation from this predictive distribution. Grounded in the recently proposed framework of martingale posteriors (Fong et al., 2023), MixupMP returns samples from an implicitly defined Bayesian posterior. Our empirical analysis shows that MixupMP achieves superior predictive performance and uncertainty quantification on various image classification datasets when compared with existing Bayesian and non-Bayesian approaches.
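To make the idea of "training each ensemble member on a random simulation from the predictive distribution" more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It combines two standard ingredients: Bayesian bootstrap weights, drawn from a Dirichlet(1, ..., 1) distribution over the observed points, and Mixup points, formed as convex combinations of random data pairs with a mixing coefficient drawn from Beta(alpha, alpha). The function names, the number of Mixup draws `m`, and in particular the way `r` splits the total weight between observed and Mixup points are assumptions made for illustration. They are chosen only so that `r = 0` places all weight on the observations and large `r` is dominated by Mixup draws, in line with the Figure 2 caption.

```python
import numpy as np

def bayesian_bootstrap_weights(n, rng):
    """Draw Dirichlet(1, ..., 1) weights over n points (Bayesian bootstrap)."""
    return rng.dirichlet(np.ones(n))

def mixup_draws(x, y_onehot, alpha, m, rng):
    """Draw m Mixup points from random pairs, with lambda ~ Beta(alpha, alpha).

    x is assumed to be a 2-D array (n, d); y_onehot is a 2-D array (n, K).
    """
    n = x.shape[0]
    i = rng.integers(0, n, size=m)
    j = rng.integers(0, n, size=m)
    lam = rng.beta(alpha, alpha, size=m)[:, None]
    x_mix = lam * x[i] + (1.0 - lam) * x[j]
    y_mix = lam * y_onehot[i] + (1.0 - lam) * y_onehot[j]
    return x_mix, y_mix

def simulate_member_dataset(x, y_onehot, alpha=1.0, r=1.0, m=None, seed=0):
    """Hypothetical weighted training set for a single ensemble member.

    Assumption (not taken from the paper): observed points share total
    weight 1 / (1 + r) and Mixup points share total weight r / (1 + r).
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    m = n if m is None else m
    x_mix, y_mix = mixup_draws(x, y_onehot, alpha, m, rng)
    w_obs = bayesian_bootstrap_weights(n, rng) / (1.0 + r)
    w_mix = bayesian_bootstrap_weights(m, rng) * r / (1.0 + r)
    xs = np.concatenate([x, x_mix])
    ys = np.concatenate([y_onehot, y_mix])
    ws = np.concatenate([w_obs, w_mix])
    return xs, ys, ws
```

Under these assumptions, each ensemble member would minimize a weighted loss over `(xs, ys, ws)` with its own `seed`, so that the spread across members reflects the randomness in the simulated future data.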

Paper Structure (41 sections, 3 theorems, 13 equations, 8 figures, 5 tables, 2 algorithms)

INTRODUCTION
Background
Setup and notation.
Martingale posterior distributions
Bayesian bootstrap.
Ensemble methods in deep learning
Deep ensembles.
Monte Carlo dropout.
Mixup
An equivalency between DE and BB
Mis-specification of BB and DE.
Mixup Martingale posteriors: Incorporating prior knowledge about the distribution of interest
Relationship to other methods.
Illustration of the predictive distribution.
Approximate MixupMP
...and 26 more sections

Key Result

Proposition 1

(Informal) If a dataset $z_{1:n}$ is separable under a given homogeneous neural network with parameters $\theta$, trained via stochastic gradient descent using an exponentially tailed loss (e.g. cross-entropy) with weak regularization, then any posterior sample of $\theta$ obtained via an appropriat

Figures (8)

Figure 1: Illustration of MixupMP on a synthetic classification task $(K=5)$ with $\alpha=1.0$. As $r$ increases, $F^{(\text{MMP})}_\infty$ puts more uncertainty on the space between observations, inducing higher predictive uncertainty.
Figure 2: Impact of $\alpha$ and $r$ on test set performance of MixupMP on CIFAR10. $r=0$ corresponds to DE; $r=\infty$ corresponds to Mixup Ensemble. Results for CIFAR100 and FMNIST are included in the appendix.
Figure 3: Performance under distribution shift using CIFAR10-C dataset. The distribution shift intensity ranges from 0 to 5, where 0 indicates no shift.
Figure A.1: Illustration of Martingale posteriors. Different specifications of the future predictive distribution lead to different uncertainty quantification behaviors.
Figure B.1: Illustration of MixupMP on a synthetic classification task $(K=5)$ with $r=1.0$ and varying $\alpha$. As $\alpha$ increases, $F^{(\text{MMP})}_\infty$ samples are more concentrated on the middle of different data pairs, inducing wider predictive uncertainty bands around the decision boundary.
...and 3 more figures
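
The effect of $\alpha$ described in the Figure B.1 caption can be checked numerically. The sketch below assumes the standard Mixup convention in which the mixing coefficient is drawn as $\lambda \sim \text{Beta}(\alpha, \alpha)$. The function names and the `half_width` threshold are hypothetical and chosen only for illustration. The standard deviation uses the closed-form variance of a symmetric Beta distribution, $1 / (4(2\alpha + 1))$, and the second function gives a Monte Carlo estimate of how much mass falls near $\lambda = 0.5$, that is, near the midpoint of a data pair.

```python
import numpy as np

def mixing_coefficient_std(alpha):
    """Exact standard deviation of lambda ~ Beta(alpha, alpha)."""
    return np.sqrt(1.0 / (4.0 * (2.0 * alpha + 1.0)))

def fraction_near_middle(alpha, half_width=0.1, num_samples=100_000, seed=0):
    """Monte Carlo estimate of P(|lambda - 0.5| < half_width)."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha, size=num_samples)
    return float(np.mean(np.abs(lam - 0.5) < half_width))

if __name__ == "__main__":
    for a in (0.2, 1.0, 5.0):
        print(a, mixing_coefficient_std(a), fraction_near_middle(a))
```

Small values of $\alpha$ push $\lambda$ toward 0 or 1, so Mixup points stay close to the observed data; larger values concentrate $\lambda$ around 0.5, placing more samples between pairs of observations.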

Theorems & Definitions (3)

Proposition 1
Lemma 1: Proposition 3 of xu2021understanding
Proposition 1
