Calibrating Bayesian Generative Machine Learning for Bayesiamplification

Sebastian Bieringer; Sascha Diefenbacher; Gregor Kasieczka; Mathias Trabs

TL;DR

This work presents a scheme for quantifying the calibration of Bayesian generative machine learning models. It evaluates the calibration of Bayesian uncertainties obtained from either a mean-field Gaussian weight posterior or Monte Carlo sampling of the network weights, with a focus on their behaviour at discontinuous distribution edges.

Abstract

Recently, combinations of generative and Bayesian machine learning have been introduced in particle physics for both fast detector simulation and inference tasks. These neural networks aim to quantify the uncertainty on the generated distribution that originates from limited training statistics. The interpretation of a distribution-wide uncertainty, however, remains ill-defined. We present a scheme for quantifying the calibration of Bayesian generative machine learning models. For a Continuous Normalizing Flow applied to a low-dimensional toy example, we evaluate the calibration of Bayesian uncertainties obtained from either a mean-field Gaussian weight posterior or Monte Carlo sampling of the network weights, to gauge their behaviour at discontinuous distribution edges. Well-calibrated uncertainties can then be used to roughly estimate the number of uncorrelated truth samples that are equivalent to the generated sample, and they clearly indicate data amplification for smooth features of the distribution.
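
As an illustration of what quantifying calibration can look like in practice, the following is a minimal sketch and not the authors' implementation. It assumes that each posterior weight sample yields a predicted probability per bin, that intervals are taken as central empirical quantiles of those predictions, and that the empirical coverage is the fraction of bins whose true probability lies inside the interval. The function name `empirical_coverage` and the Gaussian toy data are hypothetical.

```python
import numpy as np


def empirical_coverage(pred, truth, nominal):
    """Fraction of bins whose true value lies in the central interval.

    pred    : array (n_posterior_samples, n_bins), predicted bin probabilities
    truth   : array (n_bins,), true bin probabilities
    nominal : iterable of nominal coverage levels in (0, 1)
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    coverage = []
    for c in nominal:
        # Central interval taken from the empirical quantiles of the
        # posterior predictions (an assumption of this sketch).
        lo = np.quantile(pred, 0.5 - c / 2.0, axis=0)
        hi = np.quantile(pred, 0.5 + c / 2.0, axis=0)
        coverage.append(np.mean((truth >= lo) & (truth <= hi)))
    return np.array(coverage)


# Toy data (hypothetical): the posterior mean scatters around the truth
# with width `sigma`; the posterior spread is `scale * sigma`.
rng = np.random.default_rng(0)
n_bins, n_post, sigma = 400, 50, 1e-4
truth = np.full(n_bins, 1.0 / n_bins)
centre = truth + sigma * rng.standard_normal(n_bins)


def toy_posterior(scale):
    return centre + scale * sigma * rng.standard_normal((n_post, n_bins))


nominal = np.array([0.38, 0.68, 0.95])
cov_calibrated = empirical_coverage(toy_posterior(1.0), truth, nominal)
cov_overcertain = empirical_coverage(toy_posterior(0.2), truth, nominal)
print(nominal, cov_calibrated, cov_overcertain)
```

In this toy, a posterior spread that matches the scatter of the mean gives coverage near the nominal level, while a spread that is too narrow gives coverage below it, which corresponds to overcertain predictions.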

Paper Structure (16 sections, 29 equations, 6 figures)

Introduction
Bayesian Neural Networks
Bayesian Continuous Normalizing Flows
Variational Inference Bayes
Markov Chain Monte Carlo
Toy Setup
Gamma Function Ring
Hyperparameter Choices
Quantiles
Calibration
Scaling with the Number of Quantiles
Calibration at Sharp Features in Radial Direction
Bayesiamplification
Checking Amplification with Jensen-Shannon Divergence
Conclusion
...and 1 more section

Figures (6)

Figure 1: Left: Histogram of one training data set ($10{,}000$ points). The data follows a ring structure with a sharp edge at $r=4$ and a long tail to higher radii. Middle: Marginal distribution of the training data in radial direction. Right: $5\times5$ quantiles generated from a data set of $10M$ points and filled with the training data. The quantiles are constructed such that the truth data falls into each quantile with equal probability.
Figure 2: Mean empirical coverage for confidence intervals calculated from $10$ samples of the Bayesian weight posterior drawn with AdamMCMC using 4 different hyperparameter settings. Higher $\sigma$ will generally result in larger uncertainties. The empirical coverage is calculated from $5$ independent runs and averaged over all quantiles. The panels show a clear dependence of the calibration on the number of quantiles increasing from left to right.
Figure 3: Mean empirical coverage for confidence intervals calculated from $50$ draws from the VIB approximation of the Bayesian weight posterior with 4 different hyperparameter settings. Larger $k$ increases the dependence of the fit on the prior. The empirical coverage is calculated from $5$ independent runs and averaged over all quantiles. The panels again show a clear dependence of the calibration on the number of quantiles increasing from left to right.
Figure 4: Left: Mean absolute deviation between the nominal and the empirical coverage ($5$ runs) for $50$ posterior samples from both VIB at $k=10$ and AdamMCMC at $\sigma=0.1$. The panel shows a strong dependence on the number of quantiles. From evaluating the calibration plots for all numbers of quantiles individually, we know both methods are undercertain at low numbers. With increasing $n_Q$, the calibration mean exhibits a strong dependence on the order of the absolute and average operations. When only the over- and underdensities along the radial or angular dimension can cancel out, the mean calibration of the VIB-CNFs drastically decreases. Right: Difference between nominal and empirical coverage (mean over angular direction) for the radial direction of $200\times200$ quantiles. Values below $0$ indicate overcertain predictions, while values above $0$ indicate undercertain predictions. While the predictions of the AdamMCMC-CNFs are well calibrated to slightly undercertain along the radius, the prediction of the VIB-CNFs starts out overcertain due to the oversmoothing caused by the strong prior dependence. When averaging over all radii, the VIB-CNF predictions cancel exactly and the BNN seems well calibrated.
Figure 5: Left: Amplification estimate generated by equating the error prediction per bin for both BNNs to the Poisson error of an independent data set. Higher uncertainty results in lower amplification. Error bars are calculated from the ensemble of $5$ runs done per BNN method. Again, we use $50$ samples of the weight posterior (approximation) for both methods. The faint solid lines show the result of an exponential linear fit (least squares) to the last $8$ points. Right: The mean estimate over all quantiles converges to a constant value at high numbers of quantiles, resulting in linear scaling of the amplification with the number of quantiles.
...and 1 more figure
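
The amplification estimate described in the caption of Figure 5 can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' code: for a bin with probability $p$, an independent data set of $N$ points has a Poisson error of $\sqrt{p/N}$ on the bin fraction, so equating this to the predicted standard deviation $\sigma_p$ gives $N_\mathrm{eq} = p/\sigma_p^2$, and the amplification is taken as $N_\mathrm{eq}$ divided by the training set size. The function names and the multinomial toy data are hypothetical.

```python
import numpy as np


def equivalent_sample_size(pred):
    """Per-bin size of an independent data set with the same Poisson error.

    pred : array (n_posterior_samples, n_bins), predicted bin probabilities
    """
    pred = np.asarray(pred, dtype=float)
    p_hat = pred.mean(axis=0)          # mean predicted bin probability
    sigma = pred.std(axis=0, ddof=1)   # predicted uncertainty per bin
    # Poisson error on a bin fraction: sqrt(p / N)  =>  N = p / sigma^2
    return p_hat / sigma**2


def amplification(pred, n_train):
    """Equivalent sample size relative to the training set size (assumed definition)."""
    return equivalent_sample_size(pred) / n_train


# Toy check (hypothetical): predictions that fluctuate like independent
# data sets of 40,000 points should give an equivalent size near 40,000.
rng = np.random.default_rng(1)
n_equiv_true, n_train, n_bins, n_post = 40_000, 10_000, 25, 50
p_true = np.full(n_bins, 1.0 / n_bins)  # equal-probability quantiles
pred = rng.multinomial(n_equiv_true, p_true, size=n_post) / n_equiv_true

n_eq = equivalent_sample_size(pred)
amp = amplification(pred, n_train)
print(np.median(n_eq), np.median(amp))
```

With only $50$ posterior samples the per-bin variance estimate is noisy, so a summary over bins such as the median is more stable than any single bin.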
