Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions

Thorsten Glüsenkamp

TL;DR

This work presents a unified variational-inference framework that recasts both supervised learning and variational autoencoders as instances of minimising the joint KL-divergence between the true data-label distribution and a tractable model, enabled by conditional normalizing flows with a Gaussian base. By deriving extended supervised losses and a semi-supervised extension, the author enables per-event uncertainty estimates, coverage testing, effective marginalization of systematic uncertainties, and posterior-predictive goodness-of-fit within a single model. The approach uses flow-based posteriors on product spaces (e.g., $\mathbb{R}^n \times \mathbb{S}^m$) to handle the complex, multi-modal distributions common in astro-particle reconstructions, including directional data on spheres. Demonstrations on toy IceCube-like data show how base-ordered coverage can be computed analytically and how systematic uncertainties influence posterior widths, which supports fast decision-making with uncertainty guarantees for event selections and alerts.

Abstract

Neural-network based predictions of event properties in astro-particle physics are becoming increasingly common. However, in many cases the result is used only as a point prediction. Statistical uncertainties, coverage, systematic uncertainties, or a goodness-of-fit measure are often not calculated. Here we describe a choice of training and network architecture that incorporates all of these properties into a single network model. We show that a KL-divergence objective on the joint distribution of data and labels unifies supervised learning and variational autoencoders (VAEs) under one umbrella of stochastic variational inference. The unification motivates an extended supervised learning scheme that makes it possible to calculate a goodness-of-fit p-value for the neural network model. Conditional normalizing flows amortized with a neural network are crucial in this construction. We discuss how to calculate coverage probabilities without numerical integration for specific "base-ordered" contours that are unique to normalizing flows. Furthermore, we show how systematic uncertainties can be included via effective marginalization during training. The proposed extended supervised training incorporates (1) coverage calculation, (2) systematics, and (3) a goodness-of-fit measure in a single machine-learning model. There are in principle no constraints on the shape of the involved distributions; in fact, the machinery works with complex multi-modal distributions defined on product spaces like $\mathbb{R}^n \times \mathbb{S}^m$. The coverage calculation, however, requires care in its interpretation when the distributions are too degenerate. We see great potential for exploiting this per-event information in event selections or for fast astronomical alerts which require uncertainty guarantees.
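
The idea of a coverage calculation "without numerical integration" can be illustrated with a minimal sketch. It assumes the simplest possible case, which is not taken from the paper's experiments: a 2-dimensional affine flow $z = \mu + \sigma z_0$ with a standard normal base $z_0$. Under that assumption the squared base-space radius follows a chi-square distribution with 2 degrees of freedom, whose CDF has a closed form. The function names (`base_ordered_mass`, `empirical_coverage`, `simulate_masses`) and the toy data are hypothetical and chosen for illustration only.

```python
import math
import random


def base_ordered_mass(z_true, mu, sigma):
    """Probability mass enclosed by the base-ordered contour through z_true.

    Assumption: 2-d affine flow z = mu + sigma * z0 with z0 ~ N(0, I_2).
    """
    # Invert the flow to map the true label back into the base space.
    z0 = [(zt - m) / sigma for zt, m in zip(z_true, mu)]
    r2 = sum(c * c for c in z0)
    # ||z0||^2 is chi-square distributed with 2 degrees of freedom,
    # and that CDF is 1 - exp(-r2 / 2), so no integration is needed.
    return 1.0 - math.exp(-0.5 * r2)


def empirical_coverage(masses, level):
    """Fraction of events whose true label lies inside the `level` contour."""
    return sum(m <= level for m in masses) / len(masses)


def simulate_masses(n_events, true_sigma, model_sigma, seed=0):
    """Toy stand-in for a trained model evaluated on labelled events."""
    rng = random.Random(seed)
    masses = []
    for _ in range(n_events):
        # Hypothetical per-event posterior mean (would come from the network).
        mu = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
        # True label scattered around the mean with the true width.
        z_true = [rng.gauss(m, true_sigma) for m in mu]
        masses.append(base_ordered_mass(z_true, mu, model_sigma))
    return masses


if __name__ == "__main__":
    calibrated = simulate_masses(20000, true_sigma=1.0, model_sigma=1.0)
    overconfident = simulate_masses(20000, true_sigma=1.0, model_sigma=0.5)
    for name, masses in [("calibrated", calibrated), ("overconfident", overconfident)]:
        print(name, "coverage at 68%:", round(empirical_coverage(masses, 0.68), 3))
```

When the model width matches the true scatter, the enclosed masses are uniformly distributed and the empirical coverage sits close to the nominal level. When the model width is too small, the empirical coverage falls below the nominal level. For a general flow the only change in this sketch would be the inversion step; higher base dimensions need the chi-square CDF for the corresponding number of degrees of freedom.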

Paper Structure

This paper contains 25 sections, 36 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Variational inference (VI) examples for simulation data $x_i$ and labels $z_i$ indexed by $i=1 \ldots N$, which comprise the whole dataset of size $N$. The exception is the single event example in (a), which has a single datum $x_j$ as input.
  • Figure 2: Illustration of the two simplest toy Monte Carlo datasets. Black dots denote collection modules, squares with arrows indicate shower-like neutrino events, and vertical bars indicate the expected logarithmic photon yield in a given photodetector. (a) Dataset 1 (single photodetector) with two example events A and B. (b) Photon arrival time distributions of events A and B. (c) Dataset 2 (16 photon collectors) with two example events C and D.
  • Figure 3: General conditional flow (a) and an affine conditional flow (b) parametrization of the approximate posterior in supervised learning. Choosing $\sigma_{\phi}=1$ yields a shifted standard normal distribution which is the PDF used in the MSE loss. General normalizing flow parameters are denoted by $\vec{F}$ and the parameters of the encoding neural network are denoted by $\phi$. The common encoding scheme for all experiments is depicted in (c), which amortizes the parameters $\vec{F}$.
  • Figure 4: A comparison of the position posteriors for the example events A and B from dataset 1 and events C and D from dataset 2. The normalizing flow posterior is shown together with its $68 \%$ probability mass contour in black. The $68 \%$ probability mass contour of the true posterior, assuming a flat prior, is shown in white. The true event positions are marked in red. The upper row shows the result for Gaussianization flows, the lower row for an affine flow (a Gaussian) with a single covariance parameter.
  • Figure 5: Posterior approximation performance of a Gaussianization flow (GF; Meng2020), an affine flow with a single variable width $\sigma$ (affine), and an affine flow with $\sigma=1$ (MSE). Along the x-axis the number of parameters and the potential flow complexity increases, while the encoding complexity is held fixed (see text). The respective upper plot shows the validation loss. The black dotted line shows the loss obtained from the true posterior. The respective lower plot shows the sample-based KL-divergence.
  • ...and 8 more figures
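
The statement in the Figure 3 caption that $\sigma_{\phi}=1$ recovers the MSE loss can be checked by writing out the negative log-density of the affine flow. This is a sketch under assumptions not fixed by the captions above: a $d$-dimensional label $z$, a standard normal base distribution, and the notation $q_{\phi}(z|x)$ and $\mu_{\phi}(x)$ for the approximate posterior and its predicted shift.

$$-\ln q_{\phi}(z|x) = \frac{\lVert z - \mu_{\phi}(x) \rVert^2}{2\sigma_{\phi}^2} + d \ln \sigma_{\phi} + \frac{d}{2} \ln 2\pi$$

For $\sigma_{\phi}=1$ the second term vanishes and the third is a constant, which leaves $\frac{1}{2}\lVert z - \mu_{\phi}(x) \rVert^2$. Averaging this over the training set gives the mean squared error up to a factor of $\frac{1}{2}$ and an additive constant, so both objectives share the same minimizer.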