Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions
Thorsten Glüsenkamp
TL;DR
This work presents a unified variational-inference framework that recasts both supervised learning and variational autoencoders as instances of minimizing the joint KL-divergence between the true data-label distribution and a tractable model, made practical by conditional normalizing flows with a Gaussian base. By deriving extended supervised losses and a semi-supervised extension, the work enables per-event uncertainty, coverage testing, effective marginalization of systematic uncertainties, and posterior-predictive goodness-of-fit within a single model. The approach uses flow-based posteriors on product spaces (e.g., $\mathbb{R}^n \times \mathbb{S}^m$) to handle the complex, multi-modal distributions common in astro-particle reconstructions, including directional data on spheres. Demonstrations on toy IceCube-like data show how base-ordered coverage can be computed analytically and how systematic uncertainties influence posterior widths, ultimately enabling fast decision-making with uncertainty guarantees for event selections and alerts.
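As a brief illustration of the joint KL-divergence view, the supervised case can be written out in one line. The notation here is an assumption made for this sketch rather than taken from the text: $x$ denotes data, $y$ labels, $P_t$ the true distribution, and $q_\phi(y|x)$ the flow-based conditional model, with the model's data marginal assumed to be fixed to $P_t(x)$.
$$
D_{\mathrm{KL}}\big(P_t(x,y)\,\|\,q_\phi(y|x)\,P_t(x)\big)
= \mathbb{E}_{P_t(x,y)}\big[\ln P_t(y|x) - \ln q_\phi(y|x)\big]
= \mathbb{E}_{P_t(x,y)}\big[-\ln q_\phi(y|x)\big] + \mathrm{const.}
$$
Under these assumptions the only $\phi$-dependent term is the expected negative log-likelihood of the labels, which is the usual supervised loss estimated from labelled samples.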
Abstract
Neural-network based predictions of event properties in astro-particle physics are becoming increasingly common. In many cases, however, the result is used only as a point prediction: statistical uncertainties, coverage, systematic uncertainties, and goodness-of-fit measures are often not calculated. Here we describe a particular choice of training scheme and network architecture that incorporates all of these properties into a single network model. We show that a KL-divergence objective on the joint distribution of data and labels unifies supervised learning and variational autoencoders (VAEs) under one umbrella of stochastic variational inference. The unification motivates an extended supervised learning scheme that makes it possible to calculate a goodness-of-fit p-value for the neural network model. Conditional normalizing flows amortized with a neural network are crucial in this construction. We discuss how to calculate coverage probabilities without numerical integration for specific "base-ordered" contours that are unique to normalizing flows. Furthermore, we show how systematic uncertainties can be included via effective marginalization during training. The proposed extended supervised training incorporates (1) coverage calculation, (2) systematics and (3) a goodness-of-fit measure in a single machine-learning model. There are in principle no constraints on the shape of the distributions involved; in fact, the machinery works with complex multi-modal distributions defined on product spaces like $\mathbb{R}^n \times \mathbb{S}^m$. The coverage calculation, however, requires care in its interpretation when the distributions are too degenerate. We see great potential for exploiting this per-event information in event selections or for fast astronomical alerts that require uncertainty guarantees.
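
The idea of coverage without numerical integration can be illustrated with a minimal sketch, under assumptions that are not part of the text above: the posterior is a toy 2-d affine flow with a standard-normal base, and all function names and parameter values are hypothetical. Mapping the true label back to base space gives a point z; if the posterior is correct, the squared radius of z follows a chi-square distribution with 2 degrees of freedom, so the probability mass enclosed by the base-space sphere through z is available in closed form.

```python
import math
import random


def affine_flow_inverse(y, mu, L):
    """Map a 2-d label y to base space: z = L^{-1} (y - mu), L lower-triangular."""
    d0 = y[0] - mu[0]
    d1 = y[1] - mu[1]
    z0 = d0 / L[0][0]
    z1 = (d1 - L[1][0] * z0) / L[1][1]
    return (z0, z1)


def base_ordered_coverage_2d(z):
    """Probability mass inside the base-space sphere passing through z.

    For a 2-d standard-normal base, |z|^2 follows chi2 with 2 degrees of
    freedom, whose CDF is 1 - exp(-r^2 / 2). No numerical integration needed.
    """
    r2 = z[0] ** 2 + z[1] ** 2
    return 1.0 - math.exp(-0.5 * r2)


def empirical_coverage(n_events, nominal, seed=0):
    """Fraction of toy events whose true label lies inside the nominal contour.

    Assumption: the 'true' labels are drawn from the same affine flow that is
    used as the posterior, i.e. the model is perfectly calibrated.
    """
    rng = random.Random(seed)
    mu = (1.0, -2.0)                    # hypothetical flow shift
    L = ((2.0, 0.0), (0.5, 0.3))        # hypothetical lower-triangular scale
    inside = 0
    for _ in range(n_events):
        z_true = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        y_true = (
            mu[0] + L[0][0] * z_true[0],
            mu[1] + L[1][0] * z_true[0] + L[1][1] * z_true[1],
        )
        z = affine_flow_inverse(y_true, mu, L)
        if base_ordered_coverage_2d(z) <= nominal:
            inside += 1
    return inside / n_events


if __name__ == "__main__":
    for nominal in (0.5, 0.68, 0.9):
        print(nominal, empirical_coverage(20000, nominal))
```

For a calibrated toy model like this one, the empirical fractions should lie close to the nominal values; a systematic deviation in a real application would indicate under- or over-coverage. In higher dimensions or on spheres the closed-form expression differs, and that case is not covered by this sketch.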
