A prediction rigidity formalism for low-cost uncertainties in trained neural networks

Filippo Bigi; Sanggyu Chong; Michele Ceriotti; Federico Grasselli

TL;DR

This work proposes ‘prediction rigidities’ as a formalism for obtaining uncertainties of arbitrary pre-trained regressors. A last-layer approximation is developed and rigorously justified so that the method can be applied to neural networks.

Abstract

Regression methods are fundamental for scientific and technological applications. However, fitted models can be highly unreliable outside of their training domain, and hence quantifying their uncertainty is crucial in many applications. Based on the solution of a constrained optimization problem, we propose "prediction rigidities" as a method to obtain uncertainties of arbitrary pre-trained regressors. We establish a strong connection between our framework and Bayesian inference, and we develop a last-layer approximation that allows the new method to be applied to neural networks. This extension affords cheap uncertainties without any modification to the neural network itself or its training procedure. We show the effectiveness of our method on a wide range of regression tasks, from simple toy models to applications in chemistry and meteorology.
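As a rough illustration of the last-layer idea described in the abstract, the following minimal sketch computes an uncertainty proportional to the inverse of a prediction rigidity from last-layer features. It rests on assumptions not stated in the text above: a squared loss, a Hessian restricted to the last-layer weights and approximated as F^T F plus a regularizer, and two calibration constants. The names `last_layer_uncertainty`, `regularizer`, and `scale` are hypothetical and chosen for this example only.

```python
import numpy as np


def last_layer_uncertainty(train_features, query_features, regularizer=1e-6, scale=1.0):
    """Variance-like uncertainty from last-layer features (illustrative sketch).

    Assumption: with a squared loss, the Hessian with respect to the
    last-layer weights is approximated by F^T F + regularizer * I,
    where F stacks the last-layer features of the training samples.
    """
    F = np.asarray(train_features, dtype=float)
    Q = np.atleast_2d(np.asarray(query_features, dtype=float))

    hessian = F.T @ F + regularizer * np.eye(F.shape[1])

    # Solve H x = q for every query instead of forming the inverse explicitly.
    solved = np.linalg.solve(hessian, Q.T)  # shape: (n_features, n_queries)

    # q^T H^{-1} q for each query, i.e. the inverse of the rigidity.
    inverse_rigidity = np.einsum("ij,ji->i", Q, solved)

    # `scale` stands in for a calibration constant (hypothetical here).
    return scale * inverse_rigidity
```

In this sketch, a query whose features point along directions well covered by the training set receives a small value, while a query along a poorly covered direction receives a large one. Both `regularizer` and `scale` would need to be calibrated, for instance on a validation set.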

Paper Structure (35 sections, 48 equations, 6 figures, 2 tables)

Introduction
Background
Existing uncertainty quantification schemes
Related work
Theory
Problem statement and notation
Prediction rigidities as the solution of a constrained optimization problem
Prediction rigidities and Bayesian inference
An efficient approximation for the Hessian
Application to neural networks
Uncertainty propagation
Results
A simple 1D example
Probabilistic backpropagation benchmark
Chemistry applications
...and 20 more sections

Figures (6)

Figure 1: Uncertainties predicted as the inverse of the prediction rigidity for a polynomial fit, a Gaussian fit, and a neural network fit (last-layer approximation), respectively. In all three cases, training set points are marked in blue, the model prediction is shown in orange, and the estimated uncertainties are shaded in light blue.
Figure 2: LLPR uncertainty estimates for a SOAP-BPNN model trained on the QM9 dataset. Left: parity plot of the estimated error vs absolute error on test samples. The thin black lines represent confidence intervals containing fractions of the probability distributions that are equal to those within one, two, and three standard deviations for a Gaussian distribution. Right: parity plot of the predicted variance vs mean squared error for the test samples, where each point is the average over a bin of 100 test set samples with similar estimated variances. More details on these plots can be found in the appendix on plot details.
Figure 3: LLPR uncertainty predictions on the Australia weather dataset. Left: parity plot of the estimated error vs absolute error on test samples. The thin black lines represent confidence intervals containing fractions of the probability distributions that are equal to those within one, two, and three standard deviations for a Gaussian distribution. Right: parity plot of the predicted variance vs mean squared error for test samples. Each point represents the average of a bin of 200 test set samples with similar estimated variances. More details on the plots can be found in the appendix on plot details.
Figure 4: In-domain and out-of-domain uncertainty predictions on the California housing dataset. Left: parity plot of the estimated error vs absolute error on test samples. The thin black lines represent confidence intervals containing fractions of the probability distributions that are equal to those within one, two, and three standard deviations for a Gaussian distribution. Right: parity plot of the predicted variance vs mean squared error for test samples. Each point represents the average of a bin of 100 test set samples with similar predicted variance. More details on the plots can be found in the appendix on plot details.
Figure 5: Quality of the LLPR uncertainty estimates as a function of the number of neurons per layer. Each point corresponds to the average of 100 test samples.
...and 1 more figure
