On Uncertainty Quantification for Near-Bayes Optimal Algorithms

Ziyu Wang; Chris Holmes

TL;DR

The paper proves that a martingale posterior built from a near-Bayes-optimal algorithm can recover the Bayesian posterior defined by the task distribution, which is unknown but optimal in this setting.

Abstract

Bayesian modelling allows for the quantification of predictive uncertainty, which is crucial in safety-critical applications. Yet for many machine learning (ML) algorithms, it is difficult to construct or implement their Bayesian counterpart. In this work we present a promising approach to address this challenge, based on the hypothesis that commonly used ML algorithms are efficient across a wide variety of tasks and may thus be near Bayes-optimal w.r.t. an unknown task distribution. We prove that it is possible to recover the Bayesian posterior defined by the task distribution, which is unknown but optimal in this setting, by building a martingale posterior using the algorithm. We further propose a practical uncertainty quantification method that applies to general ML algorithms. Experiments based on a variety of non-NN and NN algorithms demonstrate the efficacy of our method.
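To make "building a martingale posterior using the algorithm" concrete, the following is a minimal toy sketch of predictive resampling, not the paper's practical method. It assumes a Gaussian observation model with known standard deviation and uses the running sample mean (a sequential MLE) as the algorithm. The function name `martingale_posterior_mean` and the parameters `horizon` and `num_samples` are hypothetical names chosen for illustration.

```python
import numpy as np

def martingale_posterior_mean(y, sigma=1.0, horizon=2000, num_samples=1000, seed=0):
    """Toy martingale posterior for a Gaussian mean (illustrative assumption).

    Each of the `num_samples` chains starts from the observed data, then
    repeatedly (1) fits the algorithm (here: the sample mean), (2) samples one
    future observation from the fitted predictive N(theta, sigma^2), and
    (3) appends it to the data. The final estimates approximate draws from
    the martingale posterior.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    # Running sum of observed plus imputed data, one entry per chain.
    total = np.full(num_samples, y.sum())
    for i in range(horizon):
        theta = total / (n + i)                    # current estimate in every chain
        total = total + rng.normal(theta, sigma)   # impute one observation per chain
    return total / (n + horizon)

# Usage on synthetic data (the data-generating values are arbitrary).
sigma = 1.0
y = np.random.default_rng(1).normal(0.3, sigma, size=20)
draws = martingale_posterior_mean(y, sigma=sigma)
print("sample mean:", y.mean())
print("MP mean and std:", draws.mean(), draws.std())
print("flat-prior posterior std:", sigma / np.sqrt(len(y)))
```

In this toy case the spread of the draws should be close to `sigma / sqrt(n)`, the posterior standard deviation under a flat prior, with a small gap because the imputation horizon is finite.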

Paper Structure (61 sections, 2 theorems, 51 equations, 4 figures, 9 tables, 1 algorithm)

Introduction
Background
Notations.
Bayesian modelling.
Martingale posteriors.
Martingales for machine learning?
Martingale Posteriors with Near-Optimal Algorithms
Setup and Main Result
Analysis setup.
Main result.
Examples
Exponential Family Models and Sequential MLE
Regularised Algorithms in High Dimensions
A linear-Gaussian inverse problem.
Connections to GP regression.
...and 46 more sections

Key Result

Theorem 3.1

Let $\pi_{n}, \hat{p}_{mp,n}$ be defined as above, and $W_{2,\theta}$ be the 2-Wasserstein distance w.r.t. $\|\cdot\|$. Under Asm. asm:approx-martingale-asm:conventions, there exists some $C>0$ determined by $(C_\Theta,C_{\mathcal{A}},C_{\mathcal{A}}',L_1,L_2)$ s.t. for $\chi_n = C/(sn^s) \to 0$ we Consequently, if $N\gg n$ is sufficiently large so that $\bar{\varepsilon}_{B,N}\ll \bar{\varepsilo

Figures (4)

Figure 1: GP inference on the Snelson dataset: visualisation of the approximate MP defined by Eq. \ref{eq:gp-alg-spo}, compared with the ensemble predictors defined by a modified MAP estimator with similar initialisation randomness (Eq. \ref{eq:map-anchoring}). Solid line and shade indicate the mean estimate and $80\%$ pointwise credible intervals (CIs) for the true regression function. Dashed line indicates the $80\%$ CIs from the exact posterior. Dots at bottom indicate the location of training inputs.
Figure 2: Multi-task learning simulation: results with varying choices of $(m,n_{pret},n_{test})$. Plotted are the mean and 95% confidence interval (CI) for each metric. CIs are computed on 160 replications using normal approximation (first two subplots) or the Wilson score (last subplot).
Figure 3: Classification experiment: scatter plot of the test metrics (for each dataset averaged over 10 random splits; higher is better) for the base algorithm vs the proposed method.
Figure 4: Classification experiment: approximate MP for the GBDT feature importance scores and their pairwise correlations. Plotted are the top 5 features in the UCI adult dataset.

Theorems & Definitions (13)

Remark 2.1: supervised learning
Remark 2.2: identifiability and semi-norm
Theorem 3.1: proof in App. \ref{app:proof-thm-main}
Remark 3.1
Example A.1: comparison to nonparametric bootstrap
Example A.2: comparison to parametric bootstrap
Corollary B.1
proof
Claim B.1
proof
...and 3 more
