Evidence Networks: simple losses for fast, amortized, neural Bayesian model comparison

Niall Jeffrey; Benjamin D. Wandelt

TL;DR

Evidence Networks tackle Bayesian model comparison when traditional methods falter due to intractable integrals, unknown likelihoods, or missing parameterizations. By designing bespoke losses, the authors train neural estimators that directly yield useful functions of the Bayes factor, with the l-POP-Exponential loss as the default for robust $\log K$ estimates. They validate the approach on analytic time-series problems and Dark Energy Survey data, showing accuracy across high-dimensional spaces and resilience in likelihood-free settings, while outperforming density-estimation baselines and remaining competitive with nested sampling when likelihoods are known. The method enables fast, amortized, parameter-free model comparison and broadens the scope of simulation-based inference, posterior predictive checks, and potential frequentist calibration via Bayes factors.

Abstract

Evidence Networks can enable Bayesian model comparison when state-of-the-art methods (e.g. nested sampling) fail and even when likelihoods or priors are intractable or unknown. Bayesian model comparison, i.e. the computation of Bayes factors or evidence ratios, can be cast as an optimization problem. Though the Bayesian interpretation of optimal classification is well-known, here we change perspective and present classes of loss functions that result in fast, amortized neural estimators that directly estimate convenient functions of the Bayes factor. This mitigates numerical inaccuracies associated with estimating individual model probabilities. We introduce the leaky parity-odd power (l-POP) transform, leading to the novel "l-POP-Exponential" loss function. We explore neural density estimation for data probability in different models, showing it to be less accurate and less scalable than Evidence Networks. Multiple real-world and synthetic examples illustrate that Evidence Networks are explicitly independent of the dimensionality of the parameter space and scale mildly with the complexity of the posterior probability density function. This simple yet powerful approach has broad implications for model inference tasks. As an application of Evidence Networks to real-world data, we compute the Bayes factor for two models with gravitational lensing data of the Dark Energy Survey. We briefly discuss applications of our methods to other, related problems of model comparison and evaluation in implicit inference settings.
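
The statement that Bayes factor estimation "can be cast as an optimization problem" can be illustrated with a small numerical sketch. It rests on several assumptions that are not spelled out in the text above: equal prior model probabilities (so that $K$ equals the posterior odds), an exponential loss of the form $e^{(1/2 - m) f}$ for a model label $m \in \{0, 1\}$, and an l-POP transform of the form $J_\alpha(f) = f + f|f|^{\alpha-1}$. All function names are hypothetical, and a brute-force grid search stands in for network training.

```python
import math

def lpop(f, alpha=2.0):
    """Assumed l-POP form: a linear ("leaky") term plus a parity-odd power term."""
    return f + f * abs(f) ** (alpha - 1.0)

def expected_loss(f, p1, transform=lambda f: f):
    """Expected exponential loss at fixed data x.

    p1 is p(M_1 | x); labels are m = 1 for M_1 and m = 0 for M_0,
    so the loss exp((1/2 - m) * g) is exp(-g/2) or exp(+g/2).
    """
    g = transform(f)
    return p1 * math.exp(-0.5 * g) + (1.0 - p1) * math.exp(0.5 * g)

def argmin_on_grid(p1, transform=lambda f: f, lo=-10.0, hi=10.0, n=20001):
    """Brute-force minimiser over a uniform grid (a stand-in for training)."""
    step = (hi - lo) / (n - 1)
    grid = (lo + i * step for i in range(n))
    return min(grid, key=lambda f: expected_loss(f, p1, transform))

p1 = 0.9                                   # assumed p(M_1 | x) for one datum
log_K = math.log(p1 / (1.0 - p1))          # posterior odds = K for equal priors
f_exp = argmin_on_grid(p1)                 # plain exponential loss
f_lpop = argmin_on_grid(p1, transform=lpop)  # loss applied to the transformed output

# Both optimal outputs encode log K: directly, or after the l-POP transform.
print(log_K, f_exp, lpop(f_lpop))
```

Under these assumptions the minimiser of the plain loss sits at $\log K$, while for the transformed loss it is $J_\alpha(f)$, not $f$ itself, that equals $\log K$ at the optimum.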

Paper Structure (35 sections, 51 equations, 8 figures, 1 table)

Introduction
Background
Paper summary
Motivation
Evidence Networks & comparison with existing methods
Motivation (i) - intractable integral:
Motivation (ii) - unknown likelihood/prior:
Motivation (iii) - no parameterization:
Optimization objective
Direct estimate of model posterior probability
Squared & Polynomial Loss:
Cross-entropy loss:
Direct estimate of Bayes factors
Exponential Loss:
Logistic loss:
...and 20 more sections

Figures (8)

Figure 1: Example 100-parameter time series model showing the underlying true signal overlaid with the observed data. The generative model is defined such that the Bayes factor can be calculated analytically from a closed-form expression, so that the Evidence Network output can be evaluated against the ground truth in this demonstration. For this realization, $\log K > 0$, so $\theta_0=0$ is (slightly) disfavoured.
Figure 2: Left panel: Estimated log Bayes factor $K$ using an ensemble Evidence Network (with four networks) compared to analytic calculation using a closed-form expression for $K$. These data are time series with $100$ data elements drawn from a generative model with $100$ model parameters. This result uses our default network architecture with $10^6$ training samples. As we show in section \ref{sec:timeseries}, all standard methods to compute the Bayes factor either fail or incur significant error for this high-dimensional example. Right panel: Evidence Network model posterior probabilities derived from the estimated Bayes factors (equation \ref{eq:rearrange}) compared with the relative model fraction in the validation data. This allows validating the Evidence Network output when ground-truth model evidence is unavailable.
Figure 3: Three choices of loss to estimate the log Bayes factor $K$. Each panel shows results from the same network architecture; we use only a single network, rather than an ensemble as in Fig. \ref{fig:100_gaussian}, as this is sufficient to show the different systematic errors with increasing $K$. The left panel uses the Cross Entropy Loss to estimate individual model posteriors from which the Bayes factor is estimated. The centre panel uses the Exponential Loss to estimate $\log K$. The right panel uses the l-POP-Exponential Loss ($\alpha=2$) to estimate $\log K$.
Figure 4: Left panel: The number of nested sampling $p(x|\theta)$ evaluations is controlled by the number of live points in the PolyChord algorithm, which used the code default value (with a maximum cut-off of $10^8$). The number of $p(x|\theta)$ samples used by the Evidence Network (i.e. $x$ samples for training) was chosen to match the polynomial scaling of PolyChord (solid lines), but fixed at approximately $1$ per cent of the number of PolyChord evaluations of $p(x|\theta)$. Right panel: Despite using significantly fewer $p(x|\theta)$ samples than nested sampling uses $p(x|\theta)$ evaluations, we consistently recover more accurate estimates of $\log K$ with the Evidence Network. The Evidence Network error-bars are the standard deviation of the RMSE result from 5 runs of the Evidence Network ensemble (each with four networks); the central value is the mean. For the PolyChord points, it is infeasible to estimate an error-bar given the computational time required.
Figure 5: Left panel: Estimated $\log_{10} K$ for 1000 data samples using (i) the Normalizing Flow ratio for estimated $p(x|M_1)$ and $p(x|M_0)$, and (ii) the direct estimate using the Evidence Network. The set-up for this prediction using the 20-dimensional time series data is described in section \ref{sec:density_method}. Right panel: The distribution of residual errors for each method. The RMSE using the Normalizing Flow is more than a factor of 10 larger than that of the direct estimation of $\log_{10} K$ using the Evidence Network for this problem.
...and 3 more figures
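
The validation idea in the right panel of Figure 2 (comparing model posterior probabilities derived from $K$ with the relative model fraction in validation data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes equal prior model probabilities, so that $p(M_1|x) = K/(1+K)$, and it uses a hypothetical one-dimensional toy problem ($M_0$: $x \sim \mathcal{N}(0,1)$, $M_1$: $x \sim \mathcal{N}(1,1)$) whose analytic $\log K = x - 1/2$ stands in for a trained network's output. All function names are invented for the example.

```python
import math
import random

def posterior_from_log_k(log_k):
    """p(M_1 | x) = K / (1 + K), assuming equal prior model probabilities."""
    return 1.0 / (1.0 + math.exp(-log_k))

def calibration_table(log_ks, labels, n_bins=5):
    """Per bin: (mean predicted p(M_1|x), observed fraction of M_1 labels, count)."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for log_k, m in zip(log_ks, labels):
        p1 = posterior_from_log_k(log_k)
        b = min(int(p1 * n_bins), n_bins - 1)  # bin index by predicted probability
        sums[b] += p1
        hits[b] += m
        counts[b] += 1
    return [(sums[b] / counts[b], hits[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b] > 0]

# Toy validation set: labels drawn with equal probability, data drawn from
# the labelled model (M_0: mean 0, M_1: mean 1, unit variance).
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(20000)]
xs = [rng.gauss(float(m), 1.0) for m in labels]

# Analytic log K for this toy problem, used in place of a network estimate.
log_ks = [x - 0.5 for x in xs]

table = calibration_table(log_ks, labels)
for predicted, observed, count in table:
    print(f"predicted {predicted:.3f}  observed {observed:.3f}  n={count}")
```

If the $\log K$ estimates are accurate, the predicted and observed columns should agree within sampling noise in every bin; systematic disagreement would flag a miscalibrated estimator even when no ground-truth evidence is available.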
