On the accuracy of posterior recovery with neural network emulators
H. T. J. Bevins, T. Gessey-Jones, W. J. Handley
TL;DR
This work provides a theoretically grounded bound on information loss when using neural-network emulators in Bayesian inference for cosmological models. By deriving and specializing a KL-divergence bound under a Gaussian likelihood and (approximately) linear models, the authors quantify how the emulator RMSE relative to the data noise controls posterior distortion. They demonstrate the approach in a 21-cm cosmology setting by comparing posteriors obtained directly with ARES against those obtained with a globalemu emulator, showing accurate posterior recovery even when the emulator's mean RMSE is around 20% of the noise level, and finding earlier concerns about emulator use quantitatively unjustified. The results offer a practical criterion for emulator accuracy and reinforce confidence in emulators as scalable tools for inference with computationally expensive cosmological simulations.
Abstract
Neural network emulators are widely used in astrophysics and cosmology to approximate complex simulations inside Bayesian inference loops. Ad hoc rules of thumb are often used to justify the emulator accuracy required for reliable posterior recovery. We provide a theoretically motivated limit on the maximum amount of incorrect information inferred by using an emulator with a given accuracy. Under assumptions of linearity in the model, uncorrelated noise in the data and a Gaussian likelihood function, we demonstrate that the difference between the true underlying posterior and the recovered posterior can be quantified via a Kullback-Leibler divergence. We demonstrate how this limit can be used in the field of 21-cm cosmology by generating mock data sets with the 1D radiative transfer code ARES and comparing the posteriors recovered when fitting them directly with the simulation code and, separately, with an emulator. This paper is partly a response to, and builds upon, recent discussions in the literature that call into question the use of emulators in Bayesian inference pipelines. Upon repeating some aspects of these analyses, we find these concerns quantitatively unjustified, with accurate posterior recovery possible even when the mean RMSE of the emulator is approximately 20% of the magnitude of the noise in the data. For community reproducibility, we make our analysis code public at https://github.com/htjb/validating_posteriors.
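To make the role of the RMSE-to-noise ratio concrete, the following is a minimal toy sketch and not the authors' analysis code. It assumes a strictly linear model D = M θ + n with uncorrelated Gaussian noise of standard deviation σ, a flat prior, and an emulator whose output differs from the true model by a fixed, parameter-independent error vector ε. Under those assumptions both posteriors are Gaussian with the same covariance, the Kullback-Leibler divergence reduces to a Mahalanobis distance between the posterior means, and it cannot exceed (N_d / 2)(RMSE / σ)^2 for N_d data points. All names (`M`, `eps`, `posterior_shift_kl`, `kl_upper_bound`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def gaussian_kl_same_cov(mu_p, mu_q, cov):
    """KL divergence (in nats) between two Gaussians that share covariance `cov`."""
    d = mu_p - mu_q
    return 0.5 * d @ np.linalg.solve(cov, d)


def posterior_shift_kl(M, eps, sigma):
    """KL between the true and emulator-based posteriors in the toy setup.

    Assumptions (illustrative, not taken from the paper's code):
      * data D = M @ theta + noise, noise ~ N(0, sigma^2 I)
      * flat prior on theta
      * emulator prediction = M @ theta + eps, with eps independent of theta
    """
    gram = M.T @ M
    cov = sigma**2 * np.linalg.inv(gram)      # shared posterior covariance
    shift = np.linalg.solve(gram, M.T @ eps)  # offset between posterior means
    return gaussian_kl_same_cov(shift, np.zeros_like(shift), cov)


def kl_upper_bound(n_data, rmse, sigma):
    """Worst-case KL for the toy setup: (N_d / 2) * (RMSE / sigma)^2."""
    return 0.5 * n_data * (rmse / sigma) ** 2


rng = np.random.default_rng(0)
n_data, n_params, sigma = 100, 3, 1.0

M = rng.normal(size=(n_data, n_params))  # hypothetical design matrix
eps = rng.normal(size=n_data)            # hypothetical emulator error
eps *= 0.2 * sigma / np.sqrt(np.mean(eps**2))  # rescale so RMSE = 0.2 * sigma

rmse = np.sqrt(np.mean(eps**2))
kl = posterior_shift_kl(M, eps, sigma)
bound = kl_upper_bound(n_data, rmse, sigma)

print(f"RMSE / sigma = {rmse / sigma:.2f}")
print(f"KL divergence = {kl:.4f} nats, upper bound = {bound:.4f} nats")
```

In this sketch the bound is only reached when the emulator error lies entirely in the column space of `M`, that is, when it perfectly mimics a change in parameters. A generic error vector projects only partly onto that space, so the realized divergence is typically well below the worst case.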
