High-Dimensional Bayesian Model Comparison in Cosmology with GPU-accelerated Nested Sampling and Neural Emulators

Toby Lovick, David Yallup, Davide Piras, Alessio Spurio Mancini, Will Handley

TL;DR

This work tackles the computational challenge of Bayesian model comparison in high-dimensional cosmology by deploying a GPU-accelerated Nested Sampling (NS) framework that leverages JAX-based neural emulators to compute likelihoods and the evidence $\mathcal{Z}$. It demonstrates substantial wall-clock speed-ups over CPU-based approaches and achieves reliable evidence calculations for both a 6D CMB problem and a 37/39D cosmic shear analysis, including a 39D CPL vs $\Lambda$CDM comparison, on a single A100 GPU. By contrasting NS with gradient-based MCMC approaches using a learned harmonic mean estimator for $\mathcal{Z}$, the paper shows that vectorisation and emulator-based likelihoods render NS competitive in speed for vectorisable problems and drastically reduce runtimes for challenging high-dimensional analyses. The results imply that GPU-accelerated NS, paired with differentiable emulators, enables robust model selection and broader exploration of cosmological models for current and upcoming surveys, with clear paths toward multi-GPU scaling and further emulator enhancements.

Abstract

We demonstrate a GPU-accelerated nested sampling framework for efficient high-dimensional Bayesian inference in cosmology. Using JAX-based neural emulators and likelihoods for cosmic microwave background and cosmic shear analyses, our approach provides parameter constraints and direct calculation of Bayesian evidence. In the 39-dimensional $\Lambda$CDM vs $w_0w_a$ shear analysis, we produce Bayes factors and a robust error bar in just 2 days on a single A100 GPU, without loss of accuracy. Where CPU-based nested sampling can now be outpaced by methods relying on MCMC sampling and decoupled evidence estimation, we demonstrate that with GPU acceleration nested sampling offers the necessary speed-up to put it on equal computational footing with these methods, especially where reliable model comparison is paramount. We also explore interpolation in the matter power spectrum for cosmic shear analysis, finding a further factor of 4 speed-up with consistent posterior contours and Bayes factor. We put forward both nested and gradient-based sampling as useful tools for the modern cosmologist, where cutting-edge inference pipelines can yield orders-of-magnitude improvements in computation time.
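To make the quantities in the abstract concrete, the following is a minimal sketch of how nested sampling accumulates the evidence $\mathcal{Z} = \int \mathcal{L}(\theta)\,\pi(\theta)\,d\theta$ and how a log Bayes factor follows as a difference of two log-evidences. It is not the paper's GPU implementation: it is a serial, standard-library Python toy on a one-dimensional Gaussian likelihood with a uniform prior, chosen so that the constrained prior draw is exact and the result can be checked against a closed form. All names and parameters (`nested_sampling_log_z`, `half_width`, `sigma`, `n_live`, `tol`) are assumptions introduced for illustration only.

```python
import math
import random


def log_likelihood(x, sigma):
    """Unnormalised Gaussian log-likelihood centred on zero (toy choice)."""
    return -0.5 * (x / sigma) ** 2


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def analytic_log_z(half_width=5.0, sigma=0.1):
    """Closed-form log-evidence for a uniform prior on [-half_width, half_width]."""
    integral = sigma * math.sqrt(2.0 * math.pi) * math.erf(
        half_width / (sigma * math.sqrt(2.0)))
    return math.log(integral / (2.0 * half_width))


def nested_sampling_log_z(n_live=200, half_width=5.0, sigma=0.1,
                          tol=1e-4, seed=0):
    """Toy nested sampling estimate of log Z (hypothetical helper)."""
    rng = random.Random(seed)
    live = [rng.uniform(-half_width, half_width) for _ in range(n_live)]
    live_logl = [log_likelihood(x, sigma) for x in live]

    log_z = -math.inf
    log_x = 0.0  # log of the remaining prior volume, X_0 = 1
    # log(1 - exp(-1/n_live)): shell width relative to the current volume
    log_shrink = math.log1p(-math.exp(-1.0 / n_live))

    while True:
        # The lowest-likelihood live point defines the next contour.
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_star = live_logl[worst]

        # Add L* times the shell volume X_{i-1} - X_i to the evidence.
        log_z = logaddexp(log_z, logl_star + log_x + log_shrink)
        log_x -= 1.0 / n_live  # expected compression per iteration

        # Exact draw from the prior restricted to L > L*. This shortcut is
        # specific to the toy: the contour is the interval |x| < r.
        r = min(sigma * math.sqrt(-2.0 * logl_star), half_width)
        x_new = rng.uniform(-r, r)
        live[worst] = x_new
        live_logl[worst] = log_likelihood(x_new, sigma)

        # Stop once the live points can no longer change Z appreciably.
        if max(live_logl) + log_x < log_z + math.log(tol):
            break

    # Add the contribution of the remaining live points.
    top = max(live_logl)
    log_mean_l = top + math.log(
        sum(math.exp(l - top) for l in live_logl) / n_live)
    return logaddexp(log_z, log_x + log_mean_l)


def log_bayes_factor(log_z_a, log_z_b):
    """ln B_ab = ln Z_a - ln Z_b; positive values favour model a."""
    return log_z_a - log_z_b


if __name__ == "__main__":
    narrow = nested_sampling_log_z(half_width=5.0, seed=1)
    wide = nested_sampling_log_z(half_width=10.0, seed=2)
    print("ln Z (narrow prior):", narrow, "analytic:", analytic_log_z(5.0))
    print("ln B (narrow vs wide):", log_bayes_factor(narrow, wide))
```

In this toy the two "models" differ only in prior width, so the analytic log Bayes factor is very close to $\ln 2$: the wider prior is penalised for spreading probability over volume the data rule out. In a realistic problem the exact constrained draw is unavailable and is replaced by a constrained sampler, and the loop over live points is the part that benefits from batched, vectorised likelihood evaluation on a GPU.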

Paper Structure

This paper contains 17 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Execution time for batched likelihood evaluations for both the CMB and cosmic shear likelihoods. Here the y-axis refers to the time relative to calling the likelihood on a single set of parameters. The interpolated likelihood involves evaluating the matter power spectrum at fewer $z$-values, explained in detail in Appendix \ref{appendix_interp}.
  • Figure 2: Marginalised posterior distributions for the cosmological parameters of the cosmic shear analysis, using the full likelihoods with GPU-NS. The contours show the 68% and 95% credible intervals for each parameter. The dashed lines show the fiducial values with which the mock data were generated.
  • Figure 3: A comparison of the NS posterior contours of the $\Lambda$CDM likelihood with and without interpolation over the matter power spectrum. The posteriors are in excellent agreement, although the likelihood values of each model differ by a systematic $\Delta \log \mathcal{L} \approx 2$.
  • Figure 4: Full posterior constraints for the 39 parameters of the $\Lambda$CDM (blue) and $w_0w_a$ (orange) cosmologies from cosmic shear, obtained with GPU-NS. The dashed lines show the truth values of the parameters which were used to generate the data.