OLÉ -- Online Learning Emulation in Cosmology

Sven Günther, Lennart Balkenhol, Christian Fidler, Ali Rida Khalife, Julien Lesgourgues, Markus R. Mosbech, Ravi Kumar Sharma

TL;DR

OLÉ tackles the cost of cosmological inference with an online learning emulator that is trained during the inference run itself. It uses PCA for data compression and Gaussian Processes for fast predictions, with automatic accuracy checks that trigger retraining as needed. The approach yields speed-ups by factors of $30$ to $350$ while preserving accuracy relative to full Boltzmann codes. Because the emulators are differentiable, combining them with the differentiable likelihoods in candl allows gradient-based sampling, which gains a further factor of $\sim 4$. OLÉ is demonstrated on ΛCDM and extended cosmologies, including Stage-IV LSS forecasts and NEDE scenarios, and interfaces with CLASS, CAMB, Cobaya, and MontePython. Since it needs no pre-training, acquires its training data on the fly, and estimates its own uncertainty, it enables scalable, energy-efficient analyses of complex data sets and makes high-precision cosmology more practical on large parameter spaces. The code is open source and available at the OLÉ GitHub repository.

Abstract

In this work, we present OLÉ, a new online learning emulator for use in cosmological inference. The emulator relies on Gaussian Processes and Principal Component Analysis for efficient data compression and fast evaluation. Moreover, OLÉ features an automatic error estimation for optimal active sampling and online learning. All training data is computed on-the-fly, making the emulator applicable to any cosmological model or dataset. We illustrate the emulator's performance on an array of cosmological models and data sets, showing significant improvements in efficiency over similar emulators without degrading accuracy compared to standard theory codes. We find that OLÉ is able to considerably speed up the inference process, increasing the efficiency by a factor of $30$ to $350$, including data acquisition and training. Typically the runtime of the likelihood code becomes the computational bottleneck. Furthermore, OLÉ emulators are differentiable; we demonstrate that, together with the differentiable likelihoods available in the $\texttt{candl}$ library, we can construct a gradient-based sampling method which yields an additional improvement factor of 4. OLÉ can be easily interfaced with the popular samplers $\texttt{MontePython}$ and $\texttt{Cobaya}$, and the Einstein-Boltzmann solvers $\texttt{CLASS}$ and $\texttt{CAMB}$. OLÉ is publicly available at https://github.com/svenguenther/OLE.
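
To make the combination of Principal Component Analysis and Gaussian Processes described in the abstract concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the OLÉ implementation: `toy_theory` is a hypothetical stand-in for an expensive Boltzmann code, the single parameter `theta`, the kernel length scale, and the number of components are arbitrary illustrative choices, and the training set is fixed up front, whereas OLÉ collects it during sampling.

```python
import numpy as np

def toy_theory(theta, x):
    # Hypothetical stand-in for an expensive theory code:
    # returns a smooth "spectrum" on the grid x for one parameter theta.
    return np.sin(x * (1.0 + theta)) + theta * x

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential kernel between two 1D parameter arrays.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Training set (assumed fixed here; an online emulator would fill it on the fly).
x = np.linspace(0.0, 3.0, 200)
theta_train = np.linspace(0.0, 1.0, 15)
Y = np.array([toy_theory(t, x) for t in theta_train])  # shape (15, 200)

# PCA compression: keep the leading principal components of the centred data.
mean = Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
n_pc = 6
basis = Vt[:n_pc]                   # shape (6, 200)
coeffs = (Y - mean) @ basis.T       # shape (15, 6)

# One zero-mean GP per component, all sharing the same kernel matrix.
jitter = 1e-8                       # small diagonal term for numerical stability
K = rbf_kernel(theta_train, theta_train) + jitter * np.eye(len(theta_train))
alpha = np.linalg.solve(K, coeffs)  # shape (15, 6)

def emulate(theta):
    # Predict the PCA coefficients at theta, then map back to the full spectrum.
    k_star = rbf_kernel(np.atleast_1d(theta), theta_train)   # shape (1, 15)
    pred_coeffs = k_star @ alpha                              # shape (1, 6)
    # GP predictive variance for a unit-variance prior (uncalibrated, illustrative).
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mean + pred_coeffs[0] @ basis, max(float(var[0]), 0.0)

spectrum, variance = emulate(0.537)
```

Once trained, each prediction costs one kernel evaluation against the training points and one small matrix product, which is why the emulator is far cheaper than the theory code it replaces. The predictive variance is the kind of quantity an error estimate can be built on, although the unit-variance prior used here is not calibrated.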

Paper Structure

This paper contains 32 sections, 13 equations, 9 figures, 4 tables, 2 algorithms.

Figures (9)

  • Figure 1: Evaluation of the log-likelihoods computed with predictions from CLASS ($\log l$) and OLÉ ($\log \tilde{l}$) in the example of Section \ref{sec:lcdm}. On the left, they are displayed as a function of the Hubble rate $h$. Color indicates whether the likelihood estimated from the OLÉ prediction overshoots or undershoots the CLASS value. On the right, the difference between the likelihoods is scaled by the error estimate of the emulator $\triangle_{\log \tilde{l}}$. The solid and dashed black lines indicate the one standard deviation region and mean, respectively. The likelihood values calculated using the emulator prediction are unbiased and the emulator's uncertainty estimation is accurate.
  • Figure 2: Fraction of elapsed computation time spent on each component of the process, spanning the first $\sim1.5$ hours of an MCMC analysis with OLÉ. It is clear that the initial phase is dominated by running the traditional theory code until enough points have been computed to train the emulator. Then comes a phase dominated by testing the emulator, as its accuracy must be verified. Finally, when confidence in the emulator is high, the runtime becomes dominated by the likelihood. As the run progresses, it will become even more likelihood-dominated.
  • Figure 3: Value of the reduced Hubble parameter $h$ for the first 15000 theory evaluations in one of the chains of a Cobaya MCMC run, as well as 6000 evaluations later in the chain for comparison. The points are colored according to the type of call to the standard theory code or emulator: CLASS calls saved in the cache and used to improve the emulator (black cross), CLASS calls deemed not relevant for the emulator and therefore not cached (gray dots), emulator calls followed by a successful accuracy test (green dots), emulator calls followed by a failed accuracy test (red dots), and emulator calls not followed by an accuracy test (blue dots). Gray dots appear only at the beginning during the burn-in phase. Note that red dots (failed accuracy tests) are immediately followed by black dots (CLASS calls added to the cache). As performance checks are passed and confidence in the emulator grows, testing is gradually decreased, such that green dots give way to blue dots. Note that points very far from the high-likelihood region would always be computed by the standard theory code to avoid an overly expensive and conservative training of the emulator, but this situation did not occur within the displayed sample.
  • Figure 4: Posteriors of the $\Lambda$CDM model parameters inferred from CLASS and Cobaya with Planck 2018, Pantheon+ and BAO data, as outlined in Section \ref{sec:lcdm}. We see an excellent match between the posteriors computed using either a standard MCMC or the OLÉ emulator, with a maximum deviation of posterior means by $0.03\sigma$.
  • Figure 5: Posteriors of the extended cosmology model on Planck and 2024 DESI BAO data as outlined in example \ref{sec:extended}. Posterior means obtained using OLÉ and CAMB differ by $D_x<0.048$, indicating good emulator accuracy.
  • ...and 4 more figures
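
The call types distinguished in the Figure 3 caption can be read as a small decision routine. The sketch below is one hedged interpretation of that caption, not OLÉ's actual logic. The class `EmulatorState`, the function `handle_call`, all thresholds, the halving of the test probability, and the pass criterion (emulator error estimate below a tolerance) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class EmulatorState:
    n_cached: int = 0        # training points stored so far
    min_points: int = 3      # hypothetical: points needed before the emulator is used
    test_prob: float = 1.0   # probability that an emulator call is followed by a test
    tol: float = 0.1         # hypothetical tolerance on the log-likelihood error
    max_gap: float = 50.0    # hypothetical cut defining "far from high likelihood"

def handle_call(state, loglike_gap, emu_sigma, u):
    """Classify one theory request with the labels used in Figure 3.

    loglike_gap: assumed measure of how far the point lies below the best
                 log-likelihood seen so far
    emu_sigma:   emulator's own error estimate on the log-likelihood
    u:           uniform random draw in [0, 1) deciding whether to test
    """
    if loglike_gap > state.max_gap:
        # Far from the high-likelihood region: run the theory code, do not cache.
        return ["theory_not_cached"]                  # gray
    if state.n_cached < state.min_points:
        # Not enough training data yet: run the theory code and cache the result.
        state.n_cached += 1
        return ["theory_cached"]                      # black
    if u >= state.test_prob:
        return ["emulator_untested"]                  # blue
    if emu_sigma <= state.tol:
        # Passed test: confidence grows, so test less often (assumed halving rule).
        state.test_prob = max(0.05, 0.5 * state.test_prob)
        return ["emulator_test_passed"]               # green
    # Failed test: fall back to the theory code and add the point to the cache.
    state.n_cached += 1
    state.test_prob = 1.0
    return ["emulator_test_failed", "theory_cached"]  # red, then black

state = EmulatorState()
calls = [(100.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.5, 0.0, 0.0),
         (1.0, 0.5, 0.0), (1.0, 0.01, 0.0), (1.0, 0.01, 0.9)]
history = [handle_call(state, gap, sigma, u) for gap, sigma, u in calls]
```

In this toy sequence a failed test is immediately followed by a cached theory call, and untested emulator calls appear only after a test has passed, mirroring the progression from black to green to blue points described in the caption.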