syren-halofit: A fast, interpretable, high-precision formula for the $Λ$CDM nonlinear matter power spectrum

Deaglan J. Bartlett; Benjamin D. Wandelt; Matteo Zennaro; Pedro G. Ferreira; Harry Desmond

TL;DR

The paper tackles the slow and sometimes inaccurate prediction of the nonlinear matter power spectrum $P(k)$ in $\Lambda$CDM. It uses symbolic regression to derive compact analytic expressions for the halofit inputs $k_\sigma$, $n_{\rm eff}$, and $C$, re-optimises the halofit coefficients over a broad range of cosmologies and redshifts, and adds a short symbolic correction $A$ to form syren-halofit. The result is a fast, interpretable model with a root mean squared fractional error of about 1% relative to EuclidEmulator2. It is between 64 and roughly 3000 times faster than existing halofit, hmcode and emulator implementations, and is validated against $N$-body simulations (e.g., Quijote). This approach offers a practical, portable alternative to numerical emulators that could be incorporated into inference pipelines and extended to non-$\Lambda$CDM scenarios and baryonic physics.

Abstract

Rapid and accurate evaluation of the nonlinear matter power spectrum, $P(k)$, as a function of cosmological parameters and redshift is of fundamental importance in cosmology. Analytic approximations provide an interpretable solution, yet current approximations are neither fast nor accurate relative to numerical emulators. We use symbolic regression to obtain simple analytic approximations to the nonlinear scale, $k_σ$, the effective spectral index, $n_{\rm eff}$, and the curvature, $C$, which are required for the halofit model. We then re-optimise the coefficients of halofit to fit a wide range of cosmologies and redshifts. We explore the space of analytic expressions to fit the residuals between $P(k)$ and the optimised predictions of halofit. Our results are designed to match the predictions of EuclidEmulator2, but are validated against $N$-body simulations. Our symbolic expressions for $k_σ$, $n_{\rm eff}$ and $C$ have root mean squared fractional errors of 0.8%, 0.2% and 0.3%, respectively, for redshifts below 3 and a wide range of cosmologies. The re-optimised halofit parameters reduce the root mean squared fractional error (compared to EuclidEmulator2) from 3% to below 2% for wavenumbers $k=9\times10^{-3}-9 \, h{\rm Mpc^{-1}}$. We introduce syren-halofit (symbolic-regression-enhanced halofit), an extension to halofit containing a short symbolic correction which improves this error to 1%. Our method is 2350 and 3170 times faster than current halofit and hmcode implementations, respectively, and 2680 and 64 times faster than EuclidEmulator2 (which requires running class) and the BACCO emulator. We obtain comparable accuracy to EuclidEmulator2 and BACCO when tested on $N$-body simulations. Our work greatly increases the speed and accuracy of symbolic approximations to $P(k)$, making them significantly faster than their numerical counterparts without loss of accuracy.
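To make the three halofit inputs concrete, the following is a minimal numerical sketch of the quantities the symbolic expressions approximate. It assumes the standard halofit convention, which the abstract does not spell out: $\sigma^2(R) = \int {\rm d}\ln k \, \Delta^2_{\rm L}(k) \, e^{-k^2 R^2}$ with $\Delta^2_{\rm L} = k^3 P_{\rm L}(k) / 2\pi^2$, $\sigma(1/k_\sigma) = 1$, $n_{\rm eff} = -3 - {\rm d}\ln\sigma^2/{\rm d}\ln R$ and $C = -{\rm d}^2\ln\sigma^2/{\rm d}\ln R^2$, both evaluated at $R = 1/k_\sigma$. The function names, the finite-difference step and the toy power-law spectrum are illustrative assumptions and are not taken from the paper's code.

```python
import numpy as np

def sigma2(R, k, delta2_lin):
    """Gaussian-smoothed variance: integral of Delta^2_L(k) exp(-k^2 R^2) over ln k."""
    integrand = delta2_lin * np.exp(-(k * R) ** 2)
    # Trapezoidal rule on the ln k grid (works for non-uniform spacing too).
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(np.log(k))))

def halofit_inputs(k, delta2_lin, h=1e-2):
    """Return (k_sigma, n_eff, C) for a tabulated dimensionless linear spectrum.

    Hypothetical helper: assumes sigma^2 > 1 at R = 1/k[-1] and < 1 at R = 1/k[0].
    """
    def ln_s2(ln_R):
        return np.log(sigma2(np.exp(ln_R), k, delta2_lin))

    # sigma^2 decreases with R, so bisect in ln R for sigma^2(R) = 1.
    lo, hi = np.log(1.0 / k[-1]), np.log(1.0 / k[0])
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ln_s2(mid) > 0.0:
            lo = mid  # variance still above 1: move to larger R
        else:
            hi = mid
    ln_R = 0.5 * (lo + hi)

    # Central finite differences of ln sigma^2 with respect to ln R.
    f_minus, f_0, f_plus = ln_s2(ln_R - h), ln_s2(ln_R), ln_s2(ln_R + h)
    first = (f_plus - f_minus) / (2.0 * h)
    second = (f_plus - 2.0 * f_0 + f_minus) / h**2

    k_sigma = np.exp(-ln_R)
    n_eff = -3.0 - first
    C = -second
    return k_sigma, n_eff, C

# Toy check with a pure power law, Delta^2_L = (k / k0)^(n + 3).
k = np.logspace(-5, 3, 4000)
k0, n = 0.3, -2.0
delta2 = (k / k0) ** (n + 3.0)
k_sigma, n_eff, C = halofit_inputs(k, delta2)
```

For a pure power law the smoothed variance scales as $R^{-(n+3)}$, so the sketch should recover $n_{\rm eff} \approx n$ and $C \approx 0$, which makes it a convenient sanity check before feeding in a realistic linear spectrum. Replacing this integration and root-finding step with closed-form expressions is one reason an analytic approach can be fast.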

Paper Structure (14 sections, 21 equations, 8 figures, 2 tables)

Introduction
Theoretical background
Nonlinear matter power spectrum
Symbolic regression
Analytic approximations to halofit variables
Analytic approximation of $k_\sigma$
Analytic approximation of $n_{\rm eff}$
Analytic approximation of $C$
Optimised halofit parameters
Corrections to halofit
Emulator performance
Accuracy
Speed
Conclusions

Figures (8)

Figure 1: Pareto front of solutions found with operon for $k_\sigma$ (left), $n_{\rm eff}$ (centre) and $C$ (right) over the range of cosmologies and redshifts considered. We plot the Pareto fronts for the training and validation data separately, indicating the lengths of our preferred models with vertical dotted lines.
Figure 2: Predicted values (upper) and fractional errors (lower) for $k_\sigma$ (left), $n_{\rm eff}$ (centre) and $C$ (right) plotted against their true values, using the symbolic fits for each quantity. The errors are almost always within 2% for $k_\sigma$, 0.5% for $n_{\rm eff}$ and 1% for $C$.
Figure 3: Comparison between the Takahashi et al. (2012) halofit parameters and the new optimised results (halofit+), with the bands giving the 1 and 2$\sigma$ errors, where we assume that the result of EuclidEmulator2 is the truth. For both training and validation we use 200 cosmologies and 100 values of $k$. The dashed horizontal lines indicate an error of $\pm$1%. The new parameters dramatically reduce the errors, particularly for $k \gtrsim 10^{-1} \, h {\rm Mpc^{-1}}$.
Figure 4: Pareto front of solutions found by operon to approximate the difference between the nonlinear matter power spectrum and the prediction of halofit. Each point on the red line represents the function with the best mean squared error on the training set for a given model length, whereas the blue curve shows the same loss for these functions evaluated on the validation set. We choose to use the model of length 44, as indicated by the vertical dotted line.
Figure 5: Distribution of fractional differences between EuclidEmulator2 and the prediction from halofit plus the symbolic correction (syren-halofit). For ease of comparison, the range of the $y$ axis is the same as in Figure 3. The bands give the 1 and $2\sigma$ values, and we find that the root mean squared fractional error is 0.9% and 1.0% for training and validation, respectively.
...and 3 more figures
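The Pareto fronts in Figures 1 and 4 keep, for each model length, the expression with the best training loss. The sketch below shows one plausible way to build such a front from a list of candidate expressions. It also drops longer models that fail to improve on a shorter one, which is an assumption about the usual definition of a Pareto front rather than something the captions state. The function name and the numbers are made up for illustration and are not results from the paper or part of the operon API.

```python
def pareto_front(candidates):
    """Return the non-dominated (length, loss) pairs, sorted by length.

    candidates: iterable of (model_length, loss) tuples, lower loss is better.
    """
    # Step 1: best (lowest) loss found at each model length.
    best = {}
    for length, loss in candidates:
        if length not in best or loss < best[length]:
            best[length] = loss

    # Step 2: keep a length only if it beats every shorter model.
    front = []
    lowest = float("inf")
    for length in sorted(best):
        if best[length] < lowest:
            front.append((length, best[length]))
            lowest = best[length]
    return front

# Illustrative, invented candidates: (length, mean squared error).
toy = [(3, 0.5), (3, 0.4), (5, 0.45), (7, 0.1), (9, 0.2)]
front = pareto_front(toy)
```

Evaluating the same front on held-out data, as the captions describe for the validation curves, then helps pick a length beyond which extra complexity no longer pays off.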
