Interpretable and physics-informed emulator for the linear matter power spectrum from machine learning

J. Bayron Orjuela-Quintana; Domenico Sapone; Savvas Nesseris

TL;DR

This work provides compact, accurate, and physically motivated fitting functions for the linear matter power spectrum (MPS) in both standard and modified-gravity (MG) cosmologies, offering a fast and transparent alternative to existing emulators for parameter inference and theoretical modeling in large-scale structure analyses.

Abstract

We present an interpretable emulator for the linear matter power spectrum (MPS) in the standard cosmological model $\Lambda$CDM, constructed via a physics-informed symbolic regression framework. By combining domain knowledge with a machine learning technique known as genetic algorithms (GA), we explore the space of analytic expressions to derive closed-form, smooth, physically motivated approximations of the MPS that match the accuracy of standard broadband reconstruction methodologies such as the Savitzky-Golay filter. Building upon this baseline, we incorporate transparent oscillatory corrections informed by the physics of baryon acoustic oscillations (BAO). The resulting expression delivers mean sub-percent fractional errors across a broad range of scales ($k \in [10^{-5}, 1.5]~h\,\mathrm{Mpc}^{-1}$) with an average deviation of $\sim 0.4\%$ when tested against spectra computed with a Boltzmann solver. Moreover, a comparable level of fractional deviation is maintained on smaller scales when the GA-derived formulation is used as input to the nonlinear emulator halofit. To illustrate the versatility of the framework beyond $\Lambda$CDM, we apply it to a representative $f(R)$ gravity model. Rather than training a general modified-gravity (MG) emulator, we compute the corresponding linear spectra with a Boltzmann solver and fit a parametric deformation of the $\Lambda$CDM smoothed component. This procedure achieves average errors at the 1.5-1.8% level and captures the leading modulation of the MPS induced by modified gravity, enabling a controlled study of its impact on the BAO scale. Our results provide compact, accurate, and physically motivated fitting functions for the linear MPS in both standard and MG cosmologies, offering a fast and transparent alternative to existing emulators for parameter inference and theoretical modeling in large-scale structure analyses.
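
The accuracy figures quoted in the abstract are mean absolute percentage errors (MAPE) against a reference spectrum. As a minimal sketch of how such a number could be computed, the snippet below evaluates the MAPE of a smooth toy model against a toy reference carrying a small damped oscillation. The functions `toy_smooth` and `toy_reference`, the 1% oscillation amplitude, and the grid size are all hypothetical choices made for illustration; they are not the paper's fitting formula, and the reference here is not the output of a Boltzmann solver.

```python
import math


def mape(model, reference):
    """Mean absolute percentage error between two equal-length sequences."""
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same length")
    total = sum(abs(m - r) / abs(r) for m, r in zip(model, reference))
    return 100.0 * total / len(reference)


def log_grid(k_min, k_max, n):
    """Return n logarithmically spaced wavenumbers from k_min to k_max."""
    step = (math.log(k_max) - math.log(k_min)) / (n - 1)
    return [math.exp(math.log(k_min) + i * step) for i in range(n)]


def toy_smooth(k):
    # Hypothetical broadband shape (NOT the paper's formula):
    # grows like k on large scales and decays like k^-3 on small scales.
    return k / (1.0 + (k / 0.02) ** 2) ** 2


def toy_reference(k):
    # Toy stand-in for a reference spectrum: the smooth shape times
    # a damped oscillation of at most 1% (illustrative values only).
    wiggle = 0.01 * math.sin(100.0 * k) * math.exp(-((k / 0.2) ** 2))
    return toy_smooth(k) * (1.0 + wiggle)


# Same k-range as quoted in the abstract, in units of h/Mpc.
ks = log_grid(1e-5, 1.5, 400)
error = mape([toy_smooth(k) for k in ks], [toy_reference(k) for k in ks])
print(f"MAPE of the smooth toy model: {error:.3f}%")
```

Because the toy oscillation never exceeds 1% in amplitude, the resulting MAPE stays below roughly 1%, which mirrors the qualitative statement that a purely smooth model is limited mainly by the BAO features it omits.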

Paper Structure (37 sections, 122 equations, 15 figures, 13 tables)

Introduction
Symbolic Regression and Genetic Algorithms
The Linear Matter Power Spectrum
Calculation of Matter Power Spectra
Emulator for the $P(k)$ of $\Lambda$CDM
The De-Wiggled Matter Power Spectrum
Training Data
Template for the GA
Fitting Formula
Test
Wiggles in the Matter Transfer Function
Template Function
Fitting Formula
Test
Corrections around $k_\text{eq}$
...and 22 more sections

Figures (15)

Figure 1: Left: Evolution of the fitness function across $10^4$ generations for 100 different GA runs initialized with different random seeds. Right: Fitness evolution for the best-performing run extended to $10^5$ generations. The plateau indicates stagnation in the optimization process.
Figure 2: Top: $\text{MAPE}$ as a function of $k$ for the reconstructed $P_{\text{GA}, {\text{nw}}}$ across 200 cosmologies sampled in a Latin hypercube (LH) from Table \ref{Tab: Params}. Thin gray lines represent individual models; the best and worst cases are highlighted in color. Our formula maintains better than 1% accuracy across the full range, except for $k \sim 0.01\!-\!0.3~h\,\text{Mpc}^{-1}$, where BAO features dominate. Bottom: Distribution of the fractional errors corresponding to the $1\sigma$ and $2\sigma$ regions. Dashed black lines represent the $1\%$ deviation region.
Figure 3: Fitness as a function of wavenumber $k$ over the training set. Left: Fitness without correction. Errors peak near $k \sim 0.02~h\,\text{Mpc}^{-1}$ (equality scale) and at $k \sim 0.2~h\,\text{Mpc}^{-1}$ (diffusion scale). The latter feature appears largely independent of the cosmological parameters. Right: Fitness after applying a Gaussian correction around $k \sim 0.2~h\,\text{Mpc}^{-1}$, improving the overall fit to $\text{MAPE}(\text{GA}) = 0.25\%$.
Figure 4: Point-wise fractional error $\text{MAPE}(k)$ of the emulated linear MPS. Top left: Original model $P_\text{GA}$ achieving $\text{MAPE}(\text{GA}) = 0.42\%$. Larger errors are observed around $k \sim 0.02~h\,\text{Mpc}^{-1}$ and $k \sim 0.1~h\,\text{Mpc}^{-1}$. Top right: Improved version including three localized Gaussian corrections reduces the mean fractional error to $\text{MAPE}(\text{GA}) = 0.39\%$, significantly improving the match around the Silk damping scale. Bottom panels show the distribution of the fractional errors corresponding to the $1\sigma$ and $2\sigma$ regions considering no corrections around the Silk scale (bottom left), and considering Gaussian corrections around this scale (bottom right).
Figure 5: Best-fit (green) and worst-fit (red) examples from the 200 test cosmologies. The worst case fails to match the location and amplitude of the peak at $k_\text{max}$, as indicated by the CLASS prediction (black dashed line).
...and 10 more figures
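
The captions of Figures 3 and 4 mention localized Gaussian corrections applied around specific wavenumbers. The following sketch shows one way such a correction could act, assuming a multiplicative factor that is Gaussian in $\ln k$. That functional form, the helper name `gaussian_correction`, the power-law baseline, and every numerical value are assumptions made for illustration; the captions do not specify the form actually used in the paper.

```python
import math


def gaussian_correction(k, amplitude, k0, width):
    """Multiplicative factor 1 + A * exp(-(ln k - ln k0)^2 / (2 w^2))."""
    x = math.log(k / k0)
    return 1.0 + amplitude * math.exp(-0.5 * (x / width) ** 2)


def mape(model, reference):
    """Mean absolute percentage error between two equal-length sequences."""
    total = sum(abs(m - r) / abs(r) for m, r in zip(model, reference))
    return 100.0 * total / len(reference)


# Log-spaced toy grid from 1e-3 to 1 (units of h/Mpc).
ks = [10 ** (-3 + 3 * i / 299) for i in range(300)]

# Toy reference: a power law with a localized 2% bump near k = 0.2.
reference = [k ** -1.5 * gaussian_correction(k, 0.02, 0.2, 0.3) for k in ks]

# Baseline model that misses the bump entirely.
baseline = [k ** -1.5 for k in ks]

# Corrected model: same baseline with a hand-picked, slightly
# mismatched Gaussian correction (amplitude 1.8% instead of 2%).
corrected = [p * gaussian_correction(k, 0.018, 0.2, 0.3)
             for p, k in zip(baseline, ks)]

before = mape(baseline, reference)
after = mape(corrected, reference)
print(f"MAPE before correction: {before:.4f}%")
print(f"MAPE after correction:  {after:.4f}%")
```

In this toy setup the residual after correction is a fixed fraction of the original one at every $k$, so the MAPE drops by the same factor. In a realistic fit the amplitude, center, and width would presumably be tuned against the training spectra rather than chosen by hand.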
