GAME: Genetic Algorithms with Marginalised Ensembles for model-independent reconstruction of cosmological quantities

Matteo Peronaci, Matteo Martinelli, Savvas Nesseris

TL;DR

GAME extends genetic algorithms for cosmology by marginalising over ensembles of GA reconstructions to stabilize derivative-based quantities and by combining path-integral statistical errors with ensemble variance into a total uncertainty. It introduces an objective estimator $S_j = \chi^2_j + \lambda R_j$ and an elbow-based L-curve procedure to weight multiple GA models, yielding a robust non-parametric reconstruction $H(z)$ and derived $w(z)$. Applied to Cosmic Chronometers data, GAME finds results consistent with $\Lambda$CDM at low redshift, with larger uncertainties at higher redshift due to data sparsity and $\Omega_{m,0}$ priors; Stage IV mocks indicate substantial improvements in the precision of $w(z)$ and the stability of derivatives. The method demonstrates that non-parametric, model-independent cosmological tests can be made more reliable and competitive for future surveys, enabling sharper discrimination between cosmological models and gravity theories.
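
As a rough illustration of the ensemble marginalisation described above, the following Python sketch averages a set of reconstructions and builds a total uncertainty. The specific combination rule (weighted statistical variance plus weighted ensemble scatter, added in quadrature) is an assumption made here for illustration; the summary does not give the paper's exact expression. The function name `ensemble_average` and the toy inputs are hypothetical.

```python
import numpy as np

def ensemble_average(f, sigma_stat, weights):
    """Weighted mean of an ensemble of reconstructions with a total 1-sigma error.

    f          : array (n_models, n_points), each row one reconstruction f_j(x)
    sigma_stat : array (n_models, n_points), per-model statistical error
    weights    : array (n_models,), non-negative, normalised internally

    ASSUMPTION: total variance = weighted statistical variance
                                 + weighted ensemble scatter about the mean.
    """
    f = np.asarray(f, dtype=float)
    sigma_stat = np.asarray(sigma_stat, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    mean = np.einsum("j,jx->x", w, f)
    var_stat = np.einsum("j,jx->x", w, sigma_stat**2)
    var_ens = np.einsum("j,jx->x", w, (f - mean) ** 2)
    return mean, np.sqrt(var_stat + var_ens)

# Toy ensemble: three curves scattered around x**2 (illustrative values only).
x = np.linspace(0.0, 2.0, 5)
f = np.array([x**2, x**2 + 0.1, x**2 - 0.1])
sigma_stat = np.full_like(f, 0.05)
mean, sigma_tot = ensemble_average(f, sigma_stat, [1.0, 1.0, 1.0])
```

With equal weights the mean recovers `x**2`, and the total error is larger than the statistical error alone because the spread between ensemble members is included.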

Abstract

Genetic Algorithms (GA) are a powerful tool for stochastic optimisation and non-parametric symbolic regression, already widely used in cosmology. They can reconstruct analytical functions directly from data points without introducing new physical models. A limitation of this approach is that, while the reconstructed function reproduces the behaviour of the data points very efficiently, non-observable quantities involving derivatives are particularly sensitive to stochasticity, to the hyperparameters, and to the choice of the best-fit function obtained by the GA, which carries the risk of the algorithm getting stuck in a local minimum. In this work we propose an update to the GA methodology for the reconstruction of analytical functions that involves computing a weighted average of an ensemble of GA configurations (\texttt{GAME}). We define the weights via a quantity that accounts for both the goodness-of-fit of the points and the smoothness of the resulting function. We also present a practical method to analytically estimate and correct the errors on the averaged function by combining a path-integral approach with an ensemble variance. We demonstrate the improvement offered by the \texttt{GAME} methodology on a generic test function. We then apply the new methodology to a non-parametric reconstruction of the Hubble rate $H(z)$ using Cosmic Chronometers data and, assuming a flat Friedmann-Lemaître-Robertson-Walker background and General Relativity, we infer the corresponding dark energy equation of state $w(z)$. Through consistency tests, we show that current data produce results compatible with $\Lambda$CDM, and that Stage IV cosmology surveys will allow GA reinforced with the \texttt{GAME} methodology to become an even more competitive tool for discriminating between different models.
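
To make the step from $H(z)$ to $w(z)$ concrete, the sketch below uses the standard relation for a flat FLRW universe containing pressureless matter and a dark energy fluid, $w(z) = \left[\tfrac{2}{3}(1+z)\,H'/H - 1\right] / \left[1 - \Omega_{m,0}(1+z)^3 H_0^2/H^2\right]$. This textbook expression is assumed here; the abstract does not spell out the form used in the paper. The function name `w_of_z` and the values $H_0 = 70$, $\Omega_{m,0} = 0.3$ are illustrative choices only. The dependence on $H'(z)$ shows why derivative stability matters for this quantity.

```python
import numpy as np

def w_of_z(z, H, dHdz, H0, omega_m0):
    """Dark energy equation of state from H(z) and its derivative.

    ASSUMPTION: flat FLRW, General Relativity, pressureless matter plus
    a single dark energy component (standard textbook relation).
    """
    num = (2.0 / 3.0) * (1.0 + z) * dHdz / H - 1.0
    den = 1.0 - omega_m0 * (1.0 + z) ** 3 * (H0 / H) ** 2
    return num / den

# Consistency check on an analytic LambdaCDM expansion history
# (illustrative parameter values, not taken from the paper).
H0, omega_m0 = 70.0, 0.3
z = np.linspace(0.0, 2.0, 9)
E2 = omega_m0 * (1.0 + z) ** 3 + (1.0 - omega_m0)
H = H0 * np.sqrt(E2)
dHdz = H0 * 3.0 * omega_m0 * (1.0 + z) ** 2 / (2.0 * np.sqrt(E2))

w = w_of_z(z, H, dHdz, H0, omega_m0)
```

For this input the function returns $w = -1$ at every redshift, as expected for a cosmological constant. In a real reconstruction, `H` and `dHdz` would come from the averaged function and its analytic derivative.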

Paper Structure (23 sections, 32 equations, 13 figures, 1 table)

Figures (13)

  • Figure 1: Evolution of the GA $\chi^2$ with the number of generations $n_{\rm gen}$. The black dotted line is the $\chi^2_{\rm threshold}$, representing the $\chi^2$ we aim to overcome, obtained through standard model-dependent minimisation of the parametrised function. The blue curve represents the $\chi^2$ of the function generated by the GA. The legend lists the GA hyperparameter setup: the random seed is 76822; the probability of crossover and mutation is $85\%$; the grammar set for the generation consists of polynomial and exponential functions; and the number of generations is $N_{\rm gen}=30000$, sufficient for $\chi^2_{\rm GA}$ to drop below $\chi^2_{\rm threshold}$.
  • Figure 2: Evolution of the GA $\chi^2$ with the number of generations $n_{\rm gen}$. The black dotted line is the $\chi^2_{\rm threshold}$, representing the $\chi^2$ we aim to overcome, obtained through standard model-dependent minimisation of the parametrised function. The coloured curves represent the $\chi^2$ of the functions generated by the GA for different configurations of the hyperparameters. The legend lists the hyperparameter setup of each GA configuration.
  • Figure 3: Ensemble of reconstructions obtained by running the GA across multiple hyperparameter configurations. Left: Reconstructed functions $f_{j,\rm GA}(x)$ (blue curves) compared with the fiducial model we are trying to reproduce (black line) and the mock data points with uncertainties generated on the fiducial model. Right: Corresponding derivatives $f'_{j,\rm GA}(x)$ for the same ensemble, highlighting how the spread between GA solutions grows when taking derivatives, especially near the boundaries. The collection of curves motivates a model-averaging treatment to obtain a more stable reconstructed function and derivative.
  • Figure 4: L-curve used to identify the optimal regularization parameter $\lambda$ for the estimator $S_j(\lambda) = \chi_j^2 + \lambda R_j$. Black markers represent the lower-envelope points $(R_{j_\star}, \chi^2_{j_\star})$ obtained by scanning the $\lambda$ grid and identifying the specific configuration $j_\star$ that minimizes $S_j$ at each step. These points represent the active subset of the $N_{\rm conf}$ ensemble; their limited number indicates that many $\lambda$ values map to the same optimal model $j_\star$. The blue marker identifies the elbow point, defined by the maximum curvature in the $\log R$-$\log \chi^2$ plane, where the trade-off between goodness-of-fit ($\chi^2$) and smoothness ($R$) is optimized. This determines $\lambda_{\rm elbow}$, which is subsequently used to calculate the exponential weights $w_j$ for the final model-averaging procedure.
  • Figure 5: Comparison of function reconstruction and derivative estimation between the standard GA and GAME methodologies. The bottom sub-panels display the residuals for the GA and GAME methods, respectively. Left: Reconstruction of the function $f(x)$. The solid black line indicates the underlying fiducial function (Equation \ref{eq:testfunction}) we are trying to reproduce from the mock dataset. The orange dashed line shows the standard best-fit GA result, while the blue dashed line represents the GAME averaged reconstruction. Shaded regions indicate $1\sigma$ confidence intervals. Right: Estimation of the derivative $f'(x)$. While the reconstructed functions $f_{\rm GA}$ and $f_{\rm GAME}$ are comparable, the derivative estimation highlights the stability of the GAME averaging at the boundaries ($x \approx 0$ and $x \approx 2$), where the standard GA deviates significantly from the fiducial line.
  • ...and 8 more figures
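
The L-curve procedure summarised in the Figure 4 caption can be sketched in Python as follows. The scan over $\lambda$, the selection of $j_\star$ by minimising $S_j(\lambda) = \chi_j^2 + \lambda R_j$, and the elbow as the point of maximum curvature in the $\log R$-$\log \chi^2$ plane follow the caption. Several details are assumptions made for this sketch and are not taken from the paper: the discrete curvature is the Menger curvature of three consecutive envelope points, $\lambda_{\rm elbow}$ is the geometric mean of the grid values that select the elbow model, and the exponential weights take the form $w_j \propto \exp[-(S_j - S_{\min})/2]$. All function names and the toy inputs are hypothetical.

```python
import numpy as np

def menger_curvature(p, q, r):
    """Curvature of the circle through three 2D points (0 if collinear)."""
    twice_area = abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))
    sides = np.linalg.norm(q - p) * np.linalg.norm(r - q) * np.linalg.norm(r - p)
    return 2.0 * twice_area / sides

def elbow_and_weights(chi2, R, lambdas):
    """Return (elbow model index, lambda_elbow, normalised weights w_j)."""
    chi2, R, lambdas = (np.asarray(a, dtype=float) for a in (chi2, R, lambdas))

    # For each lambda on the grid, find j* minimising S_j = chi2_j + lambda * R_j.
    S = chi2[None, :] + lambdas[:, None] * R[None, :]
    j_star = np.argmin(S, axis=1)

    # Lower-envelope ("active") models, ordered by first appearance in lambda.
    active, first = np.unique(j_star, return_index=True)
    active = active[np.argsort(first)]
    if active.size < 3:
        raise ValueError("need at least three envelope points to locate an elbow")

    # Elbow: interior envelope point with maximum curvature in the log-log plane.
    pts = np.column_stack([np.log(R[active]), np.log(chi2[active])])
    kappa = [menger_curvature(pts[i - 1], pts[i], pts[i + 1])
             for i in range(1, len(pts) - 1)]
    j_elbow = int(active[1 + int(np.argmax(kappa))])

    # ASSUMPTION: lambda_elbow = geometric mean of grid values selecting j_elbow.
    lam_elbow = float(np.exp(np.mean(np.log(lambdas[j_star == j_elbow]))))

    # ASSUMPTION: exponential weights w_j ~ exp(-(S_j - S_min) / 2).
    S_elbow = chi2 + lam_elbow * R
    w = np.exp(-0.5 * (S_elbow - S_elbow.min()))
    return j_elbow, lam_elbow, w / w.sum()

# Toy ensemble of five configurations (illustrative numbers only).
chi2 = [1.0, 2.0, 10.0, 100.0, 5.0]
R = [100.0, 10.0, 1.0, 0.1, 50.0]
lambdas = np.logspace(-4, 3, 141)
j_elbow, lam_elbow, weights = elbow_and_weights(chi2, R, lambdas)
```

In this toy case the last configuration is never optimal for any $\lambda$, so only four models lie on the lower envelope, which mirrors the caption's remark that many $\lambda$ values map to the same $j_\star$. The resulting weights could then feed a model average over the full ensemble.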