Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory
Aseem Paranjape, Ravi K. Sheth
TL;DR
This work addresses model-agnostic BAO analyses by seeking a minimal, theory-driven basis for the linear 2pcf, $ξ_{\rm lin}(r;\boldsymbol{\theta})$. It introduces BiSequential, a neural-network architecture that factorizes the problem into radial basis functions $b_i(r)$ and cosmology-dependent weights $w_i(\boldsymbol{\theta})$. Trained on flat ΛCDM models in which only $\{Ω_{\rm m},h\}$ vary, the network yields a 9-function basis that describes $ξ_{\rm lin}(r)$ to ~0.6% accuracy in curved wCDM models with 7 parameters varied within ~5% of their fiducial, flat ΛCDM values. The authors report sub-percent residuals across elementary, basic, and stringent tests and show accurate recovery of BAO scales (peak, linear point, zero-crossing), significantly outperforming monomial bases. The approach provides a flexible compression framework for model-agnostic BAO analyses, with potential extension to modified gravity and massive neutrino scenarios, and the basis can be orthogonalized with respect to a data covariance if needed.
Abstract
We consider approximating the linearly evolved 2-point correlation function (2pcf) of dark matter $ξ_{\rm lin}(r;\boldsymbolθ)$ in a cosmological model with parameters $\boldsymbolθ$ as the linear combination $ξ_{\rm lin}(r;\boldsymbolθ)\approx\sum_i\,b_i(r)\,w_i(\boldsymbolθ)$, where the functions $\mathcal{B}=\{b_i(r)\}$ form a $\textit{model-agnostic basis}$ for the linear 2pcf. This decomposition is important for model-agnostic analyses of the baryon acoustic oscillation (BAO) feature in the nonlinear 2pcf of galaxies that fix $\mathcal{B}$ and leave the coefficients $\{w_i\}$ free. To date, such analyses have made simple but sub-optimal choices for $\mathcal{B}$, such as monomials. We develop a machine learning framework for systematically discovering a $\textit{minimal}$ basis $\mathcal{B}$ that describes $ξ_{\rm lin}(r)$ near the BAO feature in a wide class of cosmological models. We use a custom architecture, denoted $\texttt{BiSequential}$, for a neural network (NN) that explicitly realizes the separation between $r$ and $\boldsymbolθ$ above. The optimal NN trained on data in which only $\{Ω_{\rm m},h\}$ are varied in a $\textit{flat}$ $Λ$CDM model produces a basis $\mathcal{B}$ comprising $9$ functions capable of describing $ξ_{\rm lin}(r)$ to $\sim0.6\%$ accuracy in $\textit{curved}$ $w$CDM models varying 7 parameters within $\sim5\%$ of their fiducial, flat $Λ$CDM values. Scales such as the peak, linear point and zero-crossing of $ξ_{\rm lin}(r)$ are also recovered with very high accuracy. We compare our approach to other compression schemes in the literature, and speculate that $\mathcal{B}$ may also encompass $ξ_{\rm lin}(r)$ in modified gravity models near our fiducial $Λ$CDM model. Using our basis functions in model-agnostic BAO analyses can potentially lead to significant statistical gains.
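To make the separable form $ξ_{\rm lin}(r;\boldsymbolθ)\approx\sum_i\,b_i(r)\,w_i(\boldsymbolθ)$ concrete, the sketch below builds a small basis from a set of training curves and then fits a new curve by leaving only the weights free, as a model-agnostic analysis would. It is an illustration under stated assumptions, not the method of the paper: the basis is obtained with a truncated SVD rather than the $\texttt{BiSequential}$ network, and `toy_curve` is a hypothetical stand-in (a power law plus a Gaussian bump) rather than a real linear 2pcf. The parameter ranges, the grid in $r$, and `n_basis` are likewise arbitrary choices made for the example.

```python
import numpy as np


def toy_curve(r, amp, r_peak):
    """Hypothetical stand-in for xi_lin(r; theta): power law plus a bump.

    This is NOT a physical correlation function; it only mimics a smooth
    decline with a localized feature whose position depends on a parameter.
    """
    smooth = (r / 100.0) ** -2
    bump = 0.5 * np.exp(-0.5 * ((r - r_peak) / 10.0) ** 2)
    return amp * (smooth + bump)


# Radial grid around the toy feature (arbitrary units and range).
r = np.linspace(60.0, 140.0, 81)

# Training set: curves sampled over a narrow range of the two toy parameters.
rng = np.random.default_rng(0)
amps = rng.uniform(0.9, 1.1, size=200)
peaks = rng.uniform(95.0, 105.0, size=200)
X = np.array([toy_curve(r, a, p) for a, p in zip(amps, peaks)])  # (200, 81)

# Truncated SVD gives X ~= W @ B, with the rows of B playing the role of b_i(r).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
n_basis = 6  # illustrative choice, not the 9 functions of the paper
B = Vt[:n_basis]  # shape (n_basis, len(r))

# "Model-agnostic" step: keep B fixed and fit free weights w_i to a new curve
# by linear least squares, without using the parameters that generated it.
target = toy_curve(r, 1.03, 98.5)
w, *_ = np.linalg.lstsq(B.T, target, rcond=None)
approx = w @ B

max_rel_err = np.max(np.abs(approx - target) / np.abs(target))
print(f"max relative error with {n_basis} basis functions: {max_rel_err:.2e}")
```

Because the basis is fixed in advance, the fit for the weights is a linear problem, which is what makes this kind of decomposition convenient when the coefficients $\{w_i\}$ are left free. An SVD basis is orthonormal by construction, whereas basis functions learned by a neural network need not be.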
