
Indicator Functions: Distilling the Information from Gaussian Random Fields

Andrew Repp, Ravi K. Sheth, Istvan Szapudi, Yan-Chuan Cai

TL;DR

This paper addresses the fact that the Fisher information on the amplitude of the power spectrum of a Gaussian random field, though fixed in total, is not evenly distributed across the field once the field is smoothed. It uses indicator functions to partition the smoothed field by density and derives analytic expressions for the information in the corresponding indicator correlation functions $\xi_I(r)$. Focusing on separations $r \in [60,80)\,h^{-1}\mathrm{Mpc}$, it finds that most of the information resides at moderately rare densities (in either tail, since for a Gaussian field the information depends on $\nu$ only through $\nu^2$). The authors show that, for finite surveys, the information in $\xi_I$ for a suitably chosen density bin can exceed that in the standard two-point function $\xi(r)$ while using only a fraction of the survey cells. They also provide practical expressions for the Fisher information $\mathcal{I}_{A_z}$ on the power-spectrum amplitude, including a low-probability limit. These results offer a principled route to optimizing sampling with density-based statistics (density-split and marked statistics) and have implications for robust BAO amplitude measurements and efficient cosmological inference.
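To make the indicator construction concrete, here is a minimal sketch on a one-dimensional periodic toy field rather than the authors' three-dimensional survey cubes. The helper names (`smoothed_gaussian_field`, `indicator`, `xi_indicator`), the boxcar smoothing, and the normalization $\hat{\xi}_I(r) = \langle I(x)\,I(x+r)\rangle / \langle I\rangle^2 - 1$ are all illustrative assumptions; the paper's exact estimator may be defined differently.

```python
import math
import random


def smoothed_gaussian_field(n, width, seed=0):
    """Toy 1-D periodic Gaussian field: white noise averaged over a boxcar of `width` cells.

    Hypothetical helper; the variance of each cell is 1/width.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(noise[(i + k) % n] for k in range(width)) / width for i in range(n)]


def indicator(field, sigma, nu_lo, nu_hi):
    """I(x) = 1 where nu_lo <= delta(x)/sigma < nu_hi, else 0 (hypothetical helper)."""
    return [1 if nu_lo <= d / sigma < nu_hi else 0 for d in field]


def xi_indicator(ind, lags):
    """Average of <I(x) I(x+r)> / <I>^2 - 1 over the integer lags r in `lags`.

    Assumed normalization on a periodic grid; returns NaN when the density bin is empty.
    """
    n = len(ind)
    mean = sum(ind) / n
    if mean == 0:
        return float("nan")
    pair_mean = 0.0
    for r in lags:
        pair_mean += sum(ind[i] * ind[(i + r) % n] for i in range(n)) / n
    pair_mean /= len(lags)
    return pair_mean / mean**2 - 1.0


if __name__ == "__main__":
    field = smoothed_gaussian_field(n=4096, width=16, seed=1)
    sigma = math.sqrt(sum(d * d for d in field) / len(field))  # sample rms about zero
    lags = range(8, 12)  # toy stand-in for a separation bin such as [60, 80) h^-1 Mpc
    for nu_lo, nu_hi in [(-0.5, 0.5), (1.0, 2.0), (2.0, 3.0)]:
        ind = indicator(field, sigma, nu_lo, nu_hi)
        print(nu_lo, nu_hi, sum(ind), xi_indicator(ind, lags))
```

Rarer bins contain fewer cells, so their $\hat{\xi}_I$ is noisier from one realization to the next; the trade-off between that noise and the stronger amplitude dependence of rare bins is what the paper's information analysis quantifies.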

Abstract

A random Gaussian density field contains a fixed amount of Fisher information on the amplitude of its power spectrum. For a given smoothing scale, however, that information is not evenly distributed throughout the smoothed field. We investigate which parts of the field contain the most information by smoothing and splitting the field into different levels of density (using the formalism of indicator functions), deriving analytic expressions for the information content of each density bin in the joint-probability distribution (given a distance separation). When we choose one particular distance regime (i.e., cells separated by $60$ to $80\,h^{-1}$ Mpc), we find that the information in that range peaks at moderately rare densities (where the expected number of smoothed survey cells at that density is of order 100). Counter-intuitively, we find that, for a finite survey volume (again at a particular distance range), indicator function analysis can outperform conventional two-point statistics while using only a fraction of the total survey cells, and we explain why. In light of recent developments in marked statistics (such as the indicator power spectrum and density-split clustering), this result elucidates how to optimize sampling for effective extraction of cosmological information.
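The cell counts quoted above (and on the upper axis of Figure 1) follow from the one-point Gaussian distribution: a grid of $N_{\rm cells}$ cells contains on average $N_{\rm cells}\,[\Phi(\nu_2) - \Phi(\nu_1)]$ cells in the density bin $[\nu_1, \nu_2)$, where $\Phi$ is the standard normal CDF. The sketch below, whose function names are hypothetical, evaluates this for a $64^3$-cell grid and solves for the one-sided threshold $\nu_t$ above which about 100 cells are expected. It ignores the correlation between neighboring cells, which changes the scatter in the count but not its mean.

```python
import math


def normal_cdf(x):
    """Standard normal CDF Phi(x); math.erf also handles x = +/- math.inf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_cells(n_cells, nu_lo, nu_hi):
    """Mean number of cells with nu_lo <= delta/sigma < nu_hi in a Gaussian field."""
    return n_cells * (normal_cdf(nu_hi) - normal_cdf(nu_lo))


def tail_threshold(n_cells, target, lo=0.0, hi=10.0, iters=100):
    """Bisection for nu_t such that n_cells * P(nu >= nu_t) equals `target`."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # erfc keeps full precision in the far tail, unlike 1 - normal_cdf(mid).
        if n_cells * 0.5 * math.erfc(mid / math.sqrt(2.0)) > target:
            lo = mid  # still too many cells above mid: threshold lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n = 64**3  # 262,144 cells, as in the right-hand panel of Figure 1
    print(f"tail threshold for ~100 cells: nu = {tail_threshold(n, 100.0):.2f}")
    for nu_lo, nu_hi in [(2.0, 2.5), (3.0, 3.5), (3.5, 4.0)]:
        print(nu_lo, nu_hi, round(expected_cells(n, nu_lo, nu_hi), 1))
```

Because $\Phi$ is symmetric, a bin at $-\nu$ holds as many cells on average as the mirrored bin at $+\nu$, consistent with the statement that only $\nu^2$ matters for a Gaussian field.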

Paper Structure

This paper contains 13 sections, 45 equations, and 2 figures.

Figures (2)

  • Figure 1: Information on power spectrum amplitude (linear and log scales) from indicator correlation functions $\xi_I(r)$ in the distance bin $r \in [60, 80)\,h^{-1}$ Mpc, from Gaussian realizations, for various densities $\nu = \delta/\sigma$. The left-hand panel shows results from a cube of $500\,h^{-1}$ Mpc per side, divided into $32^3$ cells; the right-hand panel, from a cube of $1000\,h^{-1}$ Mpc per side, divided into $64^3$ cells. Both panels compare calculated information to the predictions of Equations \ref{eq:biginfoeq} (blue) and \ref{eq:twovalapprox} (orange). Dashed portions of the curves indicate values of $\nu$ outside the formulas' applicability range. We derive the purple points by numerically differentiating a Gaussian approximation to the observed probability distribution of $\hat{\xi}_I$ (i.e., a continuous normal pdf with mean $\langle \hat{\xi}_I\rangle$ and variance $\langle \hat{\xi}_I^2\rangle - \langle \hat{\xi}_I\rangle^2$). We derive the green points by binning the observed values of $\hat{\xi}_I$, taking the occurrence rate of each bin as the pdf, which we then numerically differentiate. See Section \ref{sec:infotestmeas} and Appendix \ref{sec:appendix} for more information. The upper axis shows the expected number of survey cells in each bin. (Note that our use of 10,000 realizations allows us to resolve a mean number of cells down to $10^{-4}$.) Both panels also show the information from the full correlation function $\xi(r)$ for this particular distance bin.
  • Figure 2: Constraints on the value of $\sigma^2$ (proxy for amplitude $A_z$) deduced from indicator function correlations ($\xi_I$, blue) and the full correlation function ($\xi$, orange), in the set of $64^3$-cell realizations described above, in the radial distance bin $[60, 80)\,h^{-1}$ Mpc. Dashed blue and orange lines show mean deduced values, and the shading shows one standard deviation on either side of the mean; the magenta dashed line shows the true value ($\sigma^2 = 0.625$) used to generate the realizations. Left panel: $\nu = \delta/\sigma \approx -4.0$; right panel: $\nu = \delta/\sigma \approx -2.8$. (Since our field is Gaussian, the sign of $\nu$ does not matter; cf. Equation \ref{eq:sminfoeq}, which depends on $\nu^2$ only.)
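The purple and green points in Figure 1 rest on two standard Fisher-information formulas for a single observed statistic. For a normal pdf whose mean $\mu(\theta)$ and variance $v(\theta)$ depend on a parameter $\theta$, $\mathcal{I}_\theta = \mu'^2/v + \tfrac{1}{2}(v'/v)^2$; for a binned pdf with bin probabilities $p_k(\theta)$, $\mathcal{I}_\theta = \sum_k p_k'^2/p_k$. The sketch below evaluates both with central finite differences. The function names, the step size `h`, and the toy models in the checks are illustrative assumptions, not the authors' code; in the paper's setting $\theta$ would be the amplitude (with $\sigma^2$ as proxy), and $\mu$, $v$, and $p_k$ would be measured from realizations of $\hat{\xi}_I$ generated at neighboring parameter values.

```python
import math


def central_diff(f, theta, h):
    """Second-order central finite difference of f at theta."""
    return (f(theta + h) - f(theta - h)) / (2.0 * h)


def fisher_gaussian(mean_fn, var_fn, theta, h=1e-4):
    """Fisher information on theta for a normal pdf N(mean_fn(theta), var_fn(theta))."""
    dmu = central_diff(mean_fn, theta, h)
    dv = central_diff(var_fn, theta, h)
    v = var_fn(theta)
    return dmu**2 / v + 0.5 * (dv / v) ** 2


def fisher_binned(prob_fn, theta, h=1e-4):
    """Fisher information for a histogram pdf: sum_k (dp_k/dtheta)^2 / p_k.

    prob_fn(theta) must return the list of bin probabilities; empty bins are skipped,
    and h should be small compared with the scale on which the p_k vary.
    """
    p = prob_fn(theta)
    p_plus = prob_fn(theta + h)
    p_minus = prob_fn(theta - h)
    info = 0.0
    for pk, pp, pm in zip(p, p_plus, p_minus):
        if pk > 0.0:
            dp = (pp - pm) / (2.0 * h)
            info += dp * dp / pk
    return info


if __name__ == "__main__":
    # Toy check: a two-bin ("Bernoulli") histogram with p = theta has I = 1 / (theta (1 - theta)).
    print(fisher_binned(lambda t: [1.0 - t, t], 0.3), 1.0 / (0.3 * 0.7))
```

With measured rather than analytic inputs, the binned estimate is sensitive to empty or sparsely populated bins, which is one reason to compare it against the smoother Gaussian approximation as the figure does.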