Stabilizing simulation-based cosmological Fisher forecasts: a case study using the Voronoi volume function

Saee Dhawalikar, Aseem Paranjape, Shadab Alam

TL;DR

This work tackles the instability of derivative estimates in Fisher forecasts for halo-based cosmological statistics by introducing a two-step framework: random sub-sampling to stabilize noisy statistics and an optimization to select a stable, information-rich subset of data points. The authors demonstrate the method on the halo mass function and the Voronoi volume function across two N-body suites (Sinhagad and Sahyadri), showing up to a factor of ~4 improvement in constraining power and substantially better forecast stability across realizations. By defining quantitative metrics for derivative accuracy, information content, and cross-realization stability, and by using KL divergence to evaluate subset performance, the approach yields robust forecasts even with limited realizations. The framework is general and applicable to any statistic with noisy derivatives, offering a practical path to reliable, next-generation cosmological inferences for surveys like Euclid, DESI, and LSST.

Abstract

Forecasting cosmological constraints from halo-based statistics often suffers from instability in derivative estimates, especially when the number of simulations is limited. This instability reduces the reliability of Fisher forecasts and machine learning based approaches that use derivatives. We introduce a general framework that addresses this challenge by stabilizing the input statistic and then systematically identifying the optimal subset of summary statistics that maximizes cosmological information while simultaneously minimizing the instability of predicted constraints. We demonstrate this framework using the halo mass function as well as the Voronoi volume function (VVF), a summary statistic that captures beyond two-point clustering information. Applying our two-step procedure -- random sub-sampling followed by optimization -- improves the constraining power by up to a factor of 4, while also enhancing the stability of the forecasts across realizations. As surveys like Euclid, DESI, and LSST push toward tighter constraints, the ability to produce stable and accurate theoretical predictions is essential. Our results suggest that new summary statistics such as the VVF, combined with careful data curation and stabilization strategies, can play a key role in next-generation precision cosmology.
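
The Fisher forecast that the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a Gaussian likelihood with a parameter-independent covariance, central finite differences for the derivatives, and derivatives averaged over realizations before the Fisher matrix is built. The function names (`central_difference`, `fisher_matrix`, `marginalized_sigma`) and the toy numbers are hypothetical.

```python
import numpy as np

def central_difference(stat_plus, stat_minus, step):
    """Central finite-difference derivative of a data vector with respect to
    one parameter (assumes simulations at theta + step and theta - step)."""
    return (np.asarray(stat_plus) - np.asarray(stat_minus)) / (2.0 * step)

def fisher_matrix(derivs, cov):
    """F_ij = d_i^T C^{-1} d_j for derivs of shape (n_params, n_data)
    and a data covariance cov of shape (n_data, n_data)."""
    derivs = np.atleast_2d(np.asarray(derivs, dtype=float))
    cinv_d = np.linalg.solve(cov, derivs.T)  # C^{-1} d, shape (n_data, n_params)
    return derivs @ cinv_d

def marginalized_sigma(fisher):
    """Marginalized 1-sigma forecast errors: sqrt of the diagonal of F^{-1}."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

# Toy usage (all numbers are made up): 8 realizations, 2 parameters, 5 data points.
rng = np.random.default_rng(0)
true_derivs = np.array([[1.0, 0.8, 0.5, 0.2, 0.1],
                        [0.1, 0.3, 0.6, 0.9, 1.2]])
noisy_derivs = true_derivs + 0.3 * rng.standard_normal((8, 2, 5))
mean_derivs = noisy_derivs.mean(axis=0)   # average derivatives over realizations
cov = 0.05 * np.eye(5)                    # hypothetical data covariance

sigma_noisy = marginalized_sigma(fisher_matrix(mean_derivs, cov))
sigma_true = marginalized_sigma(fisher_matrix(true_derivs, cov))
```

Comparing `sigma_noisy` with `sigma_true` for different numbers of averaged realizations gives a feel for how derivative noise propagates into the forecast errors. In a Fisher matrix built this way, noise in the derivatives tends to add spurious information and so to make the constraints look tighter than they are.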

Paper Structure

This paper contains 16 sections, 12 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Illustration of the VVF of the highest number density ($n=2\times 10^{-2}\,\rm{Mpc}^{-3}$) tracer sample from the Sahyadri simulation suite for different sub-boxes and choices of summary statistic. (Left panel): VVF calculated using the full tracer sample (red) and the VVF obtained by averaging over random sub-samples (black). The solid line shows the mean VVF, and the bands show the diagonal errors from the covariance matrices. (Right panel): Random sub-sampled (black) and full (red) VVF for three random sub-boxes, with variations in $\Omega_{\rm m}$. Each curve is divided by the VVF for the default simulation in the corresponding sub-box. Dashed (dotted) curves show the simulations with lower (higher) $\Omega_{\rm m}$. Shaded regions show the diagonal errors, as in the left panel. Even after seed matching, the VVF variations are unstable. See text for details.
  • Figure 2: Corner plots from the Fisher analysis performed on the two tracer samples from Sahyadri using the HMF, with and without optimization. The ellipses show $95.4 \%$ confidence regions. Top (bottom) panels show results obtained using the higher (lower) number density sample. Left (right) panels show results without (with) optimization. Black shows the true Fisher constraints, obtained by averaging the derivatives across all $27$ realizations. Yellow, red, purple and cyan each show one random realization of the constraints obtained by averaging across $2$, $4$, $6$ and $8$ realizations, respectively. The widths ($\sigma$) of the marginalized 1D distributions are quoted in the corresponding colours alongside the 1D distributions. The correlation coefficients are quoted alongside the 2D ellipses in the upper right corners. The KL divergence (in bits), calculated with respect to the corresponding truth, is reported in the bottom corner. Optimization improves both the constraining power and the stability of the results.
  • Figure 3: Comparison of Fisher constraints from multiple combinations of $16$ realizations with the accurate prediction for the mass function. Markers show the median value of KL divergence (in bits) with error bars showing the $16$th and $84$th percentiles. Darker markers show the optimized results and the lighter versions show the unoptimized counterparts. The optimized results show consistently lower KL divergence.
  • Figure 4: Scatter plots of the $X^s$ and $Y^s$ statistics for the highest number density tracer sample and two cosmological parameters. Each point represents a VVF data point, colour-coded by its $Z^s$ value, with $Z_{\rm{th}}=2$. Only points lying below the dotted lines are considered valid data points. See the section on the VVF results for further details.
  • Figure 5: Normalized covariance (left) and inverse covariance (right) matrices for the two number density samples, $n=2\times 10^{-2}\,\rm{Mpc}^{-3}$ (top) and $2\times 10^{-3}\,\rm{Mpc}^{-3}$ (bottom), for the optimized percentiles. The percentiles are strongly correlated, with the higher percentiles negatively correlated with the lower ones. In the inverse covariance matrices, the correlations are mostly limited to neighboring percentile bins, especially for the lower density sample.
  • ...and 5 more figures
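
The KL divergence in bits quoted in Figures 2 and 3 compares a forecast with the reference ("true") constraints. A minimal sketch for Gaussian posteriors is given below. It assumes each forecast is a multivariate Gaussian whose covariance is the inverse Fisher matrix, and it takes the reference as the first argument. The direction of the divergence used in the paper is not stated in this summary, so that ordering is an assumption, as is the function name `kl_gaussian_bits`.

```python
import numpy as np

def kl_gaussian_bits(cov_p, cov_q, mean_p=None, mean_q=None):
    """KL(P || Q) in bits between two multivariate Gaussians.

    P and Q could be, for example, the reference and the approximate Fisher
    posteriors (assumed ordering). Means default to zero, i.e. both forecasts
    are centred on the same fiducial cosmology.
    """
    cov_p = np.atleast_2d(np.asarray(cov_p, dtype=float))
    cov_q = np.atleast_2d(np.asarray(cov_q, dtype=float))
    k = cov_p.shape[0]
    mean_p = np.zeros(k) if mean_p is None else np.asarray(mean_p, dtype=float)
    mean_q = np.zeros(k) if mean_q is None else np.asarray(mean_q, dtype=float)
    dmu = mean_q - mean_p

    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))  # tr(Sigma_q^{-1} Sigma_p)
    maha_term = dmu @ np.linalg.solve(cov_q, dmu)         # mean-offset term
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)

    kl_nats = 0.5 * (trace_term + maha_term - k + logdet_q - logdet_p)
    return kl_nats / np.log(2.0)  # convert nats to bits

# Toy usage with made-up 2x2 parameter covariances (inverse Fisher matrices).
cov_reference = np.array([[0.04, 0.01], [0.01, 0.09]])
cov_forecast = np.array([[0.02, 0.00], [0.00, 0.05]])
kl_bits = kl_gaussian_bits(cov_reference, cov_forecast)
```

A value of zero means the forecast reproduces the reference exactly. Larger values indicate a forecast whose ellipse differs more in size, shape or orientation from the reference.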