Primordial non-Gaussianity -- Fast simulations and persistent summary statistics

Juan Calles, Gabriella Contardo, Jorge Noreña, Jacky H. T. Yip, Gary Shiu

TL;DR

This work probes how topological descriptors from persistent homology and traditional clustering statistics constrain primordial non-Gaussianity (PNG) in large-scale structure, using two simulation suites (PNG-pmwd and QuijotePNG) and likelihood-free neural regression. The authors introduce PNG-pmwd with 22,410 halo catalogs across local and equilateral PNG shapes and varied cosmology, enabling a broad comparison of statistics across halo-mass bins. They find that PD-statistics, a simple topological descriptor, typically yields the strongest constraints for both $f_{\rm NL}^{\rm loc}$ and $f_{\rm NL}^{\rm equil}$, with large halos carrying most of the information; including small halos or small scales can degrade performance and hinder transferability between simulators. Transferability tests reveal that models trained on fast simulations can generalize to full simulations only when small-scale modes and low-mass halos are omitted, highlighting the need for careful handling of resolution differences and standardization when applying learned mappings to different datasets.

Abstract

We investigate the sensitivity of topological and traditional summary statistics to primordial non-Gaussianity (PNG) using two suites of simulations. First, we introduce a new simulation suite for PNG, PNG-pmwd, comprising more than $20{,}000$ halo catalogs that individually vary the local and equilateral shapes, together with variations in $\Omega_m$ and $\sigma_8$. Second, we carry out a systematic comparison of topological descriptors, as well as power spectrum and bispectrum measurements, evaluating their constraining power on both local and equilateral $f_{\rm NL}$ and how this sensitivity varies with halo mass. This dataset enables likelihood-free neural regression of $f_{\rm NL}$ across multiple halo mass bins for a wide range of summary statistics. Third, we assess the transferability of these learned mappings by testing whether models trained on fast pmwd simulations can perform robust inference on simulations from the QuijotePNG suite. We find that a combination of simple descriptive statistics of the topological features (PD-statistics) gives the best performance in constraining equilateral PNG. We observe that the constraining power of these summaries comes from large-mass halos, with small-mass halos adding noise and degrading performance. Similarly, we find that the transferability of the learned mappings, for both the topological summaries and the power spectrum plus bispectrum, degrades if small scales or small-mass halos are included.
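
As a rough illustration of what "simple descriptive statistics of the topological features" could look like, the sketch below turns one persistence diagram into a fixed-length vector. The specific statistics (mean, standard deviation, median, and quartiles of births, deaths, and persistences) and the function name `pd_statistics` are assumptions made for this example; the abstract does not list the exact statistics used in the paper.

```python
import numpy as np

def pd_statistics(diagram):
    """Summarize one persistence diagram as a fixed-length vector.

    `diagram` is an (n, 2) array of [birth, death] pairs.
    The choice of statistics here is an assumption for illustration only.
    """
    diagram = np.asarray(diagram, dtype=float)
    births, deaths = diagram[:, 0], diagram[:, 1]
    persistence = deaths - births  # lifetime of each topological feature

    features = []
    for values in (births, deaths, persistence):
        features += [
            values.mean(),
            values.std(),
            np.median(values),
            np.percentile(values, 25),
            np.percentile(values, 75),
        ]
    return np.array(features)

# Toy diagram with three features (hypothetical values).
toy_diagram = [[0.0, 1.0], [0.5, 2.5], [1.0, 1.5]]
toy_vector = pd_statistics(toy_diagram)
```

A vector of this kind, computed per homology dimension and per halo mass bin, could then serve as the input to a regression network.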

Paper Structure

This paper contains 33 sections, 14 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Impact of varying the number of nearest neighbors in the $\alpha$-DTM-$\ell$ filtration, shown for the fiducial cosmology and a fixed seed. Black dots indicate halo positions in redshift space, with the size of the dots proportional to the mass of the halo; green circles denote radii computed via Eq. \ref{eq:dtmradii}. Dark blue lines represent 1-simplices (edges), while light blue shaded regions correspond to 2-simplices (triangle faces) present at this filtration scale.
  • Figure 2: We show the simplicial complex in a slice from sub-boxes of size $350$ to $650\,h^{-1}\mathrm{Mpc}$ in the LH_LC300 dataset of the PNG-pmwd suite with $f_{\rm NL}^{\rm loc} = 294.45$. Each panel represents a different mass cut. The arrow indicates the mass binning scheme used in Table \ref{tab:mass_bins} in units of $[10^{13}\,\rm{M}_\odot/h]$. Color coding corresponds to groups with similar mean halo density.
  • Figure 3: Cosmological models guide LSS N-body simulations, from which we identify dark matter halo catalogs. Persistent homology extracts topological features as persistence diagrams, which are vectorized or fed directly into machine learning models to infer cosmological parameters.
  • Figure 4: An MLP trained on the PSBS vector from the PNG-pmwd LH_LC300 training split and evaluated on both PNG-pmwd (in blue) and QuijotePNG (in orange) test sets to infer $f_{\rm NL}^{\rm loc}$ in the HMid mass bin. Each panel shows the effect of including higher-wavenumber bispectrum configurations, where all triangles containing a side with $k > k_{\rm max}$ are ignored. As $k_{\rm max}$ increases, predictions on the native test set become tighter due to the inclusion of more squeezed triangles, but robustness across simulations decreases.
  • Figure 5: Same as Figure \ref{fig:PSBS_MLP_klambda} but evaluated in the HHigh mass bin.
  • ...and 8 more figures
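
The $k_{\rm max}$ cut described in the Figure 4 caption, where every triangle with at least one side above $k_{\rm max}$ is dropped, can be sketched as a simple mask over the bispectrum configurations. The array layout, the function name `apply_kmax_cut`, and the toy values are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def apply_kmax_cut(triangles, bispectrum, k_max):
    """Keep only bispectrum configurations whose three sides all satisfy k <= k_max.

    `triangles` is an (n, 3) array of side lengths (k1, k2, k3) and
    `bispectrum` is the matching length-n array of measurements.
    """
    triangles = np.asarray(triangles, dtype=float)
    bispectrum = np.asarray(bispectrum, dtype=float)
    # A triangle is discarded as soon as its largest side exceeds k_max.
    keep = triangles.max(axis=1) <= k_max
    return triangles[keep], bispectrum[keep]

# Toy configurations in h/Mpc (hypothetical values).
toy_triangles = [
    [0.02, 0.02, 0.02],
    [0.02, 0.10, 0.10],  # squeezed triangle, kept at k_max = 0.1
    [0.05, 0.05, 0.08],
    [0.02, 0.30, 0.30],  # squeezed triangle, removed at k_max = 0.1
]
toy_bispectrum = [4.0, 3.0, 2.0, 1.0]

kept_triangles, kept_bispectrum = apply_kmax_cut(toy_triangles, toy_bispectrum, k_max=0.1)
```

Raising `k_max` admits more squeezed configurations (one short side, two long sides), which is consistent with the caption's remark that constraints tighten on the native test set as $k_{\rm max}$ grows.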