Implicit inference of the reionization history with higher-order statistics of the 21-cm signal

Nicolas Cerardi, Sambit K. Giri, Michele Bianco, Davide Piras, Emmanuel de Salis, Massimo De Santis, Merve Selcuk-Simsek, Philipp Denzel, Kelley M. Hess, M. Carmen Toribio, Franz Kirsten, Hatem Ghorbel

TL;DR

This paper tackles the challenge of reconstructing the reionization history from 21-cm tomography by focusing on the global neutral fraction $\bar{x}_{\rm HI}$ across redshift bins and exploiting non-Gaussian information. It employs simulation-based inference (SBI) with a forward model built from semi-numerical 21cmFAST simulations, realistic SKA-Low instrumental effects, and a suite of Gaussian and non-Gaussian statistics, including PS2D, Betti numbers, and the bispectrum. The key finding is that Betti numbers provide stronger constraints than two-point statistics on average, and combining PS2D with Betti (and often the bispectrum) yields substantial improvements in the figure of merit, though the bispectrum’s usefulness is state-dependent and can degrade constraints in highly neutral epochs. The work demonstrates a practical, robust pathway to maximize the scientific return of SKA-Low by integrating higher-order statistics within an SBI framework, with implications for observing strategies and data-analysis pipelines during the EoR.

Abstract

The Epoch of Reionization (EoR), when the first luminous sources ionised the intergalactic medium, represents a new frontier in cosmology. The Square Kilometre Array Observatory (SKAO) will offer unprecedented insights into this era through observations of the redshifted 21-cm signal, enabling constraints on the Universe's reionization history. We investigate the information content on the average neutral hydrogen fraction ($\bar{x}_{\rm HI}$) in several Gaussian (spherical and cylindrical power spectra) and non-Gaussian (Betti numbers and bispectrum) summary statistics of the 21-cm signal. Mock 21-cm observations are generated using the AA* configuration of SKAO's low-frequency telescope, incorporating noise levels for 100 and 1000 hours. We employ a state-of-the-art implicit inference framework to learn posterior distributions of $\bar{x}_{\rm HI}$ in redshift bins centred at $z=8.0$, $7.2$ and $6.5$, for each statistic and noise scenario, validating the posteriors through calibration tests. Using the figure of merit to assess constraining power, we find that Betti numbers alone are on average more informative than the power spectra, while the bispectrum provides limited constraints. However, combining higher-order statistics with the cylindrical power spectrum improves the mean figure of merit by $\sim$0.25 dex ($\sim33\%$ reduction in $\sigma(\bar{x}_{\rm HI})$). The relative contribution of each statistic varies with the stage of reionization. With SKAO observations approaching, our results show that combining power spectra with higher-order statistics can significantly increase the information retrieved from the EoR, maximising the scientific return of future 21-cm observations.
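
As a rough illustration of what a Betti-number summary measures, the sketch below counts $\beta_0$ (the number of connected components) of a thresholded 3D field as the threshold varies. It is a toy example, not the paper's pipeline: the function name `betti0`, the 6-neighbour connectivity, the use of the region strictly above the threshold, the non-periodic boundaries, and the hand-built 4×4×4 field are all assumptions made for illustration. $\beta_1$ and $\beta_2$ (tunnels and cavities) are not computed here.

```python
from collections import deque

# Face-sharing (6-connected) neighbours on a cubic grid; an assumed convention.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def betti0(field, threshold):
    """Count connected components of the cells where field > threshold.

    `field` is a cubic nested list field[i][j][k]; boundaries are non-periodic.
    """
    n = len(field)
    seen = set()
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if field[i][j][k] <= threshold or (i, j, k) in seen:
                    continue
                # New component found: flood-fill it with a breadth-first search.
                count += 1
                seen.add((i, j, k))
                queue = deque([(i, j, k)])
                while queue:
                    x, y, z = queue.popleft()
                    for dx, dy, dz in NEIGHBOURS:
                        a, b, c = x + dx, y + dy, z + dz
                        inside = 0 <= a < n and 0 <= b < n and 0 <= c < n
                        if inside and (a, b, c) not in seen and field[a][b][c] > threshold:
                            seen.add((a, b, c))
                            queue.append((a, b, c))
    return count


# Hand-built toy field (hypothetical values): two peaks joined by a low bridge,
# plus one isolated, fainter cell in the opposite corner.
n = 4
field = [[[0.0] * n for _ in range(n)] for _ in range(n)]
field[0][0][0] = 1.0
field[0][0][1] = 0.3  # bridge between the two peaks
field[0][0][2] = 1.0
field[3][3][3] = 0.5

thresholds = [0.2, 0.4, 0.6, 1.0]
curve = [betti0(field, v) for v in thresholds]
print(curve)  # [2, 3, 2, 0]
```

Raising the threshold first splits the bridged structure into two peaks (so $\beta_0$ increases), then removes the fainter cell, then removes everything. A curve of this kind, evaluated over many thresholds, is the sort of input a Betti-number summary provides.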

Paper Structure

This paper contains 17 sections, 9 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Redshift evolution of sky-averaged neutral fraction $\bar{x}_{\rm HI}$ of three scenarios (early, fiducial and late reionization models). The three coloured bands show the frequency bins considered in this study.
  • Figure 2: Marginal distributions of the simulation parameters $\theta$ (top and middle rows) and of the resulting $\bar{x}_{\rm HI}$ at three different redshifts (bottom row). The y-axes show the number of samples in the dataset. A uniform sampling $p_0(\theta)$ of the astrophysical parameters (dashed histograms) gives a highly bimodal distribution $p_0(\bar{x}_{\rm HI})$ of $\bar{x}_{\rm HI}$ in the three frequency bins. With our custom sampling strategy $p(\theta)$ (see Sec. \ref{sec:sampling_prior}), we get a more balanced distribution $p(\bar{x}_{\rm HI})$ in these bins (solid histograms).
  • Figure 3: Gaussian summary statistics at $z=7.2$ for the three reference models (left to right: Early, Fiducial, Late), assuming 1000h SKA-Low noise. Top row: The spherically averaged power spectrum, $P_{1D}(k)$. Lighter lines show multiple noise realisations, and the darker line highlights a random one. The instrumental noise bias has been removed from all curves for clarity. Bottom row: The cylindrically averaged power spectrum, $P_{2D}(k_{\perp}, k_{\parallel})$, for the single realisation highlighted in the top row, again with the noise bias removed. White pixels (e.g., in the 'Early' model) indicate noise-dominated modes where bias subtraction resulted in negative values.
  • Figure 4: Non-Gaussian summary statistics at $z=7.2$ for the three reference models (Early, Fiducial, Late), assuming 1000h SKA-Low noise. Top row: The reduced equilateral bispectrum, $Q(k)$. Middle row: The reduced squeezed-limit bispectrum, $Q(k)$. Bottom row: The Betti numbers ($\beta_0, \beta_1, \beta_2$) as a function of the threshold $v$. In all panels, light lines show multiple noise realisations, and the dark lines highlight a single, common realisation.
  • Figure 5: Coverage test on the best-calibrated models for PS2D (cyan), Betti numbers (gold), PS2D+Betti (pink) and PS2D+Betti+Bispec (green). For each model we ran 20 TARP realisations and show medians as solid lines and 95% of the samples as shaded regions.
  • ...and 3 more figures