Sample Variance Denoising in Cylindrical 21-cm Power Spectra

Daniela Breitman, Andrei Mesinger, Steven G. Murray, Anshuman Acharya

TL;DR

This work tackles sample variance in 21-cm power spectrum analyses arising from small forward-model volumes and anisotropic cylindrical k-space footprints. It introduces 21cmPSDenoiser, a score-based diffusion denoiser that predicts the IC-averaged mean 2D PS $\boldsymbol{\mu}(\tilde{\theta})$ from a single realisation by learning the score $\nabla_{\boldsymbol{x}} \log P_t(\boldsymbol{x}|\boldsymbol{x}_i)$ and integrating a reverse SDE via a probability-flow ODE. The method is model-agnostic and generalizes across simulators; when combined with a cylindrical wedge cut at $\mu_{\rm min}=0.97$, it yields unbiased posteriors that are roughly 50% narrower than traditional pipelines. In a realistic mock HERA inference, 21cmPSDenoiser outperforms Fixing & Pairing while offering substantial reductions in large-scale sample variance at minimal computational cost (≈6 s per PS), enabling more precise and reliable cosmological constraints from upcoming 21-cm data.
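
For orientation, the score-based machinery mentioned above can be sketched in the generic notation of the score-based diffusion literature. This is not necessarily the exact parametrisation used by 21cmPSDenoiser: the drift $f$ and noise scale $g$ are placeholders assumed for this sketch, $\boldsymbol{x}$ stands for the mean 2D PS being generated, and $\boldsymbol{x}_i$ is taken to be the single input realisation on which the model is conditioned.

$$
\mathrm{d}\boldsymbol{x} = f(\boldsymbol{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\boldsymbol{w}
\qquad \text{(forward SDE: data to Gaussian prior)}
$$

$$
\mathrm{d}\boldsymbol{x} = \left[ f(\boldsymbol{x}, t) - g(t)^2\, \nabla_{\boldsymbol{x}} \log P_t(\boldsymbol{x}|\boldsymbol{x}_i) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{\boldsymbol{w}}
\qquad \text{(reverse SDE)}
$$

$$
\frac{\mathrm{d}\boldsymbol{x}}{\mathrm{d}t} = f(\boldsymbol{x}, t) - \frac{1}{2}\, g(t)^2\, \nabla_{\boldsymbol{x}} \log P_t(\boldsymbol{x}|\boldsymbol{x}_i)
\qquad \text{(probability-flow ODE)}
$$

The probability-flow ODE shares the same marginals $P_t$ as the reverse SDE but is deterministic, so once a network approximates the score, a sample can be generated with a standard ODE solver run backwards in time from the Gaussian prior.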

Abstract

State-of-the-art simulations of reionisation-era 21-cm signal have limited volumes, generally orders of magnitude smaller than observations. Consequently, the Fourier modes in common between simulation and observation have limited overlap, especially in cylindrical (2D) k-space that is natural for 21-cm interferometry. This makes sample variance (i.e. the deviation of the simulated sample from the population mean due to finite box size) a potential issue when interpreting upcoming 21-cm observations. We introduce \texttt{21cmPSDenoiser}, a score-based diffusion model that can be applied to a single, forward-modelled realisation of the 21-cm 2D power spectrum (PS), predicting the corresponding \textit{population mean} on-the-fly during Bayesian inference. Individual samples of 2D Fourier amplitudes of wave modes relevant to current 21-cm observations can deviate from the mean by over 50\% for 300 cMpc simulations, even when only considering stochasticity due to sampling of Gaussian initial conditions. \texttt{21cmPSDenoiser} reduces this deviation by an order of magnitude, outperforming current state-of-the-art sample variance mitigation techniques like Fixing \& Pairing by a factor of a few at almost no additional computational cost ($\sim6$ s per PS). Unlike emulators, the denoiser is not tied to a particular model or simulator since its input is a (model-agnostic) realisation of the 2D 21-cm PS. Indeed, we confirm that it generalises to PS produced with a different 21-cm simulator than those on which it was trained. To quantify the improvement in parameter recovery, we simulate a 21-cm PS detection by the Hydrogen Epoch of Reionization Array (HERA) and run different inference pipelines corresponding to commonly-used approximations. We find that using \texttt{21cmPSDenoiser} in the inference pipeline outperforms other approaches, yielding an unbiased posterior that is 50\% narrower.
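
To make the notion of sample variance concrete, here is a toy sketch (not the paper's calculation). It assumes a Gaussian field whose Fourier modes are independent, so the squared amplitude of each mode is exponentially distributed around the true power, and a PS bin is estimated by averaging `n_modes` such modes. The function name `fractional_scatter` and all parameter values are hypothetical.

```python
import numpy as np


def fractional_scatter(n_modes, n_realisations=20000, true_power=1.0, seed=0):
    """Scatter of a binned PS estimate relative to the true (population mean) power.

    Toy assumption: every mode in the bin is an independent complex Gaussian
    variable, so |delta_k|^2 follows an exponential distribution with mean
    equal to the true power.
    """
    rng = np.random.default_rng(seed)
    mode_power = rng.exponential(true_power, size=(n_realisations, n_modes))
    # One PS estimate per realisation: the average over the modes in the bin.
    ps_estimate = mode_power.mean(axis=1)
    return np.std(ps_estimate / true_power - 1.0)


if __name__ == "__main__":
    for n_modes in (4, 16, 64, 256):
        measured = fractional_scatter(n_modes)
        expected = 1.0 / np.sqrt(n_modes)
        print(f"{n_modes:4d} modes: scatter {measured:.3f}, 1/sqrt(N) {expected:.3f}")
```

Under these assumptions the scatter falls as $1/\sqrt{N_{\rm modes}}$, which is why sparsely sampled bins (large scales, or a narrow region of cylindrical k-space) in a finite box are the ones most affected.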

Paper Structure

This paper contains 15 sections, 3 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Flowchart comparing the current state-of-the-art inference pipeline (left side) with this work (right side). In current state-of-the-art pipelines, we use a single realisation of the initial conditions to estimate the mean 1D 21-cm PS. Moreover, the simulated 1D PS is computed by spherically averaging over different wave modes than those used to compute the observed 1D PS. In this work, we account for sample variance by applying 21cmPSDenoiser, a score-based diffusion model trained to estimate the mean 21-cm PS from a single realisation. We also account for 21-cm PS anisotropy by averaging the 2D PS only over modes above $\mu_{\rm min} = 0.97$, the region of the 2D PS that is closest to where current 21-cm PS instruments observe. Applying a cut in $\mu_{\rm min}$ significantly exacerbates the problem of sample variance, and would not be practical without the use of 21cmPSDenoiser.
  • Figure 2: Cylindrically-averaged (2D) 21-cm power spectrum as a function of line-of-sight modes $k_\parallel$ and sky-plane modes $k_\perp$. The color map shows the PS amplitude calculated from a slice through a single simulated light cone centred at $z=9$. The simulation box has a side length of 300 cMpc and was generated with 21cmFASTv3. The blue hashed area is the HERA EoR window. The dashed cyan line is the horizon limit and the black solid line is the horizon limit with a 300 ns buffer added to it to account for additional foreground leakage (see HERA23). The red solid line is drawn at a value of $\mu_{\rm min} = 0.97$ where $\mu_{\rm min} = \cos \theta$, and $\tan \theta = \frac{k_\perp}{k_\parallel}$. In this paper, we use the red line as a rough approximation for the solid black line.
  • Figure 3: An illustration of the forward and backward diffusion processes used in 21cmPSDenoiser. The forward process adds noise to a mean 21-cm 2D PS sampled from the data distribution (leftmost panel), transforming it into a pre-defined Gaussian prior distribution (rightmost panel). We can then write the reverse process that allows us to sample the Gaussian prior and generate a mean 2D PS, conditioned on the input realisation of the 2D PS.
  • Figure 4: Top row: a PS sample (left) and the NN estimate of the mean PS obtained with that same sample as input (right). Middle row: fractional error with respect to the mean PS obtained from an ensemble average of about 200 PS realisations for the sample (left) and 21cmPSDenoiser (right). Bottom row: error as a fraction of the HERA noise level at the same redshift for the sample (left) and for 21cmPSDenoiser (right).
  • Figure 5: Median fractional error on $\sim 2.5$k test samples at redshift $z \sim 11.4$. The left plot evaluates the FE directly on the PS realisations, while the right plot evaluates it on the output from 21cmPSDenoiser. The striped pattern in the right plot occurs due to the binning scheme, where certain bins have more samples (and thus less sample variance) than others.
  • ...and 5 more figures
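
The $\mu_{\rm min}$ cut described in the captions of Figures 1 and 2 can be illustrated with a short sketch. It uses the caption's definition, $\mu = \cos\theta$ with $\tan\theta = k_\perp / k_\parallel$, so that $\mu = k_\parallel / \sqrt{k_\perp^2 + k_\parallel^2}$. The function names (`mu_grid`, `wedge_cut_average`), the uniform weighting of 2D bins, and the binning in $|k|$ are assumptions made for illustration and are not taken from the paper's pipeline.

```python
import numpy as np


def mu_grid(k_perp, k_par):
    """Return mu = cos(theta) and |k| on the (k_perp, k_par) grid."""
    kperp, kpar = np.meshgrid(
        np.asarray(k_perp, dtype=float), np.asarray(k_par, dtype=float), indexing="ij"
    )
    kmag = np.hypot(kperp, kpar)
    # Leave mu = 0 where |k| = 0 to avoid dividing by zero.
    mu = np.divide(kpar, kmag, out=np.zeros_like(kmag), where=kmag > 0)
    return mu, kmag


def wedge_cut_average(ps_2d, k_perp, k_par, k_edges, mu_min=0.97):
    """Average a 2D PS into 1D |k| bins using only cells with mu >= mu_min.

    Assumption: every retained 2D cell gets equal weight. Bins with no
    retained cells are returned as NaN.
    """
    ps_2d = np.asarray(ps_2d, dtype=float)
    mu, kmag = mu_grid(k_perp, k_par)
    keep = mu >= mu_min
    n_bins = len(k_edges) - 1
    idx = np.digitize(kmag, k_edges) - 1
    valid = keep & (idx >= 0) & (idx < n_bins)
    total = np.bincount(idx[valid], weights=ps_2d[valid], minlength=n_bins)
    count = np.bincount(idx[valid], minlength=n_bins)
    return np.divide(total, count, out=np.full(n_bins, np.nan), where=count > 0)


if __name__ == "__main__":
    k_perp = [0.0, 0.1, 0.5]  # hypothetical bin centres
    k_par = [0.2, 1.0]
    ps = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # toy PS, varies with k_perp
    edges = [0.1, 0.5, 1.5]
    print("with cut   :", wedge_cut_average(ps, k_perp, k_par, edges))
    print("without cut:", wedge_cut_average(ps, k_perp, k_par, edges, mu_min=0.0))
```

A cut at $\mu_{\rm min} = 0.97$ keeps only cells with $k_\perp \lesssim 0.25\, k_\parallel$, so each 1D bin is built from far fewer cells than a full spherical average. This is the reduction in mode count that the Figure 1 caption identifies as exacerbating sample variance.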