Exploring the Early Universe with Deep Learning

Emmanuel de Salis, Massimo De Santis, Davide Piras, Sambit K. Giri, Michele Bianco, Nicolas Cerardi, Philipp Denzel, Merve Selcuk-Simsek, Kelley M. Hess, M. Carmen Toribio, Franz Kirsten, Hatem Ghorbel

TL;DR

The paper tackles recovering the Epoch of Reionization history from SKA-Low 21-cm 2D power spectra by training and evaluating a broad set of deep learning models on realistic simulations. It generates a dataset of $15{,}945$ realizations with six astrophysical nuisance parameters using 21cmFAST, computing the 2D power spectrum $P(k_\perp,k_\parallel)$ across three frequency bands and targeting the volume-averaged neutral fraction $\bar{x}_\mathrm{HI}$. A diverse toolbox (Generative Flow Networks, SE-CNNs including ensembles, MLP-Mixer, MiniViT, Frequency-Aware CNN, and SBI-based neural posterior estimation) is benchmarked with $R^2$ and RMSE as metrics. The results show near-top performance for SE-CNN Ensemble-10 and MLP-Mixer (e.g., $R^2$ up to 0.9861), with joint SBI leveraging multi-frequency information to approach CNN-based accuracy, supporting scalable, robust SKA data pipelines for early-universe inference.
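As a point of reference for the two benchmark metrics named above, the following is a minimal sketch of how $R^2$ and RMSE are conventionally computed for a set of true and predicted $\bar{x}_\mathrm{HI}$ values. The function names and the sample arrays are illustrative assumptions, not taken from the paper, and the sketch does not reproduce how the authors aggregate scores across the three frequency bands.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Hypothetical neutral fractions, for illustration only (not paper data).
x_true = np.array([0.9, 0.5, 0.1])
x_pred = np.array([0.8, 0.5, 0.2])

print(f"R^2  = {r2_score(x_true, x_pred):.4f}")
print(f"RMSE = {rmse(x_true, x_pred):.4f}")
```

An $R^2$ of 1 corresponds to perfect recovery of the neutral fraction, while RMSE is expressed in the same units as $\bar{x}_\mathrm{HI}$, which is why the two are usually reported together.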

Abstract

Hydrogen is the most abundant element in our Universe. The first generation of stars and galaxies produced photons that ionized hydrogen gas, driving a cosmological event known as the Epoch of Reionization (EoR). The upcoming Square Kilometre Array Observatory (SKAO) will map the distribution of neutral hydrogen during this era, aiding the study of the properties of these first-generation objects. Extracting astrophysical information will be challenging, as SKAO will produce a tremendous amount of data in which the hydrogen signal is contaminated by foreground emission and instrumental systematics. To address this, we develop state-of-the-art deep learning techniques to extract information from the 2D power spectra of the hydrogen signal expected from SKAO. We apply a series of neural network models to these measurements and quantify their ability to predict the history of cosmic hydrogen reionization, which is connected to the increasing number and efficiency of early photon sources. We show that the study of the early Universe benefits from modern deep learning technology. In particular, we demonstrate that dedicated machine learning algorithms can achieve an average $R^2$ score above $0.95$ in recovering the reionization history. This enables accurate and precise cosmological and astrophysical inference of structure formation in the early Universe.

Paper Structure

This paper contains 17 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Schematic representation of our inference pipeline for one of the three frequency ranges, $\nu_\mathrm{obs}\pm \Delta\nu$, as explained in §\ref{sec:data}.
  • Figure 2: 2D power spectra of the cosmological 21-cm signal measured at the three different observed frequency ranges for one model in our dataset. On top of each panel, we show the corresponding volume-averaged neutral fraction, $\overline{x}_\mathrm{HI}$.
  • Figure 3: