A Gigaparsec-Scale Hydrodynamic Volume Reconstructed with Deep Learning

Cooper Jacobus, Roger de Belsunce, Solene Chabanier, Peter Harrington, JD Emberson, Zarija Lukić, Salman Habib

TL;DR

The paper tackles the challenge of generating DESI-scale Ly-$\alpha$ forest mocks that preserve small-scale IGM physics within a Gigaparsec-scale volume. It implements a hybrid pipeline that couples a low-resolution hydrodynamic simulation with a conditional GAN to reconstruct kpc-scale baryonic structure, yielding a volume roughly 1 Gpc in diameter that agrees with high-resolution references to within $\sim10\%$ in P1D and to around $\sim20\%$ in P3D. An effective field theory (EFT) framework is used to validate the reconstruction through joint fits to $P(k,\mu)$ and extraction of bias parameters: the higher-order EFT terms are consistent across resolutions, while the linear biases remain sensitive to large-scale coverage. The work demonstrates a practical route to survey-scale, high-fidelity mock skies, publicly releases the hydrodynamic volume and companion halo catalog, and outlines future hybridizations with perturbation theory to recover the largest-scale modes.

Abstract

The next generation of cosmological spectroscopic sky surveys will probe the distribution of matter across several Gigaparsecs (Gpc), or many billion light-years. In order to leverage the rich data in these new maps to gain a better understanding of the physics that shapes the large-scale structure of the cosmos, observed matter distributions must be compared to simulated mock skies. Small mock skies can be produced using precise, physics-driven hydrodynamical simulations. However, the need to capture small, kpc-scale density fluctuations in the intergalactic medium (IGM) places tight restrictions on the necessary minimum resolution of these simulations. Even on the most powerful supercomputers, it is impossible to run simulations of such high resolution in volumes comparable to what will be probed by future surveys, due to the vast quantity of data needed to store such a simulation in computer memory. However, it is possible to represent the essential features of these high-resolution simulations using orders of magnitude less memory. We present a hybrid approach that employs a physics-driven hydrodynamical simulation at a much lower-than-necessary resolution, followed by a data-driven, deep-learning enhancement. This hybrid approach allows us to produce hydrodynamic mock skies that accurately capture small, kpc-scale features in the IGM but which span hundreds of Megaparsecs. We have produced such a volume, which is roughly one Gigaparsec in diameter, and examine its relevant large-scale statistical features, emphasizing certain properties that could not be captured by previous smaller simulations. We present this hydrodynamic volume as well as a companion N-body dark matter simulation and halo catalog, which we are making publicly available to the community for use in calibrating data pipelines for upcoming survey analyses.

Paper Structure

This paper contains 11 sections, 10 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Visualization of the $960\,h^{-1}\,{\rm Mpc}$ wide baryon density volume we have reconstructed using our machine learning model. The dashed box highlights a region $80\,h^{-1}\,{\rm Mpc}$ wide, which is the size of our training data and comparable to existing simulations used for DESI (Walther_2021; Walther:2024tcj) and other surveys.
  • Figure 2: Illustration of the process for creating training data target fields from high-resolution simulations. Fields are iteratively Gaussian blurred and sub-sampled until they match the grid size (and data footprint) of the low-resolution simulations. This method preserves most fine structures while dramatically reducing the memory necessary to store the density fields on our computer.
  • Figure 3: Illustration of the effects of resolution on Ly-$\alpha$ absorption features. We plot slices of Baryon Density $\sim 30$ Mpc wide for High-Res, Low-Res, and Reconstructed fields, visualized in log-space. Notice that absorption features are spectrally sharpest in the high-resolution simulation but that some spectral detail is recovered by our ML model.
  • Figure 4: (Top) The probability density functions of the predicted density and temperature fields (dark blue lines) compared against the reference distributions from a target high-resolution simulation (dotted lines). (Bottom) The density–temperature phase distribution of the gas in our model prediction compared against the reference distributions from the low-resolution and high-resolution simulations. For each output we plot the best-fit power-law relationship between density and temperature. Our model improves the slope of this power law, $\gamma$, relative to the low-resolution simulation, but produces a much broader distribution.
  • Figure 5: The probability density (PDF, left panel) and 1D power spectrum (P1D, right panel) of Ly-$\alpha$ flux in redshift space for four related simulation volumes: the high- and low-resolution Nyx simulations, the resized target, and our network output. The grey band in the residual highlights the 10% error range around the smaller, $80\,h^{-1}\,{\rm Mpc}$ high-resolution simulation from our validation dataset. The strong P1D agreement demonstrates that small and intermediate-scale fluctuations are well captured; in contrast, discrepancies in the PDF arise because the model’s limited correlation length prevents reconstruction of correct large-scale phases and spatial correlations.
  • ...and 3 more figures
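
The Figure 2 caption describes building training targets by repeatedly Gaussian-blurring and sub-sampling a high-resolution field until it matches the low-resolution grid. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the kernel width `sigma`, the factor-of-two sub-sampling per step, the periodic (FFT-based) blur, and the function names `gaussian_blur_periodic` and `blur_and_subsample` are all hypothetical choices that the caption does not specify.

```python
import numpy as np


def gaussian_blur_periodic(field, sigma):
    """Blur a periodic field with a Gaussian of standard deviation `sigma` (in cells).

    The blur is applied in Fourier space, so the box is treated as periodic
    (an assumption that suits cosmological simulation volumes).
    """
    field_k = np.fft.fftn(field)
    for axis, n in enumerate(field.shape):
        # Angular frequency per cell along this axis.
        k = 2.0 * np.pi * np.fft.fftfreq(n)
        # Fourier transform of a unit-area Gaussian; equals 1 at k = 0,
        # so the mean of the field is unchanged.
        window = np.exp(-0.5 * (sigma * k) ** 2)
        shape = [1] * field.ndim
        shape[axis] = n
        field_k = field_k * window.reshape(shape)
    return np.fft.ifftn(field_k).real


def blur_and_subsample(field, target_size, sigma=1.0):
    """Iteratively blur and keep every second cell until the grid has
    `target_size` cells per side.

    Assumes a cubic grid whose side is `target_size` times a power of two.
    `sigma` is a hypothetical choice, not a value taken from the paper.
    """
    out = np.asarray(field, dtype=float)
    n = out.shape[0]
    ratio = n // target_size
    if any(s != n for s in out.shape) or n % target_size or ratio & (ratio - 1):
        raise ValueError("grid must be cubic, with side = target_size * 2**m")
    while out.shape[0] > target_size:
        out = gaussian_blur_periodic(out, sigma)
        # Sub-sample: keep every second cell along each axis.
        out = out[(slice(None, None, 2),) * out.ndim]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    high_res = rng.lognormal(size=(32, 32, 32))  # stand-in for a density field
    target = blur_and_subsample(high_res, target_size=8)
    print(target.shape)
```

Blurring before each sub-sampling step suppresses modes that the coarser grid cannot represent, which limits aliasing. In three dimensions each factor-of-two step cuts the number of stored cells by a factor of eight, which is where the memory saving mentioned in the caption comes from.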