A Gigaparsec-Scale Hydrodynamic Volume Reconstructed with Deep Learning
Cooper Jacobus, Roger de Belsunce, Solene Chabanier, Peter Harrington, JD Emberson, Zarija Lukić, Salman Habib
TL;DR
The paper tackles the challenge of generating DESI-scale Ly-$\alpha$ forest mocks that preserve small-scale IGM physics within a Gigaparsec-scale volume. It implements a hybrid pipeline that couples a low-resolution hydrodynamic simulation with a conditional GAN to reconstruct sub-kpc baryonic structure, yielding a volume roughly 1 Gpc in diameter with P1D accuracy better than $\sim10\%$ and P3D accuracy of $\sim20\%$ relative to high-resolution references. An EFT framework is used to validate the reconstruction through joint fits to $P(k,\mu)$ and extraction of bias parameters: the higher-order EFT terms are consistent across resolutions, while the linear biases remain sensitive to large-scale coverage. The work demonstrates a practical route to survey-scale, high-fidelity mock skies, publicly releases the hydrodynamic volume and a companion halo catalog, and outlines future hybridizations with perturbation theory to recover the largest-scale modes.
Abstract
The next generation of cosmological spectroscopic sky surveys will probe the distribution of matter across several Gigaparsecs (Gpc), or many billions of light-years. To leverage the rich data in these new maps and better understand the physics that shapes the large-scale structure of the cosmos, observed matter distributions must be compared to simulated mock skies. Small mock skies can be produced using precise, physics-driven hydrodynamical simulations. However, the need to capture small, kpc-scale density fluctuations in the intergalactic medium (IGM) places tight restrictions on the minimum resolution of these simulations. Even on the most powerful supercomputers, it is impossible to run simulations of such high resolution in volumes comparable to what will be probed by future surveys, because of the vast quantity of data needed to hold such a simulation in computer memory. It is possible, though, to represent the essential features of these high-resolution simulations using orders of magnitude less memory. We present a hybrid approach that employs a physics-driven hydrodynamical simulation at a much lower-than-necessary resolution, followed by a data-driven, deep-learning enhancement. This hybrid approach allows us to produce hydrodynamic mock skies that accurately capture small, kpc-scale features in the IGM but span hundreds of Megaparsecs. We have produced such a volume, roughly one Gigaparsec in diameter, and examine its relevant large-scale statistical features, emphasizing certain properties that could not be captured by previous, smaller simulations. We present this hydrodynamic volume together with a companion N-body dark matter simulation and halo catalog, which we are making publicly available to the community for use in calibrating data pipelines for upcoming survey analyses.
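
The memory argument above can be illustrated with a back-of-the-envelope estimate. The sketch below is only indicative: the cell sizes (20 kpc and 160 kpc), the number of stored fields (6), the single-precision storage, and the cubic 1 Gpc box are all assumptions chosen for illustration and are not values taken from this work.

```python
# Rough memory estimate for a uniform-grid hydrodynamic snapshot.
# All numerical choices below are illustrative assumptions, not values from the paper.

GPC_IN_KPC = 1_000_000  # 1 Gpc expressed in kpc


def grid_memory_bytes(box_size_kpc, cell_size_kpc, n_fields=6, bytes_per_value=4):
    """Bytes needed to hold one snapshot of a cubic uniform grid.

    n_fields=6 (e.g. density, temperature, three velocity components, one extra)
    and 4-byte floats are hypothetical defaults.
    """
    cells_per_side = round(box_size_kpc / cell_size_kpc)
    return cells_per_side ** 3 * n_fields * bytes_per_value


# Hypothetical fine grid (20 kpc cells) versus a grid 8x coarser per dimension.
fine_bytes = grid_memory_bytes(GPC_IN_KPC, 20.0)
coarse_bytes = grid_memory_bytes(GPC_IN_KPC, 160.0)

print(f"fine grid:   {fine_bytes / 1e15:.2f} PB")
print(f"coarse grid: {coarse_bytes / 1e12:.2f} TB")
print(f"reduction:   {fine_bytes // coarse_bytes}x")
```

Under these assumptions, a single fine-grid snapshot is petabyte-scale, while coarsening each dimension by a factor of 8 reduces the footprint by $8^3 = 512$. This cubic scaling is what makes a low-resolution simulation followed by a learned enhancement attractive.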
