Large, fast and accurate HI intensity maps with latent overlap diffusion
Satvik Mishra, Roberto Trotta, Matteo Viel
TL;DR
The paper tackles the computational cost of predicting the 21 cm HI signal by learning from hydrodynamical simulations and then generating large-volume HI maps from dark matter-only runs. It introduces HALOgen, a 3D U‑Net with attention that maps the DM density to four halo-mass channels, and LODI, a conditional variational diffusion model that paints the 21 cm brightness temperature and uses latent overlap to stitch sub-volumes together. The latent overlap scheme seamlessly assembles $32^3$ sub-volumes into a 512-times larger $(25\,h^{-1}\,\mathrm{Mpc})^3$ volume, reducing boundary discontinuities relative to naive tiling. On the CAMELS/IllustrisTNG CV data, the pipeline reproduces the dimensionless power spectrum to within $10\%$ for $k \le 10\,h\,\mathrm{Mpc}^{-1}$ at a cost of about two minutes of compute per field, and it is designed to scale to arbitrarily large volumes. This makes it suitable for fast HI mock generation for cross-correlation studies and cosmological parameter inference in next-generation intensity mapping surveys.
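The summary does not spell out how latent overlap works inside LODI's sampler, so the sketch below is not the paper's method. Under our own assumptions, it only illustrates why overlapped tiling reduces seams compared with naive tiling: each cubic sub-volume is processed independently, and neighbouring outputs are blended with weights that taper linearly across the shared margin. The name suggests the paper's scheme operates in the diffusion model's latent space; this sketch blends in data space only. The functions `tile_starts`, `taper` and `stitch_overlapping`, the linear taper, and the `generate` callable (a stand-in for any per-sub-volume model) are hypothetical illustrations.

```python
import numpy as np

def tile_starts(n, tile, stride):
    """Start indices along one axis; the last tile is shifted to end exactly at n."""
    starts = list(range(0, n - tile + 1, stride))
    if starts[-1] != n - tile:
        starts.append(n - tile)
    return starts

def taper(tile, overlap):
    """1D weight: linear ramp across each overlap margin, 1 in the tile interior."""
    w = np.ones(tile)
    if overlap > 0:
        ramp = np.arange(1, overlap + 1) / (overlap + 1)
        w[:overlap] = ramp
        w[-overlap:] = ramp[::-1]
    return w

def stitch_overlapping(field, tile, overlap, generate):
    """Run `generate` on overlapping cubic tiles of a 3D field and blend the outputs.

    `generate` is a hypothetical per-sub-volume model: it must map a
    (tile, tile, tile) array to an array of the same shape.
    """
    assert field.ndim == 3 and all(tile <= n for n in field.shape)
    assert 0 <= overlap < tile
    stride = tile - overlap
    w1 = taper(tile, overlap)
    # Separable 3D weight built from the 1D taper.
    w3 = w1[:, None, None] * w1[None, :, None] * w1[None, None, :]
    out = np.zeros(field.shape)
    weight = np.zeros(field.shape)
    sx, sy, sz = (tile_starts(n, tile, stride) for n in field.shape)
    for i in sx:
        for j in sy:
            for k in sz:
                sl = (slice(i, i + tile), slice(j, j + tile), slice(k, k + tile))
                out[sl] += w3 * generate(field[sl])
                weight[sl] += w3
    # Every voxel is covered by at least one tile with strictly positive weight,
    # so normalising by the accumulated weight is always well defined.
    return out / weight
```

For example, `stitch_overlapping(dm_grid, tile=32, overlap=8, generate=model)` would process a hypothetical `dm_grid` with a hypothetical `model`. With `overlap=0` the function reduces to naive tiling; if the sub-volumes are $32^3$ voxels, a 512-times larger volume is $256^3$ voxels, i.e. $(256/32)^3 = 512$ non-overlapping tiles. Normalising by the summed weights keeps the blend exact for any generator that returns its input, and lets the final tile along each axis shift inward when the stride does not divide the grid evenly.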
Abstract
The distribution of 21 cm emission from neutral hydrogen is a powerful cosmological and astrophysical probe, as it traces the underlying dark matter and cold gas distributions throughout cosmic time. However, predicting the observable signal is hindered by the large computational cost of the required hydrodynamical simulations. We introduce a novel machine learning pipeline that, once trained on a hydrodynamical simulation, is able to generate both halo mass density maps and the three-dimensional 21 cm brightness temperature signal, starting from a dark matter-only simulation. We use an attention-based ResUNet (HALO) to predict dark matter halo maps, which are then processed by a trained conditional variational diffusion model (LODI) to produce 21 cm brightness temperature maps. LODI is trained on smaller sub-volumes that are then seamlessly combined into a 512-times larger volume using a new method called 'latent overlap'. We demonstrate that, once trained on $25^3\,(h^{-1}\,\mathrm{Mpc})^3$ volume simulations, we are able to predict the 21 cm power spectrum on an unseen dark matter map (with the same cosmology) to within 10% for wavenumbers $k \le 10\,h\,\mathrm{Mpc}^{-1}$, deep inside the non-linear regime, at a computational cost of the order of two minutes. While demonstrated on this specific volume, our approach is designed to scale to arbitrarily large simulations.
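The accuracy claim is stated in terms of the (dimensionless) 21 cm power spectrum. A common convention, assumed here rather than taken from the paper, is $\Delta^2(k) = k^3 P(k)/(2\pi^2)$. The sketch below estimates it from a periodic cubic grid with NumPy FFTs and then checks the largest fractional error for $k \le k_{\max}$, which is one way to express a "within 10%" criterion. The function names, the linear binning, and the Fourier normalisation are illustrative choices of ours, not the paper's pipeline.

```python
import numpy as np

def dimensionless_power_spectrum(field, box_size, n_bins=20):
    """Spherically averaged Delta^2(k) = k^3 P(k) / (2 pi^2) of a periodic cubic grid.

    box_size is the side length (e.g. in Mpc/h), so k comes out in h/Mpc.
    """
    n = field.shape[0]
    assert field.shape == (n, n, n)
    volume = box_size**3
    # Discrete FFT scaled to approximate the continuous Fourier transform.
    field_k = np.fft.fftn(field) * (volume / n**3)
    power = np.abs(field_k) ** 2 / volume
    kf = 2.0 * np.pi / box_size                 # fundamental wavenumber
    k1 = np.fft.fftfreq(n, d=1.0 / n) * kf      # integer mode numbers times kf
    kmag = np.sqrt(k1[:, None, None] ** 2 + k1[None, :, None] ** 2 + k1[None, None, :] ** 2)
    edges = np.linspace(kf, kf * n / 2, n_bins + 1)  # from kf up to the Nyquist wavenumber
    idx = np.digitize(kmag.ravel(), edges)      # bin i holds edges[i-1] <= k < edges[i]
    k_flat, p_flat = kmag.ravel(), power.ravel()
    k_mean, delta2 = [], []
    for i in range(1, n_bins + 1):
        sel = idx == i
        if sel.any():
            k_bin = k_flat[sel].mean()
            k_mean.append(k_bin)
            delta2.append(k_bin**3 * p_flat[sel].mean() / (2.0 * np.pi**2))
    return np.array(k_mean), np.array(delta2)

def max_fractional_error(k, delta2_pred, delta2_true, k_max):
    """Largest |pred/true - 1| over bins with k <= k_max."""
    sel = k <= k_max
    return float(np.max(np.abs(delta2_pred[sel] / delta2_true[sel] - 1.0)))
```

With two grids of the same size and box length (hypothetically `t21_pred` and `t21_hydro`, in mK), both calls return identical `k` bins, and `max_fractional_error(k, d2_pred, d2_hydro, k_max=10.0) <= 0.10` expresses the 10% target; $\Delta^2$ then has units of mK$^2$. The grid must be fine enough that the Nyquist wavenumber $\pi n / L$ exceeds $k_{\max}$: for $L = 25\,h^{-1}\,\mathrm{Mpc}$ and $k_{\max} = 10\,h\,\mathrm{Mpc}^{-1}$ this requires $n \gtrsim 80$ cells per side.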
