Galactification: painting galaxies onto dark matter only simulations using a transformer-based model
Shivam Pandey, Christopher C. Lovell, Chirag Modi, Benjamin D. Wandelt
TL;DR
Interpreting cosmological surveys requires realistic galaxy catalogs, which are expensive to generate with hydrodynamical simulations. The paper addresses this bottleneck by learning the conditional distribution of galaxy properties given fast $N$-body simulations and the cosmological and astrophysical parameters. The authors develop a multi-modal transformer-based model that ingests 3D dark matter density and velocity fields across multiple redshifts and autoregressively outputs a galaxy point cloud with six properties, $(x, y, z, v_x, \log M_{\star}, M_g)$, using a CBAM+ViT encoder and a cross-attention decoder with Rotary Position Embeddings. They train on $1000$ CAMELS Illustris-TNG simulation pairs spanning variations in $(\Omega_m, \sigma_8)$ and four sub-grid parameters. The model reproduces one-point statistics and redshift-space power spectra accurately, as validated with PQMass and visual checks. The approach yields a ~100x speedup over hydrodynamical simulations, enabling robust Bayesian inference over large cosmological and astrophysical parameter spaces.
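As a rough illustration of the Rotary Position Embeddings mentioned above, the sketch below shows the standard one-dimensional form of the technique: each consecutive pair of vector components is rotated by an angle proportional to the token position, so the attention score between a query and a key depends only on their relative offset. This is a generic sketch, not the authors' implementation. The function names `rope_rotate` and `dot`, the `base` value of 10000, and the toy vectors are all assumptions made for illustration, and how the paper applies the embedding to 3D galaxy positions is not specified in this summary.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `position`.

    Hypothetical helper: the standard 1D rotary embedding, where pair j
    uses the frequency base ** (-2j / d) for a vector of even length d.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # i = 2j, so this is base^(-2j/d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (illustrative values only).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.2]

# Both position pairs have the same offset of 3, shifted by 100.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
```

Here `score_a` and `score_b` agree up to floating-point error, because rotating both vectors by a common extra angle leaves their inner product unchanged. This relative-position property is what makes rotary embeddings a natural fit for attention over ordered sequences.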
Abstract
Connecting the formation and evolution of galaxies to the large-scale structure is crucial for interpreting cosmological observations. While hydrodynamical simulations accurately model the correlated properties of galaxies, they are computationally prohibitive to run over volumes that match modern surveys. We address this by developing a framework to rapidly generate mock galaxy catalogs conditioned on inexpensive dark-matter-only simulations. We present a multi-modal, transformer-based model that takes 3D dark matter density and velocity fields as input, and outputs a corresponding point cloud of galaxies with their physical properties. We demonstrate that our trained model faithfully reproduces a variety of galaxy summary statistics and correctly captures their variation with changes in the underlying cosmological and astrophysical parameters, making it the first accelerated forward model to capture all the relevant galaxy properties, their full spatial distribution, and their conditional dependencies in hydrodynamical simulations.
