Galactification: painting galaxies onto dark matter only simulations using a transformer-based model

Shivam Pandey, Christopher C. Lovell, Chirag Modi, Benjamin D. Wandelt

TL;DR

To interpret cosmological surveys, the paper tackles the computational bottleneck of generating realistic galaxy catalogs by learning a conditional distribution of galaxy properties from fast $N$-body simulations conditioned on cosmological and astrophysical parameters. The authors develop a multi-modal transformer-based model that ingests 3D dark matter density and velocity fields across multiple redshifts and autoregressively outputs a galaxy point cloud with six properties: $(x,y,z,v_x,\,\log M_{\star}, M_g)$, using a CBAM+ViT encoder and a cross-attention decoder with Rotary Position Embeddings. They train on $1000$ CAMELS Illustris-TNG pairs spanning variations in $(\Omega_m, \sigma_8)$ and four sub-grid parameters, achieving accurate one-point statistics and redshift-space power spectra, validated with PQMass and visual checks. The approach yields a ~100x speedup over hydrodynamical simulations, enabling robust Bayesian inference over large cosmological and astrophysical parameter spaces.

Abstract

Connecting the formation and evolution of galaxies to the large-scale structure is crucial for interpreting cosmological observations. While hydrodynamical simulations accurately model the correlated properties of galaxies, they are computationally prohibitive to run over volumes that match modern surveys. We address this by developing a framework to rapidly generate mock galaxy catalogs conditioned on inexpensive dark-matter-only simulations. We present a multi-modal, transformer-based model that takes 3D dark matter density and velocity fields as input, and outputs a corresponding point cloud of galaxies with their physical properties. We demonstrate that our trained model faithfully reproduces a variety of galaxy summary statistics and correctly captures their variation with changes in the underlying cosmological and astrophysical parameters, making it the first accelerated forward model to capture all the relevant galaxy properties, their full spatial distribution, and their conditional dependencies in hydrosimulations.
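The model emits each galaxy as a short sequence of discrete tokens (three for position and three for physical properties, per the Figure 2 caption). A minimal sketch of what such a tokenization could look like is given below. The uniform binning, the vocabulary size `N_BINS`, the dictionary keys, and every numerical range in `RANGES` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

N_BINS = 256  # hypothetical number of bins (vocabulary size) per property

# Hypothetical value ranges for each of the six properties (assumed, not from the paper).
RANGES = {
    "x": (0.0, 25.0),         # position, assumed box units
    "y": (0.0, 25.0),
    "z": (0.0, 25.0),
    "v_los": (-1000.0, 1000.0),  # line-of-sight velocity, assumed km/s
    "log_mstar": (8.0, 12.0),    # log10 stellar mass
    "mag_g": (-24.0, -12.0),     # g-band magnitude
}


def tokenize(values, lo, hi, n_bins=N_BINS):
    """Map continuous values in [lo, hi] to integer tokens in [0, n_bins - 1]."""
    frac = (np.asarray(values, dtype=float) - lo) / (hi - lo)
    tokens = np.floor(frac * n_bins).astype(int)
    # Values outside the range are clipped into the first or last bin.
    return np.clip(tokens, 0, n_bins - 1)


def detokenize(tokens, lo, hi, n_bins=N_BINS):
    """Return the bin-centre value for each token."""
    return lo + (np.asarray(tokens) + 0.5) * (hi - lo) / n_bins


def galaxy_to_tokens(galaxy):
    """Convert one galaxy (a dict keyed like RANGES) to a length-6 token sequence."""
    return np.array([tokenize(galaxy[key], *RANGES[key]) for key in RANGES])


example = {"x": 3.2, "y": 17.9, "z": 24.1,
           "v_los": -150.0, "log_mstar": 9.7, "mag_g": -19.3}
print(galaxy_to_tokens(example))
```

With uniform bins the round-trip error of `detokenize(tokenize(v))` is at most half a bin width, so the choice of `N_BINS` directly sets the resolution at which an autoregressive decoder could reproduce each property.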

Paper Structure

This paper contains 6 sections, 4 figures.

Figures (4)

  • Figure 1: Model architecture. Left: Input dark matter density field. Right: Target galaxy distribution. An encoder (CBAM + Vision Transformer) extracts features that condition a cross-attention decoder to generate a tokenized sequence of galaxy properties. See Sec. \ref{sec:data_methods} and Pandey:2024:arXiv:gotham for details.
  • Figure 2: Comparison of multi-dimensional data distribution. We represent each galaxy as a six-dimensional vector (3 position tokens + 3 property tokens) in all the test simulations and compare the distributions of the mock and truth data using the PQMass methodology outlined in Lemos:2024:arXiv:. We find that the histogram of differences between the two catalogs agrees with the red line, which corresponds to the expected $\chi^2$ curve if the mock and truth come from the same underlying distribution.
  • Figure 3: Comparison of one- and two-point summary statistics. The top row compares one-point distributions (histograms) and the bottom row compares two-point statistics. In all panels, the 16th-84th percentile regions from mock catalogs sampled from our model (filled regions) are compared against the hydrodynamical simulations (truth; solid lines, squares), colored by their corresponding value of the cosmological parameter $\Omega_{\rm m}$. Top panels: Distributions of stellar mass, g-band magnitude and line-of-sight velocity (left to right) of galaxies. The color coding shows that the model captures the dependence of these distributions on $\Omega_{\rm m}$. Bottom panels: Redshift-space power spectra, either unweighted (left), or weighted by g-band magnitude (middle) and stellar mass (right).
  • Figure 4: Visual comparison of true and mock galaxy distributions. The figure displays results for three different test simulations, each with a unique set of cosmological and astrophysical parameters (one per row). The left column shows the dark matter density field, which is one of the input fields fed to the model. The middle column shows the true galaxy distribution from the hydrodynamical simulation, while the right column shows the corresponding distribution generated by our model. In the middle and right columns, galaxies are colored by their stellar mass.
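The distribution-level comparison in Figure 2 can be illustrated with a simplified two-sample test in the spirit of PQMass: partition the six-dimensional space into Voronoi regions around randomly chosen reference points, count how many mock and truth galaxies fall in each region, and form a Pearson $\chi^2$ statistic. The sketch below is an assumption-laden illustration, not the PQMass package or the authors' pipeline; the function name `voronoi_chi2`, the number of regions, and the choice to draw reference points from the pooled sample are all hypothetical choices made here.

```python
import numpy as np


def voronoi_chi2(x, y, n_regions=20, seed=None):
    """Pearson chi^2 between two samples binned into shared Voronoi regions.

    x, y: arrays of shape (n_samples, n_dims).
    Returns (chi2, dof); if both samples share one underlying distribution,
    chi2 is expected to follow a chi^2 law with dof = n_regions - 1.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y], axis=0)
    # Assumption: reference points are drawn from the pooled sample.
    refs = pooled[rng.choice(len(pooled), size=n_regions, replace=False)]

    def region_counts(sample):
        # Assign each point to its nearest reference point (its Voronoi cell).
        d2 = ((sample[:, None, :] - refs[None, :, :]) ** 2).sum(axis=-1)
        return np.bincount(d2.argmin(axis=1), minlength=n_regions)

    cx, cy = region_counts(x), region_counts(y)
    # Pooled estimate of the probability mass in each region.
    p = (cx + cy) / (len(x) + len(y))
    ex, ey = len(x) * p, len(y) * p
    chi2 = ((cx - ex) ** 2 / ex).sum() + ((cy - ey) ** 2 / ey).sum()
    return chi2, n_regions - 1


rng = np.random.default_rng(0)
truth = rng.normal(size=(2000, 6))
mock_good = rng.normal(size=(2000, 6))           # same distribution as truth
mock_bad = rng.normal(loc=1.0, size=(2000, 6))   # shifted distribution

print(voronoi_chi2(truth, mock_good, seed=1))
print(voronoi_chi2(truth, mock_bad, seed=1))
```

Repeating the test over many random sets of reference points gives a histogram of $\chi^2$ values that can be compared with the theoretical $\chi^2$ curve, which is the kind of comparison the Figure 2 caption describes.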