How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds

Tri Nguyen, Francisco Villaescusa-Navarro, Siddharth Mishra-Sharma, Carolina Cuesta-Lazaro, Paul Torrey, Arya Farahi, Alex M. Garcia, Jonah C. Rose, Stephanie O'Neil, Mark Vogelsberger, Xuejian Shen, Cian Roche, Daniel Anglés-Alcázar, Nitya Kallivayalil, Julian B. Muñoz, Francis-Yan Cyr-Racine, Sandip Roy, Lina Necib, Kassidy E. Kollmann

TL;DR

This work introduces NeHOD, a hybrid emulator that achieves hydrodynamic-like accuracy for painting galaxies and subhalos onto dark matter (DM) halos at a fraction of the computational cost of full hydrodynamic simulations. The framework combines conditional normalizing flows for halos and central galaxies with a Transformer-based variational diffusion model that generates satellite galaxies as a 3D point cloud, thereby preserving small-scale structure. Trained on the DREAMS TNG-WDM Milky Way (MW) zoom-in suite, NeHOD jointly captures complex dependencies on DM properties and baryonic feedback, reproducing halo and central statistics as well as satellite statistics such as the satellite stellar mass function (SSMF), the stellar-halo mass relation (SHMR), and the concentration–mass relation across a wide parameter space. While NeHOD excels in field-level modeling and parameter exploration, it modestly underpredicts small-scale clustering and velocity-space details, pointing to future enhancements in environmental conditioning, symmetry incorporation, and larger training sets. Overall, NeHOD offers a scalable, differentiable, and flexible tool for generating realistic mock catalogs for galaxy clustering, lensing, and beyond, with open-source code and broad applicability to DM and baryonic physics studies.

Abstract

The connection between galaxies and their host dark matter (DM) halos is critical to our understanding of cosmology, galaxy formation, and DM physics. To maximize the return of upcoming cosmological surveys, we need an accurate way to model this complex relationship. Many techniques have been developed to model this connection, from Halo Occupation Distribution (HOD) models to empirical and semi-analytic models to hydrodynamic simulations. Hydrodynamic simulations can incorporate more detailed astrophysical processes but are computationally expensive; HODs, on the other hand, are computationally cheap but have limited accuracy. In this work, we present NeHOD, a generative framework based on a variational diffusion model and a Transformer, for painting galaxies/subhalos on top of DM with the accuracy of hydrodynamic simulations but at a computational cost similar to that of an HOD. By modeling galaxies/subhalos as point clouds, instead of binning or voxelization, we can resolve small spatial scales down to the resolution of the simulations. For each halo, NeHOD predicts the positions, velocities, masses, and concentrations of its central and satellite galaxies. We train NeHOD on the TNG-Warm DM suite of the DREAMS project, which consists of 1024 high-resolution zoom-in hydrodynamic simulations of Milky Way-mass halos with varying warm DM mass and astrophysical parameters. We show that our model captures the complex relationships between subhalo properties as a function of the simulation parameters, including the mass functions, stellar-halo mass relations, concentration-mass relations, and spatial clustering. Our method can be used for a large variety of downstream applications, from galaxy clustering to strong lensing studies.
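The abstract's point about point clouds versus voxelization can be illustrated with a small toy sketch. It is not taken from the NeHOD code: the grid resolution `N_GRID`, the helper `voxel_index`, and the two satellite positions are hypothetical, and the box size is borrowed from the Figure 2 caption only to set a scale. The sketch shows that two satellites closer together than one voxel become indistinguishable after binning, while their coordinates in a point cloud keep the separation.

```python
import numpy as np

BOX = 600.0   # box side in kpc/h (scale borrowed from the Figure 2 caption)
N_GRID = 64   # hypothetical voxel grid resolution, chosen for illustration


def voxel_index(pos, box=BOX, n_grid=N_GRID):
    """Map positions in [-box/2, box/2) to integer voxel indices per axis."""
    cell = box / n_grid
    idx = np.floor((pos + box / 2.0) / cell).astype(int)
    return np.clip(idx, 0, n_grid - 1)


# Two toy satellites separated by 2 kpc/h along x (made-up positions).
sats = np.array([[10.0, 20.0, 30.0],
                 [12.0, 20.0, 30.0]])

# Point-cloud view: the separation is kept exactly.
separation = np.linalg.norm(sats[0] - sats[1])

# Voxel view: both satellites fall into the same cell (cell size ~9.4 kpc/h),
# so their separation cannot be recovered from the grid.
vox = voxel_index(sats)
same_voxel = bool(np.array_equal(vox[0], vox[1]))

print(f"separation = {separation:.1f} kpc/h, same voxel: {same_voxel}")
```

Increasing `N_GRID` shrinks the cell size, but memory grows with the cube of the resolution, which is one reason a point-cloud representation can be attractive at small scales.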

Paper Structure (37 sections, 21 equations, 15 figures, 2 tables)

Figures (15)

  • Figure 1: Flow chart of the NeHOD framework. Black arrows indicate the flow of information into and out of the models. During inference, simulation parameters (i.e., WDM mass and astrophysical parameters) are first input as conditioning (context) features into the normalizing flows, which then generate a halo and a central galaxy. The properties of the halo and central galaxy, along with the simulation parameters, are subsequently passed as conditioning features into the VDM to generate satellite galaxies. During training, the normalizing flows and the VDM are optimized independently (see Appendix \ref{app:flows} and \ref{app:vdm} for details on the optimization objectives).
  • Figure 2: Example satellite galaxies generated by NeHOD. The bottom row displays the satellite galaxies of three halos from the DREAMS simulations, with each column corresponding to a different value of $A_\mathrm{SN1}$. The top row shows corresponding realizations of satellite galaxies for the same halos, as generated by NeHOD. The marker size scales logarithmically with the stellar mass of each satellite. For visual clarity, central galaxies, located at the center of the shaded sphere, are omitted. In each panel, the box size and the diameter of the shaded sphere are set to $600 \, \mathrm{kpc} \, h^{-1}$.
  • Figure 3: The properties of halos and central galaxies as a function of the simulation parameters $\{m_\mathrm{WDM}, A_\mathrm{SN1}, A_\mathrm{SN2}, A_\mathrm{AGN}\}$ (left to right). The top and bottom panels show distributions of the number of satellites $N_\mathrm{sat}$ and the stellar mass of the central galaxies $M_\mathrm{cent, \star}$, respectively. The median (solid lines) and the 16th and 84th percentiles (dashed lines) of each distribution are shown. The black lines denote the samples generated by conditional flows in this work, while the blue lines and shaded regions denote the simulations.
  • Figure 4: Top: distributions of the stellar masses of the halos $M_\mathrm{\star}$ and central galaxies $M_\mathrm{cent, \star}$. Bottom: distributions of the central galaxy stellar mass $M_\mathrm{cent, \star}$ and the concentration proxy $\widetilde{V}_\mathrm{max}$. In both cases, the 68% and 95% contours are shown for 3 bins of $A_\mathrm{SN1}$ (left) and $A_\mathrm{SN2}$ (right), with each bin denoted by a different color. The solid and dashed lines denote the NeHOD samples and the simulations, respectively.
  • Figure 5: Top: The satellite stellar mass functions (SSMFs) generated by NeHOD and extracted from the simulations. The columns show the variations of the SSMFs over the WDM mass $m_\mathrm{WDM}$ and astrophysical parameters $\{A_\mathrm{SN1}, A_\mathrm{SN2}, A_\mathrm{AGN}\}$. In each column, the color denotes the SSMFs of a bin of the corresponding parameter. The average SSMFs, along with their standard deviations, are shown as solid lines and shaded regions for NeHOD and error bars for the simulations. Middle: Fractional residuals of the average SSMFs, calculated as the difference between the NeHOD SSMFs and the simulation SSMFs divided by the simulation SSMFs. The error bars represent the propagated errors of the fractional residuals, derived from the bootstrapped errors of the average SSMFs. Bottom: Fractional residuals of the standard deviation SSMFs, along with their errors, calculated using the same procedure as for the average SSMFs.
  • ...and 10 more figures
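
The fractional residuals described in the Figure 5 caption, (NeHOD − simulation) / simulation with errors propagated from bootstrapped errors on the averages, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' analysis code: the function names, the number of bootstrap resamples, and the treatment of the two error estimates as independent (first-order propagation) are all choices made here for the example.

```python
import numpy as np


def bootstrap_mean_error(samples, n_boot=500, seed=0):
    """Mean per bin and bootstrap standard error of that mean.

    samples: array of shape (n_halos, n_bins), one mass function per halo.
    Halos are resampled with replacement (assumed resampling unit).
    """
    rng = np.random.default_rng(seed)
    n = samples.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = samples[idx].mean(axis=1)          # (n_boot, n_bins)
    return samples.mean(axis=0), boot_means.std(axis=0, ddof=1)


def fractional_residual(model, model_err, sim, sim_err):
    """Residual (model - sim) / sim with first-order error propagation.

    Assumes the model and simulation errors are independent:
    sigma_r^2 = (model_err / sim)^2 + (model * sim_err / sim^2)^2
    """
    resid = (model - sim) / sim
    err = np.sqrt(model_err**2 + (model * sim_err / sim) ** 2) / np.abs(sim)
    return resid, err


# Toy per-halo counts in 5 mass bins (synthetic data, not from the paper).
rng = np.random.default_rng(1)
sim_counts = rng.poisson(lam=[20, 12, 7, 4, 2], size=(100, 5)).astype(float)
model_counts = rng.poisson(lam=[20, 12, 7, 4, 2], size=(100, 5)).astype(float)

sim_mean, sim_err = bootstrap_mean_error(sim_counts)
model_mean, model_err = bootstrap_mean_error(model_counts)
resid, resid_err = fractional_residual(model_mean, model_err, sim_mean, sim_err)
```

The same two functions apply to the bottom-panel quantity if the per-bin standard deviation replaces the mean inside the bootstrap loop.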