FLORAH-Tree: Emulating Dark Matter Halo Merger Trees with Graph Generative Models

Tri Nguyen, Chirag Modi, Siddharth Mishra-Sharma, L. Y. Aaron Yung, Rachel S. Somerville

TL;DR

FLORAH-Tree addresses the challenge of generating complete dark matter halo merger trees with environmental information, extending the prior FLORAH model to capture full branching histories. It combines an RNN-based history encoder, a multinomial classifier for the number of progenitors, and a neural density estimator with normalizing flows to autoregressively generate progenitor properties conditioned on history and redshift. The method is trained on VSMDPL N-body merger trees and validated against both the simulation and EPS-based trees, showing excellent reproduction of progenitor mass distributions and merger rates, and yielding galaxy-halo scaling relations in close agreement with the reference simulation when run through the Santa Cruz SAM. FLORAH-Tree provides a fast, scalable alternative to full simulations for structure formation studies and enables environmentally informed tree generation with potential extensions to multi-cosmology conditioning and lightcone applications.
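The three-stage generation step described above (encode the history, draw the number of progenitors, then draw progenitor properties one at a time) can be sketched as follows. This is a toy illustration under stated assumptions, not the FLORAH-Tree implementation: the function names, the summary-statistic encoder, the fixed class probabilities, and the half-normal mass draw are all hypothetical stand-ins for the trained RNN, classifier, and normalizing flow. Ordering the progenitors by decreasing mass is also an assumption made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_history(history):
    """Stand-in for the RNN history encoder (hypothetical).

    `history` holds (log10 mass, redshift) rows ordered from the root
    to the descendant; the context is the last row plus the column means.
    """
    h = np.asarray(history, dtype=float)
    return np.concatenate([h[-1], h.mean(axis=0)])

def sample_num_progenitors(context, max_prog=3):
    """Stand-in for the multinomial classifier over N_p = 1..max_prog.

    A trained classifier would map `context` to class probabilities;
    this toy version ignores it and uses fixed, made-up probabilities.
    """
    logits = -np.arange(max_prog, dtype=float)
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(np.arange(1, max_prog + 1), p=probs))

def sample_progenitor(context, prev_token, z_prog):
    """Stand-in for the flow-based density estimator (hypothetical).

    Draws one progenitor log-mass bounded above by the descendant mass
    (first draw, signalled by the zero token) or by the previous progenitor.
    """
    log_m_desc = context[0]
    upper = log_m_desc if not prev_token.any() else prev_token[0]
    log_m = upper - abs(rng.normal(0.0, 0.3))
    return np.array([log_m, z_prog])

def generate_progenitors(history, z_prog):
    """Autoregressively generate the progenitors of the last halo in `history`."""
    context = encode_history(history)
    n_prog = sample_num_progenitors(context)
    prev = np.zeros(2)  # zero token that starts the sequence
    progenitors = []
    for _ in range(n_prog):
        prev = sample_progenitor(context, prev, z_prog)
        progenitors.append(prev)
    return np.stack(progenitors)

# Example: a three-step history ending in a descendant with log10 M = 11.8 at z = 0.2.
history = [(12.0, 0.0), (11.9, 0.1), (11.8, 0.2)]
progenitors = generate_progenitors(history, z_prog=0.3)
```

Applying `generate_progenitors` recursively to each returned progenitor, with that progenitor's own history, would grow a full tree one redshift step at a time.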

Abstract

Merger trees track the hierarchical assembly of dark matter halos across cosmic time and serve as essential inputs for semi-analytic models of galaxy formation. However, conventional methods for constructing merger trees rely on ad-hoc assumptions and are unable to incorporate environmental information. Nguyen et al. (2024) introduced FLORAH, a generative model based on recurrent neural networks and normalizing flows, for modeling main progenitor branches of merger trees. In this work, we extend this model, now referred to as FLORAH-Tree, to generate complete merger trees by representing them as graph structures that capture the full branching hierarchy. We trained FLORAH-Tree on merger trees extracted from the Very Small MultiDark Planck cosmological N-body simulation. To validate our approach, we compared the generated merger trees with both the original simulation data and with semi-analytic trees produced using the Extended Press-Schechter (EPS) formalism. We show that FLORAH-Tree accurately reproduces key merger rate statistics across a wide range of mass and redshift, outperforming the conventional EPS-based approach. We demonstrate its utility by applying the Santa Cruz semi-analytic model (SAM) to generated trees and showing that the resulting galaxy-halo scaling relations, such as the stellar-to-halo-mass relation and supermassive black hole mass-halo mass relation, closely match those from applying the SAM to trees extracted directly from the simulation. FLORAH-Tree provides a computationally efficient method for generating merger trees that maintain the statistical fidelity of N-body simulations.

Paper Structure

This paper contains 21 sections, 12 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: The flowchart of FLORAH-Tree. Panel A: An example merger tree with a descendant halo (blue, thick border) at $z^{(4)}$ and two progenitor halos (yellow). Redshift increases from left to right along the branch. To predict the progenitors of the descendant, FLORAH-Tree considers all halos in its history (blue, thin border) while ignoring halos in other branches (black). Panel B: The forward model, which consists of a history encoder $\mathbf{E}_{\phi_h}$, a classifier $\mathbf{C}_{\phi_c}$, and an NDE $\hat{q}_{\phi_f}$. FLORAH-Tree takes the descendant, its history, and the progenitor redshift as input and outputs the number of progenitors $N_\text{p}$ with properties $\mathcal{X}_{\text{p}}$. All components are jointly optimized during training. Panel C: NDE training (top) and inference (bottom). During training, the progenitor encoder $\mathbf{E}_{\phi_p}$ encodes the true progenitor properties as conditioning input for $\hat{q}_{\phi_f}$, with $\mathcal{L}_\mathrm{NDE}$ summed over the sequence. During inference, progenitors are generated autoregressively, starting from a zero token $\vec{\mathbf{0}}$.
  • Figure 2: Example generated merger trees. From left to right, each column shows merger trees with root masses $(10^{13}, 10^{12}, 10^{11}, 10^{10}) \, \mathrm{M}_\odot$. Nodes in each tree represent DM halos. The node sizes and colors indicate the halo mass (relative within each column). The node vertical positions indicate redshift, increasing from top to bottom, while horizontal positions are arbitrary.
  • Figure 3: Progenitor-descendant mass ratios. Each panel shows a different root mass $M_{\text{r}}$ bin (row) and descendant redshift $z_\text{d}$ bin (column) for ratios $\mu_i \equiv M_{\text{p}, i}/M_{\text{d}}$ where $i=1, 2, 3$. The top row shows the distributions, while the bottom shows the residuals, defined as the fractional differences between the VSMDPL and FLORAH-Tree distributions relative to the VSMDPL values. Solid lines and shaded histograms represent FLORAH-Tree and VSMDPL merger trees, respectively. Colors correspond to $i=1$ (blue), $i=2$ (orange), and $i=3$ (green). The close agreement between the solid lines and the shaded histograms demonstrates that FLORAH-Tree reproduces these key progenitor-descendant statistics very well.
  • Figure 4: Progenitor-progenitor mass ratios. The distributions of progenitor mass ratios $\mu_{i,j} \equiv M_{\text{p}, i}/M_{\text{p}, j}$ for $i > j$. Panel layout matches Figure 3. Colors correspond to different $(i, j)$ combinations: blue $(2, 1)$, orange $(3, 1)$, and green $(3, 2)$.
  • Figure 5: Comparison between FLORAH-Tree and EPS-tree merger rates. The merger rate $B(M_{\text{d}}, \xi_i, z_\text{p}, z_\text{d})$ is plotted as a function of the descendant mass, progenitor mass ratio, and redshifts. Each row shows a different progenitor-descendant redshift slice, with each color showing a different descendant mass bin. In each row, the top panels show the merger rate, and the bottom panels show the residuals, $\log_{10} (B_\mathrm{sim} /B_\mathrm{gen})$. Error bars represent the Poisson uncertainties. Left panels compare the FLORAH-Tree and VSMDPL merger rates, while right panels compare the EPS-tree and VSMDPL merger rates. FLORAH-Tree merger trees yield excellent agreement with the merger rates measured from the N-body simulations, while the EPS-based merger trees can overestimate the merger rate by as much as $\sim 0.5$ dex at high redshift.
  • ...and 5 more figures
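
The quantities defined in the captions of Figures 3 to 5 can be computed along the following lines. This is a minimal sketch, not the paper's analysis code: the function names are hypothetical, progenitors are assumed to be ranked by decreasing mass so that $i=1$ is the most massive, and the residual assumes that the simulated and generated merger rates share the same bins and normalization, so that $B_\mathrm{sim}/B_\mathrm{gen}$ reduces to a ratio of raw counts with Poisson errors propagated to first order.

```python
import numpy as np

def progenitor_descendant_ratios(m_desc, m_progs, n_keep=3):
    """mu_i = M_p,i / M_d for the n_keep most massive progenitors.

    Assumes i = 1 labels the most massive progenitor.
    """
    m_sorted = np.sort(np.asarray(m_progs, dtype=float))[::-1][:n_keep]
    return m_sorted / m_desc

def progenitor_progenitor_ratios(m_progs, n_keep=3):
    """mu_{i,j} = M_p,i / M_p,j for i > j, keyed by 1-based (i, j)."""
    m_sorted = np.sort(np.asarray(m_progs, dtype=float))[::-1][:n_keep]
    return {
        (i + 1, j + 1): m_sorted[i] / m_sorted[j]
        for i in range(len(m_sorted))
        for j in range(i)
    }

def log_residual(counts_sim, counts_gen):
    """log10(B_sim / B_gen) per bin with first-order Poisson errors.

    Assumes both rates use identical bins and normalization, so the
    ratio of rates equals the ratio of counts. Empty bins give NaN.
    """
    n_sim = np.asarray(counts_sim, dtype=float)
    n_gen = np.asarray(counts_gen, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        resid = np.log10(n_sim / n_gen)
        err = np.sqrt(1.0 / n_sim + 1.0 / n_gen) / np.log(10.0)
    empty = (n_sim == 0) | (n_gen == 0)
    resid[empty] = np.nan
    err[empty] = np.nan
    return resid, err

# Example: one descendant of 1e12 Msun with three progenitors (made-up masses).
mu = progenitor_descendant_ratios(1e12, [2e11, 7e11, 5e10])
mu_pairs = progenitor_progenitor_ratios([2e11, 7e11, 5e10])
resid, err = log_residual([100, 0], [10, 5])
```

With this sign convention, a negative residual means the generated trees produce more mergers in that bin than the simulation.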