
Graph Path Likelihood for Galaxy Formation on Layered Halo Graphs

Daneng Yang

Abstract

Likelihood-based forward modeling is standard in galaxy formation, but most implementations are formulated as forward maps rather than explicit trajectory-level likelihoods conditioned jointly on assembly history and environment. We introduce a Graph Path Likelihood Model (GPLM) on layered halo graphs, where temporal edges encode causal transport and coeval host edges encode environmental conditioning. On a fixed layered graph, the graph-conditioned path measure is written as $P(\mathbf{x}\mid G)\propto p_{\rm attach}(\mathbf{x}\mid G)\exp[-S(\mathbf{x}; G)]$, where $S$ is an effective action for dynamical increments and $p_{\rm attach}$ is a boundary measure for node entry. We also discuss a minimal preferential attachment-detachment prescription for the graph probability $P(G)$, which facilitates placing the likelihood within a cosmological ensemble of layered graphs. Trained on layered graphs reconstructed from TNG50-1, GPLM improves stellar- and gas-mass predictions over transport-only baselines. As fixed-graph applications, we evaluate dark-matter-deficient-galaxy operator averages, compute gas-channel response under controlled deformations, and compare full and host-ablated path measures through likelihood-ratio diagnostics. In these examples, higher-order satellites show a higher incidence of dark-matter deficiency and broader graph-to-graph variation, while the gas-rich response indicates more diverse environmental processing histories. GPLM thus provides a proof-of-principle likelihood framework in which trajectory likelihood ratios, operator averages, and response diagnostics become explicit statistical observables, with connections to astrophysical forward models, machine-learning emulators, and field-theoretic diagnostics.
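To make the path measure concrete, here is a minimal sketch of how $\log P(\mathbf{x}\mid G) = \log p_{\rm attach}(\mathbf{x}\mid G) - S(\mathbf{x}; G)$ could be evaluated for one trajectory, and how a likelihood-ratio diagnostic compares two kernels. It rests on three assumptions that the abstract does not state. The state is a scalar log-mass. Each temporal increment is Gaussian with a predicted drift `mu` and standard deviation `sigma`, so $S$ is a sum of quadratic terms plus normalizers. `log_p_attach` is supplied as a fixed number. The names `Step`, `gaussian_action` and `log_path_likelihood`, along with the toy numbers, are hypothetical illustrations and are not the paper's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    """One temporal edge (layer t -> t+1) along a single node's trajectory.

    Hypothetical container: x_prev and x_next are log10 masses at the two
    layers; mu and sigma are the predicted drift and standard deviation
    (e.g. transport plus a learned correction) for this increment.
    """
    x_prev: float
    x_next: float
    mu: float
    sigma: float


def gaussian_action(steps):
    """Effective action S(x; G) under an assumed Gaussian increment model.

    With the normalization terms included, exp(-S) equals the product of
    normal densities N(x_next - x_prev; mu, sigma**2) over the temporal edges.
    """
    s = 0.0
    for st in steps:
        r = (st.x_next - st.x_prev - st.mu) / st.sigma
        s += 0.5 * r * r + math.log(st.sigma) + 0.5 * math.log(2.0 * math.pi)
    return s


def log_path_likelihood(steps, log_p_attach):
    """log P(x | G) = log p_attach(x | G) - S(x; G) for the Gaussian sketch."""
    return log_p_attach - gaussian_action(steps)


# Toy likelihood-ratio diagnostic: one trajectory scored under two
# hypothetical kernels, a "full" one and a "host-ablated" one.
x = [9.0, 9.4, 9.7]
full = [Step(x[0], x[1], 0.4, 0.1), Step(x[1], x[2], 0.3, 0.1)]
ablated = [Step(x[0], x[1], 0.2, 0.2), Step(x[1], x[2], 0.2, 0.2)]
log_ratio = log_path_likelihood(full, 0.0) - log_path_likelihood(ablated, 0.0)
print(f"log likelihood ratio (full vs ablated): {log_ratio:.3f}")
```

In the toy example, both kernels share the same boundary term, so the attachment term cancels from the ratio and only the actions matter. A positive `log_ratio` means the trajectory is more probable under the full kernel.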

Paper Structure

This paper contains 15 sections, 57 equations, 14 figures, and 1 table.

Figures (14)

  • Figure 1: Layered halo graphs constructed for the three most massive $z{=}0$ hosts (left to right: rank 1, rank 2, rank 3) in the TNG50-1 simulation. Each panel shows the layered topology obtained by the backward construction. Nodes are halos with mass above $M_{\rm cut}=10^9~\rm M_{\odot}/h$ and are colored by layer. Temporal and host edges, both drawn as dashed gray lines, connect all nodes across the layers into a single object. The 13 layers shown correspond to redshifts $z = 5.0, 4.0, 3.0, 2.0, 1.5, 1.0, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1,$ and $0$, stacked from bottom to top. At layers with $z>0$, the rings of nodes surrounding the central graphs are isolated low-mass halos that accrete onto their hosts by the next layer. At $z{=}0$, only one halo graph remains, corresponding to the host halo and its subhalos. These graphs, together with the attributes carried by the nodes and edges, are the inputs to GPLM.
  • Figure 2: Layered halo graph panels at $z{=}0$ for the three most massive hosts (ranks 1–3). Each panel shows the coeval host–subhalo network in a spring layout. Node colors encode the graph distance $d_G$ to the central host (see legend): $d_G{=}0$ marks the central, $d_G{=}1$ the immediate satellites, and $d_G\ge 2$ the outer (higher-order) satellites. These bins define the environment categories used throughout the paper.
  • Figure 3: Overview of the GPLM architecture. Successive layer pairs extracted from a layered halo graph feed one shared message-passing backbone during both training and inference. A residual GNN predicts drift corrections and covariances conditioned on the transported state and graph context, and the same learned kernel is then iterated from one layer pair to the next along the full history. Colors indicate temporal edges (blue) and host edges (orange). In each message-passing block we apply $\text{sum}\rightarrow\text{LayerNorm}\rightarrow\text{SiLU}\rightarrow\text{skip-add}$. Here skip-add means a residual update that adds the transformed message $u$ back to the incoming hidden state, $h\leftarrow h+u$.
  • Figure 4: Truth-versus-prediction parity plots, stacked across all snapshots, for stellar mass (left figure) and gas mass (right figure). In each figure, the left panel shows the transport-only results and the right panel shows the GPLM results. Inset legends report the sample size and the RMSE in $\log_{10}(M/\mathrm{M}_{\odot})$. GPLM improves performance for both modeled fields.
  • Figure 5: $M_{\rm gas}$ versus $M_{\star}$ scatter plots at $z{=}0$, colored by host-edge graph distance $d_G$. From left to right, the panels show the TNG50-1 simulation, the transport-only model, and GPLM. The $d_G$ classes have different distributions in the plane, and GPLM agrees more closely with the simulation than the transport-only model does.
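The environment bins in Figure 2 are defined by the graph distance $d_G$ from the central host over coeval host edges. One plausible way to compute and bin $d_G$ is a breadth-first search. This sketch assumes that host edges are undirected and unweighted. The node names and the helpers `graph_distance` and `environment_class` are hypothetical.

```python
from collections import deque


def graph_distance(host_edges, central):
    """Hop distance d_G from the central host over coeval host edges (BFS).

    Edges are treated as undirected and unweighted (an assumption).
    """
    adj = {}
    for u, v in host_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {central: 0}
    queue = deque([central])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:  # first visit gives the shortest hop count
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def environment_class(d):
    """Bin d_G into the three environment categories of Figure 2."""
    if d == 0:
        return "central"
    if d == 1:
        return "satellite"
    return "higher-order satellite"


# Hypothetical host-subhalo network: s11 orbits s1, s111 orbits s11.
edges = [("host", "s1"), ("host", "s2"), ("s1", "s11"), ("s11", "s111")]
d = graph_distance(edges, "host")
classes = {node: environment_class(k) for node, k in d.items()}
```

Nodes that cannot be reached from the central host through host edges are left out of `d`. This matches the caption, which defines $d_G$ only within a single host–subhalo network.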
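The Figure 3 caption describes each message-passing block as sum → LayerNorm → SiLU → skip-add, with $h\leftarrow h+u$. Below is a dependency-free sketch of one such block. Several details are assumptions and do not come from the paper. Each edge type ("temporal" or "host") applies its own linear map to the source node's hidden state. Messages are summed at the destination. LayerNorm has no learnable scale or shift. The drift and covariance heads are omitted. All function names are hypothetical.

```python
import math


def layer_norm(v, eps=1e-5):
    """LayerNorm without learnable scale/shift (a simplifying assumption)."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def silu(a):
    """SiLU activation: a * sigmoid(a)."""
    return a / (1.0 + math.exp(-a))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def message_passing_block(h, edges, weights):
    """One block: sum -> LayerNorm -> SiLU -> skip-add (h <- h + u).

    h:       dict node -> hidden vector
    edges:   list of (src, dst, edge_type), edge_type in {"temporal", "host"}
    weights: dict edge_type -> square matrix used as a linear message function
    """
    dim = len(next(iter(h.values())))
    agg = {n: [0.0] * dim for n in h}
    for src, dst, etype in edges:
        msg = matvec(weights[etype], h[src])
        agg[dst] = [a + m for a, m in zip(agg[dst], msg)]  # sum aggregation
    out = {}
    for n, vec in h.items():
        u = [silu(a) for a in layer_norm(agg[n])]
        out[n] = [a + b for a, b in zip(vec, u)]  # residual skip-add
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical nodes: a progenitor at layer t, its descendant halo at
# layer t+1, and a coeval subhalo linked to that halo by a host edge.
h0 = {"prog": [1.0, 0.0], "halo": [0.0, 1.0], "sub": [0.5, 0.5]}
edges = [("prog", "halo", "temporal"), ("sub", "halo", "host")]
h1 = message_passing_block(h0, edges, {"temporal": identity, "host": identity})
```

When a node receives no messages, its aggregate is zero. LayerNorm and SiLU then map that zero vector to zero, so the skip connection leaves the node's state unchanged. In this toy case, only `halo` is updated.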