Populating Galaxies Into Halos Via Machine Learning on the Simba Simulation
Pratyush Kumar Das, Romeel Davé, Weiguang Cui
TL;DR
Machine Inferred Galaxy (MIG) is an end-to-end machine-learning framework, trained on the Simba simulation, that populates dark-matter halos with galaxies. It separates centrals from satellites, classifies star-forming (SF) versus quenched (Q) systems, and applies regressors to the SF subsets to predict $M_{*}$, SFR, $M_{\mathrm{HI}}$, $M_{\mathrm{H_2}}$, and metallicity $Z$. The study shows that a fraction-based prediction approach, combined with TPOT AutoML and random-forest (RF) feature selection, yields high accuracy at redshifts $z=0,1,2$, with particularly strong gains for satellite galaxies. MIG also recovers galaxy mass functions more faithfully than direct prediction methods, which supports precise predictions of baryonic tracers for large-volume HI intensity mapping. The framework provides a scalable, physically informed way to generate mock galaxy catalogs and tracers for upcoming surveys, and it highlights the importance of SF/Q separation and feature selection in capturing the complex halo–galaxy connection.
Abstract
We present a machine-learning framework, Machine Inferred Galaxy (MIG), to populate dark-matter haloes with galaxies in N-body simulations. MIG predicts stellar mass ($M_\ast$), star-formation rate (SFR), atomic and molecular gas masses ($M_{\mathrm{HI}}$ and $M_{\mathrm{H_2}}$), and metallicity, and can be extended to other properties and simulations. The pipeline first separates haloes into centrals and satellites, then uses classifiers to distinguish star-forming (SF) from quenched (Q) systems, followed by regressors trained on the SF subsets for both centrals and satellites. Trained on the $(100\,h^{-1}\,\mathrm{Mpc})^3$ SIMBA galaxy-formation simulation at $z=0$, MIG achieves high accuracy for key baryonic properties (e.g. $R^2 \approx 0.9$ for $M_{\mathrm{HI}}$ of central galaxies), and remains robust at $z=1$ and $z=2$. Training on fractional quantities (e.g. $M_{\mathrm{HI}}/M_\ast$) and rescaling by predicted $M_\ast$ improves performance over direct predictions across properties and redshifts. MIG also reproduces galaxy mass distribution functions with higher fidelity, enabling accurate predictions of integrated tracers such as H I intensity maps. MIG therefore provides an efficient, physically consistent route to generate mock galaxy catalogues and baryonic tracers in large cosmological volumes for upcoming surveys.
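The pipeline described in the abstract (central/satellite split, SF/Q classification, regression on the SF subset, and fraction-based prediction rescaled by predicted $M_\ast$) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn random forests as stand-ins for the TPOT-selected models, a single hypothetical input feature (log halo mass), and a synthetic toy catalogue whose scaling relations are invented for illustration and are not taken from Simba.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a halo catalogue (toy relations, NOT Simba data).
n = 2000
log_mhalo = rng.uniform(11.0, 14.0, n)           # log10 halo mass
is_central = rng.random(n) < 0.7                 # central vs satellite flag
p_quenched = 1.0 / (1.0 + np.exp(-3.0 * (log_mhalo - 12.8)))
is_sf = rng.random(n) > p_quenched               # True = star-forming
log_mstar = log_mhalo - 1.8 + rng.normal(0.0, 0.2, n)
# Fractional target: log10(M_HI / M_*), declining with stellar mass.
log_fhi = -0.5 * (log_mstar - 10.0) - 0.3 + rng.normal(0.0, 0.2, n)

X = log_mhalo.reshape(-1, 1)                     # hypothetical feature matrix


def fit_population(X, is_sf, log_mstar, log_fhi):
    """Fit an SF/Q classifier, then regressors on the SF subset only."""
    clf = RandomForestClassifier(
        n_estimators=50, min_samples_leaf=20, random_state=0
    ).fit(X, is_sf)
    reg_mstar = RandomForestRegressor(
        n_estimators=50, min_samples_leaf=20, random_state=0
    ).fit(X[is_sf], log_mstar[is_sf])
    reg_fhi = RandomForestRegressor(
        n_estimators=50, min_samples_leaf=20, random_state=0
    ).fit(X[is_sf], log_fhi[is_sf])
    return clf, reg_mstar, reg_fhi


def predict_population(models, X):
    """Classify SF/Q, then predict M_* and rescale the HI fraction by it."""
    clf, reg_mstar, reg_fhi = models
    sf = clf.predict(X).astype(bool)
    log_mstar_pred = np.full(len(X), np.nan)     # NaN for predicted-Q systems
    log_mhi_pred = np.full(len(X), np.nan)
    if sf.any():
        log_mstar_pred[sf] = reg_mstar.predict(X[sf])
        # log M_HI = log M_* (predicted) + log(M_HI / M_*) (predicted)
        log_mhi_pred[sf] = log_mstar_pred[sf] + reg_fhi.predict(X[sf])
    return sf, log_mstar_pred, log_mhi_pred


# Train separate model sets for centrals and satellites on the first 1500 rows.
train = np.arange(n) < 1500
populations = (("central", is_central), ("satellite", ~is_central))
models = {}
for label, mask in populations:
    sel = mask & train
    models[label] = fit_population(X[sel], is_sf[sel], log_mstar[sel], log_fhi[sel])

# Apply to the held-out rows.
pred_sf = np.zeros(n, dtype=bool)
pred_log_mhi = np.full(n, np.nan)
for label, mask in populations:
    sel = mask & ~train
    sf, _, mhi = predict_population(models[label], X[sel])
    pred_sf[sel] = sf
    pred_log_mhi[sel] = mhi
```

In this sketch the fraction-based step is simply an addition in log space: the regressor learns $\log(M_{\mathrm{HI}}/M_\ast)$ and the result is rescaled by the predicted $\log M_\ast$. Quenched systems are left as NaN here because the abstract describes regressors trained only on the SF subsets.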
