Populating Galaxies Into Halos Via Machine Learning on the Simba Simulation

Pratyush Kumar Das; Romeel Davé; Weiguang Cui

TL;DR

MIG (Machine Inferred Galaxy) is an end-to-end machine-learning framework, trained on the Simba simulation, that populates dark-matter halos with galaxies. It separates centrals from satellites, classifies each as star-forming (SF) or quenched (Q), and applies regressors trained on the SF subsets to predict $M_{*}$, SFR, $M_{\mathrm{HI}}$, $M_{\mathrm{H_2}}$, and metallicity $Z$. The study shows that a fraction-based prediction approach, combined with TPOT AutoML and RF feature selection, yields high accuracy at redshifts $z = 0$, $1$, and $2$, with particularly strong gains for satellite galaxies. MIG also recovers galaxy mass functions more faithfully than direct prediction, which enables precise predictions of baryonic tracers for large-volume HI intensity mapping. The framework offers a scalable, physically informed way to generate mock galaxy catalogs and tracers for upcoming surveys, and it highlights the importance of SF/Q separation and feature selection in capturing the complex halo–galaxy connection.

Abstract

We present a machine-learning framework, Machine Inferred Galaxy (MIG), to populate dark-matter haloes with galaxies in N-body simulations. MIG predicts stellar mass ($M_\ast$), star-formation rate (SFR), atomic and molecular gas masses ($M_{\mathrm{HI}}$ and $M_{\mathrm{H_2}}$), and metallicity, and can be extended to other properties and simulations. The pipeline first separates haloes into centrals and satellites, then uses classifiers to distinguish star-forming (SF) from quenched (Q) systems, followed by regressors trained on the SF subsets for both centrals and satellites. Trained on the $(100\,h^{-1}\,\mathrm{Mpc})^3$ SIMBA galaxy-formation simulation at $z=0$, MIG achieves high accuracy for key baryonic properties (e.g. $R^2 \approx 0.9$ for $M_{\mathrm{HI}}$ of central galaxies), and remains robust at $z=1$ and $z=2$. Training on fractional quantities (e.g. $M_{\mathrm{HI}}/M_\ast$) and rescaling by predicted $M_\ast$ improves performance over direct predictions across properties and redshifts. MIG also reproduces galaxy mass distribution functions with higher fidelity, enabling accurate predictions of integrated tracers such as H I intensity maps. MIG therefore provides an efficient, physically consistent route to generate mock galaxy catalogues and baryonic tracers in large cosmological volumes for upcoming surveys.
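The staged structure described in the abstract (central/satellite split, SF/Q classification, then fraction-based regression rescaled by the predicted $M_\ast$) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Halo` and `BranchModels` types, the `populate` function, and the toy lambda models are all hypothetical stand-ins for trained classifiers and regressors, and the numbers carry no physical meaning. The sketch also assumes, for simplicity, that quenched systems receive no regressed properties; the abstract only states that the regressors are trained on the SF subsets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Halo:
    log_mass: float    # log10 of the halo mass (illustrative single feature)
    is_central: bool   # True for a central, False for a satellite


# Hypothetical model interfaces: any trained model could be wrapped this way.
Classifier = Callable[[Halo], bool]   # returns True if star-forming
Regressor = Callable[[Halo], float]   # returns a log10 quantity


@dataclass
class BranchModels:
    is_star_forming: Classifier
    log_mstar: Regressor          # predicts log10(M_*)
    log_hi_fraction: Regressor    # predicts log10(M_HI / M_*)


def populate(halo: Halo, models: Dict[bool, BranchModels]) -> Optional[dict]:
    """Route one halo through the central/satellite and SF/Q stages."""
    branch = models[halo.is_central]      # stage 1: central vs satellite
    if not branch.is_star_forming(halo):  # stage 2: SF vs Q
        return None                       # assumption: Q systems left unfilled
    log_mstar = branch.log_mstar(halo)    # stage 3: regress on the SF subset
    # Fraction-based prediction: in log space, rescaling the predicted
    # fraction by the predicted stellar mass is a simple addition.
    log_mhi = branch.log_hi_fraction(halo) + log_mstar
    return {"log_mstar": log_mstar, "log_mhi": log_mhi}


# Toy stand-in models (made-up relations, for illustration only).
toy = BranchModels(
    is_star_forming=lambda h: h.log_mass < 12.5,
    log_mstar=lambda h: h.log_mass - 2.0,
    log_hi_fraction=lambda h: -0.5,
)
models = {True: toy, False: toy}  # real use: separate models per branch

sf_result = populate(Halo(log_mass=12.0, is_central=True), models)
q_result = populate(Halo(log_mass=13.0, is_central=False), models)
```

Here `sf_result` holds the rescaled prediction for a star-forming central, while `q_result` is `None` because the toy classifier flags that halo as quenched. Keeping centrals and satellites in separate `BranchModels` entries mirrors the abstract's point that each population gets its own classifier and regressors.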
