Total and dark mass from observations of galaxy centers with Machine Learning

Sirui Wu, Nicola R. Napolitano, Crescenzo Tortora, Rodrigo von Marttens, Luciano Casarini, Rui Li, Weipeng Lin

TL;DR

This study tackles the challenge of estimating a galaxy’s central mass, including the dark matter component inside the effective radius, using only readily available imaging and spectroscopy. It introduces Mela, a Random Forest–based estimator trained on the IllustrisTNG100 hydrodynamical simulations, mapping simple observables to central masses $M_{\rm tot}(r_{1/2})$ and $M_{\rm DM}(r_{1/2})$. Real-data tests on SPIDER, MaNGA DynPop, and SAMI Fornax Dwarf demonstrate predictions consistent with Jeans-based dynamical masses, within approximately 0.3 dex and with limited outliers, across ETGs, LTGs, and dwarfs. The approach is robust to the kinematic data quality and dynamical modeling choices, offering a scalable, observationally driven alternative for mass inference in upcoming stage-IV surveys, while enabling exploration of cosmology and baryonic physics via simulations and CAMELS-like extensions.
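Mela's core idea, a Random Forest that maps observables to a central mass, can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the class and function names, the hyperparameters (number of trees, depth, features tried per split) and the toy data are hypothetical, and a real analysis would more likely use a library estimator such as scikit-learn's `RandomForestRegressor`. Each tree is grown on a bootstrap resample of the training galaxies, each split tries only a random subset of the features, and the forest prediction is the average over trees.

```python
import random
import statistics


def _build_tree(X, y, rng, depth, min_leaf, n_feat):
    """Grow one regression tree; a leaf stores the mean target of its galaxies."""
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)
    best = None  # (sum of squared errors, feature index, threshold)
    # Random Forest ingredient: only a random subset of features is tried per split.
    for f in rng.sample(range(len(X[0])), n_feat):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:  # no admissible split: make this node a leaf
        return statistics.fmean(y)
    _, f, thr = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= thr]
    right_idx = [i for i, row in enumerate(X) if row[f] > thr]
    left_node = _build_tree([X[i] for i in left_idx], [y[i] for i in left_idx],
                            rng, depth - 1, min_leaf, n_feat)
    right_node = _build_tree([X[i] for i in right_idx], [y[i] for i in right_idx],
                             rng, depth - 1, min_leaf, n_feat)
    return (f, thr, left_node, right_node)


def _predict_tree(node, x):
    """Walk to a leaf; internal nodes are (feature, threshold, left, right) tuples."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


class RandomForestSketch:
    """Hypothetical minimal Random Forest regressor (bootstrap + random feature subsets)."""

    def __init__(self, n_trees=20, max_depth=6, min_leaf=2, max_features=None, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_leaf = min_leaf
        self.max_features = max_features
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, n_features = len(X), len(X[0])
        n_feat = min(n_features, self.max_features or max(1, n_features // 3))
        self.trees = []
        for _ in range(self.n_trees):
            # Random Forest ingredient: each tree sees a bootstrap resample of the galaxies.
            idx = [rng.randrange(n) for _ in range(n)]
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                          rng, self.max_depth, self.min_leaf, n_feat))
        return self

    def predict(self, X):
        # The forest prediction is the average of the individual tree predictions.
        return [statistics.fmean(_predict_tree(t, x) for t in self.trees) for x in X]
```

For example, with rows of hypothetical log-scaled observables (say stellar mass, half-light radius and velocity dispersion) as `X` and $\log_{10} M_{\rm tot}(r_{1/2})$ as `y`, `RandomForestSketch().fit(X, y).predict(X_new)` returns one predicted log mass per galaxy. The exhaustive threshold search is quadratic in the sample size, so this sketch only suits small toy datasets.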

Abstract

The total mass of a galaxy inside its effective radius encodes important information on dark matter and on the galaxy evolution model. Total "central" masses can be inferred from galaxy dynamics or gravitational lensing, but these methods have limitations. We propose a novel approach, based on Random Forest, to predict the total and dark matter content of galaxies from simple observables available in imaging and spectroscopic surveys. We use catalogs of multi-band photometry, sizes, stellar masses and kinematic "measurements" (features), and of dark matter (targets), for simulated galaxies from the Illustris-TNG100 hydrodynamical simulation, to train a Mass Estimate machine Learning Algorithm (Mela). We separate the simulated sample into passive early-type galaxies (ETGs), both "normal" and "dwarf", and active late-type galaxies (LTGs), and show that the mass estimator can accurately predict the dark matter masses inside the effective radius in all samples. Finally, we test the mass estimator against central mass estimates for a series of low-redshift ($z\leq0.1$) datasets, including SPIDER, MaNGA/DynPop and SAMI dwarf galaxies, derived with standard dynamical methods based on the Jeans equations. Dynamical masses are reproduced within 0.30 dex ($\sim2\sigma$), with a limited fraction of outliers and almost no bias. This holds regardless of the sophistication of the kinematic data collected (fiber vs. 3D spectroscopy) and of the dynamical analysis adopted (radial vs. axisymmetric Jeans equations, virial theorem). This makes Mela a powerful alternative for predicting galaxy masses in the datasets of massive stage-IV surveys.
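The agreement statistics quoted above (agreement within 0.30 dex, fraction of outliers, bias) can be computed from paired log masses roughly as follows. This sketch assumes that both the Mela predictions and the dynamical masses are given as $\log_{10}$ values, that residuals are predicted minus reference, and that bias is summarized by the median residual; the function name, the extra scatter statistic and these conventions are our assumptions, not the paper's exact definitions.

```python
import statistics


def mass_agreement(log_ref, log_pred, threshold_dex=0.30):
    """Compare predicted and reference log10 masses (hypothetical helper).

    Returns a tuple (bias, scatter, outlier_fraction):
      bias             median of (log_pred - log_ref), in dex
      scatter          population standard deviation of the residuals, in dex
      outlier_fraction fraction of galaxies with |residual| > threshold_dex
    """
    if not log_ref or len(log_ref) != len(log_pred):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [p - r for r, p in zip(log_ref, log_pred)]
    bias = statistics.median(residuals)
    scatter = statistics.pstdev(residuals)
    n_out = sum(1 for d in residuals if abs(d) > threshold_dex)
    return bias, scatter, n_out / len(residuals)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the paper.
    ref = [10.0, 10.5, 11.0, 11.5]
    pred = [10.1, 10.4, 11.5, 11.5]
    print(mass_agreement(ref, pred))
```

For scale, a residual of 0.30 dex corresponds to a factor of $10^{0.3} \approx 2$ in mass.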

Paper Structure (35 sections, 3 equations, 21 figures, 6 tables)

Figures (21)

  • Figure 1: Distributions of the relevant features and targets listed in the features-and-targets table: total mass inside the stellar half-mass radius, augmented dark matter mass inside the stellar half-mass radius, half-mass radius, stellar mass inside the half-mass radius, velocity dispersion, and total and dark matter masses inside the half-light radius. Left: galaxies are divided into ETGs and LTGs on the basis of their SFR. Right: ETGs are further divided into normal and dwarf ETGs following the classification criteria in the galaxy-classes table. The normalized distribution of each feature and target is shown along the diagonal. Units are as in the features-and-targets table. These are the original TNG100 data, without mock measurement errors. For a like-for-like comparison, the same number of galaxies is shown for each type: a random subsample of 20,000 galaxies is drawn from the full dataset and from each of the three galaxy types.
  • Figure 2: Correlation heat maps of the different TNG galaxy samples defined in the TNG simulation section, without (top row) and with (bottom row) the mock measurement errors described in the measurement-errors section. The correlation coefficients are Pearson correlation coefficients (see the Pearson correlation equation).
  • Figure 3: Kernel density estimates (KDE) for each class of the dataset. Top row: KDE of the nETG dataset. Bottom row: KDE of the LTG and dETG datasets. The number of galaxies in each class is given in the galaxy-classes table. All data points lie within the x-axis limits. For Fornax, the smoothed estimate is visibly incomplete because of the small number of data points.
  • Figure 4: Self-prediction test using the full set of features listed in the features-and-targets table, with the full-counts training sample including the added measurement errors described in the measurement-errors section. Top row: target is $M_{\rm DM}(r_{\rm 1/2})$. Bottom row: target is $M_{\rm tot}(r_{\rm 1/2})$. Results without measurement errors are presented in the appendix on measurement errors. The data are split into 80% for training and 20% for testing. The x-axis shows the true values and the y-axis the predicted values. "numofgal" is the number of galaxies in the test set. The purple error bars show the 16th, 50th and 84th percentiles of the predictions as a function of $M^{\rm true}(r_{\rm 1/2})$, in bins of 0.2 dex. The red dashed lines mark $\pm0.30$ dex (corresponding to $\sim2\sigma$ errors, see text). Outliers are defined as the fraction of data outside the red dashed lines. For accurate predictions, the data points are expected to lie along the dotted 1-to-1 line.
  • Figure 5: Self-prediction test using the full set of features and a balanced-counts training sample that includes measurement errors. The training-test samples of the full-counts self-prediction test are randomly subsampled so that every class has the same number of entries as the least populated class (nETGs). The training set consists of 80% of this random subsample (16,800 entries), and the remaining 20% (4,200 entries) is used for testing. Top row: target is $M_{\rm DM}(r_{\rm 1/2})$. Bottom row: target is $M_{\rm tot}(r_{\rm 1/2})$.
  • ...and 16 more figures
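Several of the captions above describe standard data-handling steps: Pearson correlation coefficients (Figure 2), 16th/50th/84th percentiles of the predictions in 0.2 dex bins of true mass (Figure 4), and a balanced random subsample split 80/20 into training and test sets (Figure 5). The sketch below illustrates these steps with the standard library only, assuming the data are plain Python lists of $\log_{10}$ quantities; the function names are hypothetical and the paper's own binning and sampling code is not shown here.

```python
import math
import random
import statistics
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def balanced_split(samples_by_class, train_frac=0.8, seed=0):
    """Subsample every class to the size of the smallest one, then split train/test."""
    rng = random.Random(seed)
    n_min = min(len(rows) for rows in samples_by_class.values())
    n_train = round(train_frac * n_min)
    splits = {}
    for name, rows in samples_by_class.items():
        chosen = rng.sample(rows, n_min)  # random selection without replacement
        splits[name] = (chosen[:n_train], chosen[n_train:])
    return splits


def binned_percentiles(x_true, y_pred, bin_width=0.2):
    """16th, 50th and 84th percentiles of y_pred in bins of x_true (bin_width in dex)."""
    bins = defaultdict(list)
    for x, y in zip(x_true, y_pred):
        bins[math.floor(x / bin_width)].append(y)
    rows = []
    for k in sorted(bins):
        if len(bins[k]) < 2:  # statistics.quantiles needs at least two points
            continue
        q = statistics.quantiles(bins[k], n=100, method="inclusive")
        # q[i] is the (i + 1)-th percentile; report (bin centre, p16, p50, p84).
        rows.append(((k + 0.5) * bin_width, q[15], q[49], q[83]))
    return rows
```

With `train_frac=0.8`, a class subsampled to 21,000 entries yields the 16,800 training and 4,200 test entries quoted in the Figure 5 caption.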