A Demonstration of a Neural Network as a Bridge Between Galaxy Simulations and Surveys

E. Elson

TL;DR

The paper tackles robust stellar mass estimation by testing a minimal, single-hidden-layer ANN trained on Shark simulations to predict $M_\star$ from broadband magnitudes and colours, and then applying it to real GAMA data. Despite being trained only on simulations and using just 24 broadband features, the network reproduces SED-derived masses across ~$3.5$ dex with a small offset (~$0.1$ dex) and scatter (~$0.135$ dex); after a residual-bias correction, the predictions align closely with the one-to-one relation. The method is then applied to 17,006 GAMA galaxies lacking SED masses, with photometric uncertainties propagated through the network to yield a typical total uncertainty of ~$0.18$ dex; the predicted masses follow a clear linear relation with WISE $W1$ magnitude, as expected if $W1$ traces stellar mass. Overall, this work demonstrates that simulation-trained, lightweight machine-learning models can capture the dominant photometric information needed for stellar mass inference, enabling efficient simulation-to-observation transfer learning for large galaxy surveys.
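The summary does not say how the photometric uncertainties are propagated through the network. One common approach is Monte Carlo perturbation of the inputs, sketched below under that assumption. The function name `propagate_uncertainty`, the `predict` callable, and the number of draws are illustrative and not taken from the paper. The sketch also treats every feature error as independent Gaussian noise, which is a simplification, since colours are differences of magnitudes and their errors are correlated.

```python
import numpy as np

def propagate_uncertainty(predict, features, sigmas, n_draws=500, seed=0):
    """Monte Carlo estimate of the prediction scatter caused by input errors.

    predict  : callable mapping an (n, d) array to n predicted log10 masses
               (hypothetical stand-in for the trained network)
    features : (n, d) array of measured magnitudes and colours
    sigmas   : (n, d) array of 1-sigma uncertainties on those features
    """
    rng = np.random.default_rng(seed)
    draws = np.empty((n_draws, features.shape[0]))
    for i in range(n_draws):
        # Assumption: independent Gaussian errors on every input feature.
        noisy = features + rng.normal(size=features.shape) * sigmas
        draws[i] = predict(noisy)
    # Summarise each galaxy's spread as half of its 16-84 percentile range.
    lo, hi = np.percentile(draws, [16, 84], axis=0)
    return 0.5 * (hi - lo)
```

Calling `propagate_uncertainty(model, X, X_err)` returns one uncertainty in dex per galaxy, which could then be combined with the intrinsic model scatter if a total error budget is wanted.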

Abstract

This paper demonstrates that the stellar masses of galaxies in the Galaxy and Mass Assembly (GAMA) survey, originally derived via stellar population synthesis modelling, can be accurately predicted using only their absolute magnitudes and colour indices. A central contribution of this work is the demonstration that this long-standing inference problem can be solved using an exceptionally simple machine-learning model: a fully connected, feed-forward artificial neural network with a single hidden layer. The network is trained exclusively on synthetic galaxies generated by the SHARK semi-analytic model and is shown to transfer effectively to real observations. Across nearly 3.5 dex in stellar mass, the predicted values closely track the GAMA SED-derived masses, with a typical scatter of ~0.131 dex. These results demonstrate that complex deep-learning architectures are not a prerequisite for robust stellar mass estimation, and that simulation-trained, lightweight machine-learning models can capture the dominant physical information encoded in broad-band photometry. The method is further applied to 17,006 GAMA galaxies lacking SED-derived masses, with photometric uncertainties propagated through the network to provide corresponding error estimates on the inferred stellar masses. Overall, this work establishes a computationally efficient and conceptually transparent pathway for simulation-to-observation transfer learning in galaxy evolution studies.
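To make "a fully connected, feed-forward artificial neural network with a single hidden layer" concrete, here is a minimal NumPy sketch of such a model with one gradient-descent step on the mean squared error. Only the count of 24 input features comes from the summary. The hidden width, the ReLU activation, the initialisation, the learning rate, and all function and parameter names are assumptions made for illustration and may differ from the paper's actual configuration.

```python
import numpy as np

N_FEATURES = 24  # input magnitudes and colours, as quoted in the summary
N_HIDDEN = 32    # hypothetical hidden width, not taken from the paper

def init_params(seed=0):
    """Random initial weights for a single-hidden-layer regressor."""
    rng = np.random.default_rng(seed)
    return {
        "w_in": rng.normal(scale=np.sqrt(2.0 / N_FEATURES),
                           size=(N_FEATURES, N_HIDDEN)),
        "b_in": np.zeros(N_HIDDEN),
        "w_out": rng.normal(scale=np.sqrt(1.0 / N_HIDDEN), size=(N_HIDDEN, 1)),
        "b_out": np.zeros(1),
    }

def forward(params, x):
    """Map (n, 24) standardised features to n predicted log10 stellar masses."""
    hidden = np.maximum(0.0, x @ params["w_in"] + params["b_in"])  # ReLU (assumed)
    return (hidden @ params["w_out"] + params["b_out"]).ravel()

def train_step(params, x, y, lr=1e-2):
    """One full-batch gradient-descent step; returns new params and prior loss."""
    pre = x @ params["w_in"] + params["b_in"]
    hidden = np.maximum(0.0, pre)
    pred = (hidden @ params["w_out"] + params["b_out"]).ravel()
    err = pred - y
    grad_pred = (2.0 / x.shape[0]) * err[:, None]      # d(MSE)/d(pred)
    grad_pre = (grad_pred @ params["w_out"].T) * (pre > 0)
    grads = {
        "w_in": x.T @ grad_pre,
        "b_in": grad_pre.sum(axis=0),
        "w_out": hidden.T @ grad_pred,
        "b_out": grad_pred.sum(axis=0),
    }
    new_params = {k: params[k] - lr * grads[k] for k in params}
    return new_params, float(np.mean(err ** 2))
```

In a simulation-to-observation setup like the one described, `train_step` would be looped over the simulated galaxies only, and `forward` would then be evaluated on the observed features after applying the same standardisation.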

Paper Structure (5 sections, 3 figures)

This paper contains 5 sections, 3 figures.

Figures (3)

  • Figure 1: Distributions of the 24 broadband absolute magnitudes and colour indices (all in the AB system) used as input features for the GAMA galaxies in this study. In each panel, the red dashed vertical lines indicate the minimum and maximum values spanned by the corresponding features in the Shark training sample. GAMA galaxies lying outside these ranges were excluded to ensure that all inputs fall within the domain on which the ANN was trained.
  • Figure 2: Top: Raw ANN-predicted stellar masses compared with SED-derived masses for 71,171 GAMA galaxies. The solid line denotes the one-to-one relation, while the red curve shows a polynomial fit to the data. Middle: Corrected ANN predictions after subtracting the fitted residual trend, removing the median offset and bringing the relation into alignment with the one-to-one line. Bottom: Distribution of corrected residuals, with half of the 16th to 84th percentile range (indicated by the blue dashed lines) being 0.135 dex. Together, the panels demonstrate that the ANN closely reproduces SED-derived stellar masses with a small correctable bias and low overall scatter.
  • Figure 3: Top: WISE $W1$ magnitude vs. ANN-predicted stellar mass for GAMA galaxies without SED estimates, showing a clear linear relation consistent with $W1$ tracing stellar mass. Bottom: Propagated uncertainties on $\log_{10}(M_\star)$ as a function of predicted mass, typically $\pm0.05$ dex and largest at intermediate masses, where galaxy populations are most diverse in dust content and star-formation history.
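The post-processing steps described in the captions of Figures 1 and 2 can be sketched as follows: a range cut that keeps only galaxies inside the training domain, a polynomial fit to the residual trend that is then subtracted, and the scatter statistic defined as half of the 16th to 84th percentile range. This is an illustrative sketch, not the paper's code. The function names, the polynomial degree, and the choice to fit the residual as a function of the predicted mass (so that the correction can also be applied to galaxies without SED masses) are assumptions.

```python
import numpy as np

def in_training_domain(features, train_features):
    """True for rows whose every feature lies within the training min-max range."""
    lo = train_features.min(axis=0)
    hi = train_features.max(axis=0)
    return np.all((features >= lo) & (features <= hi), axis=1)

def correct_residual_trend(pred, reference, degree=2):
    """Fit (pred - reference) as a polynomial in pred and subtract the fit.

    The degree is a hypothetical choice; the captions do not state it.
    """
    coeffs = np.polyfit(pred, pred - reference, degree)
    return pred - np.polyval(coeffs, pred), coeffs

def half_16_84_range(residuals):
    """Robust scatter: half the width of the 16th to 84th percentile interval."""
    lo, hi = np.percentile(residuals, [16, 84])
    return 0.5 * (hi - lo)
```

The coefficients returned by `correct_residual_trend` can be reused with `np.polyval` to correct predictions for galaxies that have no reference mass.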