A Demonstration of a Neural Network as a Bridge Between Galaxy Simulations and Surveys
E. Elson
TL;DR
The paper tackles the challenge of robust stellar mass estimation by testing a minimal, single-hidden-layer ANN trained on SHARK simulations to predict $M_\bigstar$ from broadband magnitudes and colours, and then applying it to real GAMA data. Despite training only on simulations and using 24 broadband features, the network reproduces SED-derived masses across ~$3.5$ dex with a small offset (~$0.1$ dex) and scatter (~$0.135$ dex); after a residual-bias correction, the predictions align closely with the one-to-one relation. The method is then applied to 17,006 GAMA galaxies lacking SED masses, propagating photometric uncertainties to yield a typical total uncertainty of ~$0.18$ dex, with a physically motivated relation to $W1$ magnitudes. Overall, this work demonstrates that simulation-trained, lightweight machine-learning models can capture the dominant photometric information needed for stellar mass inference, enabling efficient simulation-to-observation transfer learning for large galaxy surveys.
Abstract
This paper demonstrates that the stellar masses of galaxies in the Galaxy and Mass Assembly (GAMA) survey, originally derived via stellar population synthesis modelling, can be accurately predicted using only their absolute magnitudes and colour indices. A central contribution of this work is the demonstration that this long-standing inference problem can be solved using an exceptionally simple machine-learning model: a fully connected, feed-forward artificial neural network with a single hidden layer. The network is trained exclusively on synthetic galaxies generated by the SHARK semi-analytic model and is shown to transfer effectively to real observations. Across nearly 3.5 dex in stellar mass, the predicted values closely track the GAMA SED-derived masses, with a typical scatter of ~0.131 dex. These results demonstrate that complex deep-learning architectures are not a prerequisite for robust stellar mass estimation, and that simulation-trained, lightweight machine-learning models can capture the dominant physical information encoded in broad-band photometry. The method is further applied to 17,006 GAMA galaxies lacking SED-derived masses, with photometric uncertainties propagated through the network to provide corresponding error estimates on the inferred stellar masses. Overall, this work establishes a computationally efficient and conceptually transparent pathway for simulation-to-observation transfer learning in galaxy evolution studies.
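To make the two technical ingredients above concrete (a single-hidden-layer feed-forward network, and the propagation of photometric uncertainties through it), the following is a minimal NumPy sketch and not the author's implementation. Several choices are assumptions made purely for illustration: the tanh activation, the hidden-layer width of 16, the random untrained weights, and the use of Monte Carlo resampling with independent Gaussian input errors as the propagation scheme. The abstract states only that uncertainties are propagated through the network, not how. The function names `init_network`, `predict_log_mass` and `propagate_uncertainty` are hypothetical; the input width of 24 follows the feature count quoted in the TL;DR.

```python
import numpy as np


def init_network(n_features, n_hidden, rng):
    """Random, untrained weights for a one-hidden-layer network (illustrative only)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def predict_log_mass(params, features):
    """Forward pass: (n_gal, n_features) inputs -> (n_gal,) predicted log10 stellar mass."""
    # Assumed activation: tanh. The source does not specify the activation function.
    hidden = np.tanh(features @ params["W1"] + params["b1"])
    return (hidden @ params["W2"] + params["b2"]).ravel()


def propagate_uncertainty(params, features, feature_errors, n_draws, rng):
    """Monte Carlo propagation (assumed scheme): perturb the inputs with
    independent Gaussian noise and summarise the spread of the predictions."""
    # noise has shape (n_draws, n_gal, n_features); feature_errors broadcasts over draws.
    noise = rng.normal(size=(n_draws,) + features.shape) * feature_errors
    draws = np.stack([predict_log_mass(params, features + eps) for eps in noise])
    return draws.mean(axis=0), draws.std(axis=0)


# Usage with synthetic placeholder inputs (not GAMA or SHARK data).
rng = np.random.default_rng(0)
params = init_network(n_features=24, n_hidden=16, rng=rng)
features = rng.normal(size=(5, 24))      # 5 mock galaxies, 24 magnitudes and colours
errors = np.full_like(features, 0.05)    # assumed 0.05 mag error on every feature
mean_log_mass, sigma_log_mass = propagate_uncertainty(params, features, errors, 500, rng)
print(mean_log_mass.shape, sigma_log_mass.shape)
```

In a real application the weights would come from training on the simulated galaxies, and the per-galaxy spread returned here would capture only the photometric contribution to the error budget; any intrinsic model scatter would have to be combined with it separately.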
