Machine Learning Classification of COSMOS2020 Galaxies: Quiescent vs. Star-Forming

Vahid Asadi; Nima Chartab; Akram Hasani Zonoozi; Hosein Haghi; Ghassem Gozaliasl; Aryana Haghjoo; Bahram Mobasher

TL;DR

This work addresses the problem of reliably separating quiescent from star-forming galaxies in large surveys by training a machine learning classifier on realistic mock photometry from the Santa Cruz semi-analytic model (SAM). A CatBoostClassifier is trained on 28 color features derived from the eight photometric bands shared by the SAM mocks and the COSMOS2020 data. It achieves a quiescent F1-score of 0.888 and an AUC of 0.97, outperforming traditional SED fitting in both classification quality (most notably recall) and speed. Applied to the COSMOS2020 catalog, the ML approach yields a higher inferred quiescent fraction across 0.2 < z < 3.5 and offers a scalable path for large surveys. The trained models and classifications are publicly available for community use.

Abstract

Accurately distinguishing between quiescent and star-forming galaxies is essential for understanding galaxy evolution. Traditional methods, such as spectral energy distribution (SED) fitting, can be computationally expensive and may struggle to capture complex galaxy properties. This study aims to develop a robust and efficient machine learning (ML) classification method to identify quiescent and star-forming galaxies within the Farmer COSMOS2020 catalog. We used JWST wide-field light cones from the Santa Cruz semi-analytical modeling framework to train a supervised ML model, the CatBoostClassifier, on 28 color features derived from the 8 photometric bands that the light cones share with the COSMOS catalog. The model was validated against a testing set and compared to the SED-fitting method in terms of precision, recall, F1-score, and execution time. Preprocessing steps included addressing missing data, injecting observational noise, and applying a magnitude cut (ch1 < 26 AB) along with a redshift range of 0.2 < z < 3.5 to align the simulated and observational datasets. The ML method achieved an F1-score of 89% for quiescent galaxies, significantly outperforming the SED-fitting method, which achieved 54%. The ML model demonstrated superior recall (88% vs. 38%) while maintaining comparable precision. When applied to the COSMOS2020 catalog, the ML model predicted a systematically higher fraction of quiescent galaxies in every redshift bin within 0.2 < z < 3.5 than traditional methods such as NUVrJ and SED fitting. This study shows that ML, combined with multi-wavelength data, can effectively identify quiescent and star-forming galaxies, providing valuable insights into galaxy evolution. The trained classifier and full classification catalog are publicly available.
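The count of 28 color features follows from taking every unordered pair of the 8 shared bands, since C(8, 2) = 28. The sketch below illustrates that construction together with the quoted selection (ch1 < 26 AB, 0.2 < z < 3.5). It is a minimal illustration and not the authors' pipeline: the band names, the helper names color_features and passes_cuts, and the example magnitudes are all assumptions made here, and the missing-data handling and noise injection are omitted.

```python
from itertools import combinations

# Placeholder band names (assumption): the abstract does not list the eight shared bands.
BANDS = [f"band{i}" for i in range(1, 9)]


def color_features(mags):
    """Return every pairwise color (magnitude difference) for one galaxy.

    `mags` maps each band name in BANDS to an AB magnitude.
    """
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in combinations(BANDS, 2)}


def passes_cuts(ch1_mag, z, mag_limit=26.0, z_min=0.2, z_max=3.5):
    """Selection quoted in the abstract: ch1 < 26 AB and 0.2 < z < 3.5."""
    return ch1_mag < mag_limit and z_min < z < z_max


# Made-up magnitudes, purely for illustration.
example_mags = {band: 24.0 + 0.1 * i for i, band in enumerate(BANDS)}
features = color_features(example_mags)
print(len(features))  # 28
```

A feature table built this way could then be passed to a classifier such as CatBoostClassifier; the training configuration itself is not given in the abstract.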
