The contribution of the color space in LSST-like photometry for the selection of extragalactic globular cluster candidates
Nicholas Schweder-Souza, Ana L. Chies-Santos, Rafael S. de Souza, Kristen C. Dage, Charles J. Bonatto, Juan P. Caso, Michele Cantiello, Pedro dos Santos-Lopes, Pedro Floriano, Thayse A. Pacheco, Katherine L. Rhode, Pauline Barmby, Niranjana P., Yasna Ordenes-Briceño, Teymoor Saifollahi, Rubens E. G. Machado, Julia Gschwend
TL;DR
The study investigates how well color information alone, in a six-band LSST-like ($ugrizY$) catalog, can distinguish point-like extragalactic globular clusters (GCs) from contaminants. The authors build a labeled Fornax-based dataset from the Fornax Deep Survey (FDS) and the Dark Energy Survey (DES). They apply color-space transforms via principal component analysis (PCA) and non-linear auto-encoders, then evaluate Random Forest and multi-layer perceptron (MLP) classifiers across multiple input representations. Principal components reduce contamination from ~$45\%$ to ~$35\%$ without increasing incompleteness, while auto-encoders provide no improvement; 2D color-color projections are particularly limited. The work concludes that color information alone places a ceiling on GC identification quality. It advocates augmenting photometry with ancillary data (morphology, near-IR, astrometry) to fully exploit LSST's potential for GC science.
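The contamination and incompleteness figures above are read here with the usual definitions for a selected GC sample. Contamination is the fraction of sources selected as GCs that are not GCs. Incompleteness is the fraction of true GCs that the selection misses. The minimal sketch below assumes those definitions, since this summary does not spell out the paper's exact formulation. The function name `contamination_and_incompleteness` and the string class labels are hypothetical.

```python
import numpy as np


def contamination_and_incompleteness(y_true, y_pred, positive="GC"):
    """Return (contamination, incompleteness) for the `positive` class.

    Assumed definitions (hypothetical helper, not the paper's code):
      contamination  = FP / (TP + FP)  -> non-GCs among sources selected as GCs
      incompleteness = FN / (TP + FN)  -> true GCs the selection missed
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    selected = y_pred == positive   # sources the classifier calls GCs
    actual = y_true == positive     # sources that really are GCs
    tp = np.sum(selected & actual)
    fp = np.sum(selected & ~actual)
    fn = np.sum(~selected & actual)
    # Undefined ratios (empty denominators) are reported as NaN.
    contamination = fp / (tp + fp) if (tp + fp) > 0 else float("nan")
    incompleteness = fn / (tp + fn) if (tp + fn) > 0 else float("nan")
    return float(contamination), float(incompleteness)


# Toy labels: 2 of the 4 selected sources are contaminants, and 1 of the 3 GCs is missed.
print(contamination_and_incompleteness(
    ["GC", "GC", "GC", "star", "galaxy", "galaxy"],
    ["GC", "GC", "star", "GC", "GC", "galaxy"],
))  # contamination 0.5, incompleteness 1/3
```

Under these definitions, lowering contamination while holding incompleteness fixed, as reported for the principal-component inputs, means the classifier rejects more stars and galaxies without discarding more true GCs.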
Abstract
Globular clusters (GCs), densely packed collections of thousands to millions of old stars, are excellent tracers of their host galaxies' evolutionary histories. Traditional methods for identifying GCs in galaxies rely on cuts in photometric catalogs and can yield source lists heavily contaminated by compact background galaxies and foreground stars. In an era when large-scale sky surveys produce photometry for millions of sources, flexible and scalable tools are essential for reliably identifying GCs in external galaxies. To prepare for surveys such as Rubin/LSST, we need to explore practical methodological improvements and quantify the limitations inherent in the datasets. This paper investigates the selection of point-like extragalactic GCs exclusively in the $ugrizY$ color space. We use archival data to assemble an LSST-like photometric catalog for the Fornax Cluster containing spectroscopically confirmed, labeled GCs, galaxies, and stars. From this catalog, using principal component analysis and non-linear auto-encoders (AEs), we construct inputs to random forest and multi-layer perceptron classifiers. We show that selecting GCs using $ugrizY$ colors can lead to contamination rates of ~45%. If the principal components of the colors are used instead, this rate drops to ~35% without increasing incompleteness. The AEs did not improve GC identification. To further reduce contamination and extract the full potential of LSST for star cluster studies, we argue for the need to augment photometric information with ancillary data (morphology from space-based missions and near-infrared photometry) before attempting to leverage more complex models.
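The color-space pipeline described above (colors, then a PCA transform, then a classifier) can be sketched with scikit-learn. This is an illustrative sketch, not the authors' code. It uses a hypothetical synthetic catalog instead of the Fornax data and assumes all 15 pairwise $ugrizY$ colors as input. The number of principal components and the forest size are arbitrary choices. The names `all_colors`, `make_synthetic_catalog`, and `run_demo`, and every template magnitude, are hypothetical.

```python
import itertools

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

BANDS = ["u", "g", "r", "i", "z", "Y"]


def all_colors(mags):
    """Return every pairwise color m_a - m_b (band a bluer than band b).

    `mags` is an (n_sources, 6) array ordered as BANDS; six bands give 15 colors.
    """
    pairs = itertools.combinations(range(mags.shape[1]), 2)
    return np.column_stack([mags[:, a] - mags[:, b] for a, b in pairs])


def make_synthetic_catalog(n_per_class=200, seed=0):
    """Hypothetical toy catalog (NOT the Fornax FDS/DES data).

    Each class has an invented mean SED; every source gets a random overall
    brightness shift (which cancels in colors) plus per-band Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    templates = {  # invented mean u, g, r, i, z, Y magnitudes per class
        "GC": np.array([22.0, 20.8, 20.2, 19.9, 19.8, 19.7]),
        "star": np.array([22.5, 21.0, 20.3, 20.0, 19.85, 19.8]),
        "galaxy": np.array([23.0, 21.8, 20.9, 20.4, 20.1, 19.9]),
    }
    mags, labels = [], []
    for label, mean in templates.items():
        offset = rng.normal(0.0, 1.0, size=(n_per_class, 1))
        noise = rng.normal(0.0, 0.1, size=(n_per_class, len(BANDS)))
        mags.append(mean + offset + noise)
        labels += [label] * n_per_class
    return np.vstack(mags), np.array(labels)


def run_demo(n_components=5, seed=0):
    """Colors -> standardize -> PCA -> random forest; report GC contamination/incompleteness."""
    mags, labels = make_synthetic_catalog(seed=seed)
    X = all_colors(mags)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        RandomForestClassifier(n_estimators=200, random_state=seed),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    selected = y_pred == "GC"
    # Contamination: fraction of sources classified as GC that are not GCs.
    contamination = float(np.mean(y_test[selected] != "GC")) if selected.any() else float("nan")
    # Incompleteness: fraction of true GCs not classified as GC.
    incompleteness = float(np.mean(y_pred[y_test == "GC"] != "GC"))
    return model, contamination, incompleteness


if __name__ == "__main__":
    _, c, i = run_demo()
    print(f"contamination = {c:.2f}, incompleteness = {i:.2f}")
```

Swapping `RandomForestClassifier` for `sklearn.neural_network.MLPClassifier` gives an MLP variant of the same sketch. Every pairwise color is a linear combination of the five adjacent-band colors (u-g, g-r, r-i, i-z, z-Y). The standardized 15-color matrix therefore has rank at most five, so `n_components=5` retains all of its variance. Only fewer components actually compress the color space.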
