The contribution of the color space in LSST-like photometry for the selection of extragalactic globular cluster candidates

Nicholas Schweder-Souza, Ana L. Chies-Santos, Rafael S. de Souza, Kristen C. Dage, Charles J. Bonatto, Juan P. Caso, Michele Cantiello, Pedro dos Santos-Lopes, Pedro Floriano, Thayse A. Pacheco, Katherine L. Rhode, Pauline Barmby, Niranjana P., Yasna Ordenes-Briceño, Teymoor Saifollahi, Rubens E. G. Machado, Julia Gschwend

TL;DR

The study investigates how far color information alone in a six-band LSST-like catalog ($ugrizY$) can distinguish point-like extragalactic globular clusters (GCs) from contaminants. By constructing a labeled Fornax-based dataset from FDS and DES, and applying color-space transforms via PCA and non-linear auto-encoders, the authors evaluate Random Forest and MLP classifiers across multiple input representations. They find that principal components reduce contamination from ~$45\%$ to ~$35\%$ but at the cost of strong incompleteness, while auto-encoders provide no improvement; 2D color-color projections are particularly limited. The work concludes that color information alone has a ceiling on GC identification quality and advocates augmenting photometry with ancillary data (morphology, near-IR, astrometry) to fully exploit LSST’s potential for GC science.
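The contamination and incompleteness figures quoted above are easier to compare with explicit definitions. The sketch below assumes the common convention, with GCs as the positive class: contamination = FP/(TP+FP), i.e. 1 - precision, and incompleteness = FN/(TP+FN), i.e. 1 - recall. The paper's exact definitions are not reproduced here, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def contamination_and_incompleteness(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (contamination, incompleteness) for the GC class.

    Labels: 1 = globular cluster, 0 = contaminant (galaxy or star).
    Assumed definitions (standard convention, not taken from the paper):
      contamination  = FP / (TP + FP)  -> fraction of selected sources that are not GCs
      incompleteness = FN / (TP + FN)  -> fraction of true GCs that were missed
    An empty denominator is reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    contamination = fp / (tp + fp) if (tp + fp) else 0.0
    incompleteness = fn / (tp + fn) if (tp + fn) else 0.0
    return contamination, incompleteness
```

For example, a selection that recovers 3 of 4 true GCs and also picks up 2 contaminants has a contamination of 2/5 = 40% and an incompleteness of 1/4 = 25%. The two rates usually pull in opposite directions: a stricter cut on a classifier's GC probability tends to remove contaminants but also drops genuine GCs, so a lower contamination rate should always be read alongside the incompleteness it comes with.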

Abstract

Globular clusters (GCs), densely packed collections of thousands to millions of old stars, are excellent tracers of their host galaxies' evolutionary histories. Traditional methods for identifying GCs in galaxies rely on cuts over photometric catalogs and can yield source lists with high levels of contamination from compact background galaxies and foreground stars. In an era when large-scale sky surveys produce photometry for millions of sources, it is essential to employ flexible and scalable tools to reliably identify GCs in external galaxies. To prepare for surveys like Rubin/LSST, we need to explore practical methodological improvements and quantify the limitations inherent in the datasets. This paper investigates the selection of point-like extragalactic GCs exclusively in the $ugrizY$ color space. We use archival data to assemble an LSST-like photometric catalog for the Fornax Cluster containing labeled spectroscopically confirmed GCs, galaxies, and stars. From this catalog, using principal component analysis and non-linear auto-encoders (AEs), we construct inputs to random forest and multi-layer perceptron classifiers. We show that selecting GCs using $ugrizY$ colors can lead to contamination rates of $\sim 45\%$. If the principal components of the colors are used instead, this rate reduces to $\sim 35\%$ without increasing incompleteness. The AEs did not improve GC identification. To further reduce contamination and extract the full potential of LSST for star cluster studies, we argue for the need to augment photometric information with ancillary data (morphology from space-based missions and near-infrared photometry) before attempting to leverage more complex models.
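The step of recasting the $ugrizY$ colors into principal components can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the color set is assumed to be all 15 pairwise differences of the six magnitudes (the paper's exact color list may differ), and power iteration with Gram-Schmidt orthogonalization is swapped in for a library PCA routine. All function names are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

BANDS = "ugrizY"  # LSST-like filters, ordered blue to red


def pairwise_colors(mags: Dict[str, float]) -> Dict[str, float]:
    """All 15 colors m_a - m_b with band a bluer than band b (assumed color set)."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in itertools.combinations(BANDS, 2)}


def covariance(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Column means and sample covariance matrix (normalized by n - 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    return means, cov


def top_components(cov: List[List[float]], k: int, iters: int = 1000) -> List[Tuple[float, List[float]]]:
    """Leading k (eigenvalue, unit eigenvector) pairs of a covariance matrix.

    Power iteration; each new vector is kept orthogonal to those already found,
    so the components come out in order of decreasing explained variance.
    """
    d = len(cov)
    comps: List[Tuple[float, List[float]]] = []

    def orthogonalize(w: List[float]) -> Tuple[List[float], float]:
        # Remove projections onto earlier components, return vector and its norm.
        for _, u in comps:
            dot = sum(wi * ui for wi, ui in zip(w, u))
            w = [wi - dot * ui for wi, ui in zip(w, u)]
        return w, math.sqrt(sum(x * x for x in w))

    for _ in range(k):
        v, norm = orthogonalize([1.0 + 0.1 * j for j in range(d)])  # deterministic start
        if norm == 0.0:
            raise ValueError("start vector lies in the span of earlier components")
        v = [x / norm for x in v]
        for _ in range(iters):
            w, norm = orthogonalize([sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)])
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append((lam, v))
    return comps


def project(row: Sequence[float], means: Sequence[float],
            comps: List[Tuple[float, List[float]]]) -> List[float]:
    """Principal-component scores of one source's color vector."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(c * u for c, u in zip(centered, vec)) for _, vec in comps]
```

In use, one would build a row of `pairwise_colors(...)` values per source, call `covariance` on the whole catalog, keep the first few components from `top_components`, and pass the `project` scores to the classifiers in place of the raw colors. Because all 15 pairwise colors are linear combinations of 5 independent ones, at most 5 eigenvalues can be non-zero, which is one reason a compact principal-component representation is natural for this feature set. A real analysis would more likely use an optimized library implementation than this loop-based sketch.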

Paper Structure

This paper contains 16 sections, 2 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Entire coverage of the FDS in gray. The red circle marks the 1-degree-radius region around NGC 1399, which contains our sources of interest. The black points show the positions of spectroscopically confirmed globular clusters for which FDS $ugri$ photometry is available.
  • Figure 2: Magnitude errors plotted versus magnitudes for the bands in common between DES and FDS, $gri$. Black dots represent spectroscopically confirmed GCs. The first row of plots refers to FDS PSF photometry data, the second to DES circular aperture photometry, and the third to DES automatic aperture (based on the Kron radius) photometry. For visualization purposes, the magnitude error axes were truncated at a value of $0.6$ mag. Errors in FDS data do not exceed $0.6$ mag: all FDS data points are visible in these plots.
  • Figure 3: $\Delta\rm{mag_{FDS-DES}}$ versus $g,r,i_{\rm FDS}$: the magnitude difference between the two surveys for the same source in the same band, plotted against the FDS magnitude in that band. The first row uses DES MAG_APER_5 ($2.92''$) photometry, the second DES MAG_APER_4 ($1.92''$), and the third DES MAG_AUTO. The horizontal black line marks $y=0$.
  • Figure 4: Distribution of magnitudes in each band. The colored bars show the dataset before the pre-processing described in the pre-processing subsection; the black outlines show the dataset after all filtering, which contains only the labeled sources.
  • Figure 5: Diagram of the data flow in our analysis procedure. PCs: principal components; LSCs: latent-space coordinates; RFC: random forest classifier; MLPC: multi-layer perceptron classifier.
  • ...and 4 more figures
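The cross-survey comparison in Figure 3 reduces to a per-band difference for sources present in both catalogs. A minimal sketch follows, assuming both catalogs have already been cross-matched onto shared, hypothetical source IDs and stored as plain dictionaries (positional cross-matching itself is not shown). The median offset is an illustrative robust summary of a zero-point shift, not necessarily the statistic used in the paper.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical layout: source_id -> {band: magnitude}
Catalog = Dict[str, Dict[str, float]]


def fds_minus_des(fds: Catalog, des: Catalog, band: str) -> Tuple[List[Tuple[float, float]], float]:
    """Pairs (m_FDS, m_FDS - m_DES) for sources in both catalogs, plus the median offset.

    Assumes the catalogs are already cross-matched and share source IDs.
    Sources lacking the band in either catalog are skipped.
    """
    pairs = [
        (fds[sid][band], fds[sid][band] - des[sid][band])
        for sid in sorted(fds.keys() & des.keys())
        if band in fds[sid] and band in des[sid]
    ]
    if not pairs:
        raise ValueError(f"no sources with {band!r} photometry in both catalogs")
    median_offset = statistics.median(delta for _, delta in pairs)
    return pairs, median_offset
```

Calling it once per band ($g$, $r$, $i$) and per DES magnitude type (MAG_APER_5, MAG_APER_4, MAG_AUTO) gives the nine panels of Figure 3 in tabular form, with each list of pairs ready to plot against the $y=0$ reference line.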