The Next Generation Fornax Survey (NGFS). VIII. A Support Vector Machine Approach for Disentangling Globular Clusters from other Sources

Yasna Ordenes-Briceño, Thomas H. Puzia, Paul Eigenthaler, Matias Blaña, Juan P. Carvajal, Matthew A. Taylor, Bryan W. Miller, Rohan Rahatgaonkar, Evelyn J. Johnston, Prasanta K. Nayak, Gaspar Galaz

TL;DR

Globular cluster identification in wide-field surveys is hampered by overlap with stars and background galaxies. The authors develop a supervised SVM classifier using 15 color/morphology features derived from $u'g'i'JK_s$ photometry in NGFS-T1, trained on spectroscopically confirmed GCs, stars, and galaxies, achieving $97.3\%$ accuracy with the full feature set and $96.6\%$ with a reduced 7-feature set. Inclusion of $u'$ and near-IR bands markedly improves GC discrimination, and simulations with LSST-like data show that $u'$ and $Y$ are essential for robust separation, with even higher gains anticipated when Euclid/Roman NIR data are integrated. The resulting scalable GC catalogs enable detailed studies of Fornax assembly and provide a framework for photometric GC classification in upcoming LSST/Euclid/Roman-era surveys. The work demonstrates the diagnostic power of broad SED coverage combined with simple morphology for discriminating unresolved sources in deep extragalactic imaging.

Abstract

Wide-field, multi-band surveys now detect millions of unresolved sources in nearby galaxy clusters, yet separating globular clusters (GCs) from foreground stars and background galaxies remains challenging. Scalable, automated classification is therefore essential to convert the forthcoming data from facilities such as the Vera C. Rubin Observatory/LSST, Roman, and Euclid into robust constraints on galaxy assembly. We introduce a supervised classification method to separate GCs, stars, and galaxies based on their locations in color-color diagrams. The main objective is to recover a clean GC sample for future scientific analysis. The method exploits broad spectral energy distribution (SED) coverage and deep photometry, and is optimized for next-generation survey volumes. We use the central 3 deg$^2$ of the Next Generation Fornax Survey (NGFS), which images the Fornax cluster in $u'g'i'JK_s$. We build a Support Vector Machine (SVM; svm.SVC, scikit-learn) using 15 features: all color combinations and basic morphological parameters. Spectroscopically confirmed sources define the training classes. Color pairs connecting near-UV/optical/near-IR. The full 15-feature model achieves 97.3% accuracy, and a pruned 7-feature model built from the most informative, least correlated features achieves 96.6% accuracy. Misclassifications amount to 8.4% and 10.4%, respectively. Omitting the $u'$ and/or near-IR bands degrades performance. Emulating LSST filters with NGFS $u'g'i'$ and DES $r'z'Y$ shows that the $u'$ and $Y$ bands are crucial, but models lacking NIR remain suboptimal. Combining broad SED coverage with simple morphological parameters enables precise, scalable separation of unresolved sources. Including NIR bands significantly improves GC classification, and joining LSST with forthcoming Euclid and Roman data will further enhance machine-learning frameworks.
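
The feature count quoted in the abstract can be illustrated with a short sketch. It assumes that "all color combinations" means every pairwise magnitude difference among the five NGFS bands, which gives 10 colors and would leave 5 morphological parameters to reach 15 features; that split is an inference, not something the abstract states. The helper name `color_features`, the plain band labels, and the magnitudes are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Bands ordered from blue to red, following the u'g'i'JKs order in the abstract.
# Plain labels are used here as dictionary keys (assumption for illustration).
BANDS = ["u", "g", "i", "J", "Ks"]


def color_features(mags):
    """Return every pairwise color (bluer band minus redder band) for one source.

    `mags` maps a band label to a magnitude. Hypothetical helper, not the
    authors' code.
    """
    return {
        f"{blue}-{red}": mags[blue] - mags[red]
        for blue, red in combinations(BANDS, 2)
    }


# Made-up magnitudes for a single source (not real NGFS photometry).
example = {"u": 23.0, "g": 21.5, "i": 20.5, "J": 19.75, "Ks": 19.5}
colors = color_features(example)

print(len(colors))      # number of pairwise colors from five bands
print(sorted(colors))   # the color names, e.g. "u-g", "g-Ks"
```

Building the colors from an ordered band list keeps every color defined as bluer minus redder, so the same feature names can be reused when a band is dropped to test its impact.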

Paper Structure

This paper contains 28 sections, 1 equation, 16 figures, 5 tables.

Figures (16)

  • Figure 1: RGB composite image of NGFS Tile 1, constructed using DECam filters ($i'$ in red, $g'$ in green, and $u'$ in blue). The field of view corresponds to a single DECam tile, with a radius of $1.1^\circ \approx 370$ kpc at the distance of the Fornax cluster, $D = 19.3$ Mpc (Anand2024). The NIR imaging FoV is shown as an unfilled red rectangle; see Sect. \ref{sect:data} for details. The names of the main galaxies are labeled in the image, with the cD galaxy, NGC 1399, located near the image center. Angular and physical scales are indicated: the white line at the bottom right represents $0.25^\circ$, and the line at the bottom left corresponds to 100 kpc.
  • Figure 2: Color-color diagrams for all sources with multi-wavelength photometry in the core region of the Fornax galaxy cluster, shown as gray dots. Spectroscopically confirmed samples are shown for GCs (blue), stars (golden), and galaxies (purple); these serve as the labeled samples for the svm.SVC model (see Section \ref{sect:traindataset}). Note that all diagrams show the same source sample: the sources with photometry in the master catalog cross-matched across the $u'g'i'JK_s$ filters.
  • Figure 3: Color-color diagrams $u'g'K_s$ (top panel) and $u'g'i'$ (bottom panel) with PEGASE.2 population synthesis models (fioc97). The layout is the same as in Figure \ref{fig:f_cc_ugiJKs}. In addition, the thin blue lines in the top-panel diagram represent sequences of old single-age stellar populations (at redshift zero); the metallicity range is indicated in the legend. The large colored symbols represent the evolutionary paths of observed colors for four galaxies formed at redshift 3 with different star formation histories: burst + low SFR (squares), constant SFR (stars), exponentially declining SFR (triangles), and burst + passive elliptical (circles).
  • Figure 4: Accuracy and scores of the best-performing model, using a 70%/30% train/test split with kernel=RBF, C=10, and $\gamma$=0.1. The permutation importance of the 15 features provided to svm.SVC is shown, ordered from the most to the least important feature for the model.
  • Figure 5: Correlation clustermap for the 15 features. The correlation matrix displays numerical values in each cell with a heatmap ranging from -1.0 to 1.0, where '0' indicates no correlation (white), '-1' indicates linear anti-correlation (blue), and '1' indicates linear correlation (red). The dendrogram identifies two major feature clusters: cluster A of color indices and cluster B of morphological parameters.
  • ...and 11 more figures
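
The model settings quoted in the Figure 4 caption (70%/30% train/test split, RBF kernel, C=10, $\gamma$=0.1, permutation importance over 15 features) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic and stand in for the spectroscopically labeled GC/star/galaxy sample, the `StandardScaler` step is an assumption that the source does not mention, and the sample size and random seeds are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the labeled catalog: 3 classes, 15 features.
# (Assumption: real inputs would be colors and morphological parameters.)
X, y = make_classification(
    n_samples=600,
    n_features=15,
    n_informative=8,
    n_redundant=2,
    n_classes=3,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)

# 70% train / 30% test, stratified so each class keeps its proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# RBF-kernel SVC with the hyperparameters quoted in the Figure 4 caption.
# Feature scaling is an added assumption, not taken from the source.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma=0.1))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Permutation importance: drop in test score when one feature is shuffled.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]

print(f"test accuracy: {accuracy:.3f}")
print("features ranked by permutation importance:", ranking.tolist())
```

Computing the importance on the held-out test set, rather than on the training set, measures how much each feature contributes to generalization; the resulting ranking is the kind of ordering the Figure 4 caption describes.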