Star-Galaxy Classification in Deep LSST Data with Random Forest: A Pilot study on the Data Preview 1 Release

M. Gatto, V. Ripepi, M. Bellazzini, C. Tortora, M. Dall'Ora

Abstract

The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will produce unprecedentedly deep and wide photometric catalogs, enabling transformative studies of faint stellar systems, such as the search for ultra-faint dwarf galaxies (UFDs). A critical challenge for these studies is reliable star-galaxy separation at faint magnitudes, where compact background galaxies increasingly contaminate stellar samples. This work aims to assess the performance of supervised machine-learning techniques for star-galaxy separation in LSST-like data, quantify the relative importance of morphological and photometric information, and identify the most effective combinations of input features for minimizing galaxy contamination while preserving stellar completeness in the faint regime relevant for UFD searches. We apply a Random Forest classifier to observations of the Extended Chandra Deep Field South from LSST Data Preview 1 (DP1), the deepest field observed within DP1. We construct a curated sample of bona fide stars and galaxies using spectroscopic data, Gaia DR3, and multi-band photometric catalogs. We train and validate the classifier using several configurations of LSST-based input features, including multi-band colors, the LSST morphological parameter refExtendedness, and photometric uncertainties. We find that LSST multi-band photometry alone delivers good star-galaxy separation, significantly outperforming morphology-based classification at faint magnitudes. Colors involving the u-band are essential for robust star-galaxy separation. Furthermore, explicitly including photometric uncertainties as input features yields the best overall performance. Across all configurations that include all six LSST filters, galaxy contamination remains negligible over almost the whole magnitude range probed in this work (i.e. r < 27.5 mag). [abridged]

Paper Structure

This paper contains 26 sections, 4 equations, 19 figures, 11 tables.

Figures (19)

  • Figure 1: $r$-band magnitude distribution of the full sample of bona-fide stars and galaxies. Stars are shown in blue and galaxies in red.
  • Figure 2: Performance of the refExtendedness parameter as a function of $r$-band magnitude for our catalog. The blue solid line shows the fraction of true stars correctly identified as stars, while the red solid line indicates the fraction of true galaxies misclassified as stars.
  • Figure 3: Confusion matrix on the validation sample for the reference set (left matrix) and with the inclusion of photometric uncertainties (right matrix).
  • Figure 4: Performance of the refExtendedness parameter (leftmost panel) and the random forest classifier (left-center panel), evaluated on the validation sample, as a function of the $r$-band magnitude, for the reference set. In both panels, the blue curves show the fraction of true stars correctly classified as stars (stellar completeness), while the red curves show the fraction of true galaxies misclassified as stars (galaxy contamination). The last two panels display the performance of the random forest classifier obtained by removing the refExtendedness parameter (right-center panel) and by adding photometric uncertainties (rightmost panel) to the reference set.
  • Figure 5: Relative importance of each feature for the reference set (top panel) and with the inclusion of photometric uncertainties (bottom panel).
  • ...and 14 more figures
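
The two quantities plotted in Figures 2 and 4, stellar completeness and galaxy contamination as a function of $r$-band magnitude, can be illustrated with a minimal sketch. It follows the caption definitions only (fraction of true stars classified as stars, and fraction of true galaxies classified as stars, per magnitude bin). The function name `completeness_contamination`, the bin edges, and the toy sample are assumptions made for illustration; they are not the authors' code, data, or binning.

```python
from bisect import bisect_right


def completeness_contamination(mags, is_star_true, is_star_pred, bin_edges):
    """Per-bin stellar completeness and galaxy contamination.

    completeness  = true stars classified as stars / true stars
    contamination = true galaxies classified as stars / true galaxies
    Bins are half-open, [edge_i, edge_{i+1}); empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    stars = [0] * n_bins          # true stars per bin
    stars_ok = [0] * n_bins       # true stars classified as stars
    gals = [0] * n_bins           # true galaxies per bin
    gals_as_stars = [0] * n_bins  # true galaxies classified as stars

    for mag, truth, pred in zip(mags, is_star_true, is_star_pred):
        i = bisect_right(bin_edges, mag) - 1
        if i < 0 or i >= n_bins:
            continue  # object falls outside the binned magnitude range
        if truth:
            stars[i] += 1
            stars_ok[i] += int(pred)
        else:
            gals[i] += 1
            gals_as_stars[i] += int(pred)

    completeness = [ok / n if n else None for ok, n in zip(stars_ok, stars)]
    contamination = [bad / n if n else None for bad, n in zip(gals_as_stars, gals)]
    return completeness, contamination


# Toy labels, invented for illustration only (True = star).
mags = [20.1, 20.5, 20.9, 24.2, 24.6, 24.8, 24.9]
truth = [True, True, False, True, True, False, False]
pred = [True, True, False, True, False, True, False]
edges = [20.0, 22.0, 24.0, 26.0]  # hypothetical r-band bin edges

completeness, contamination = completeness_contamination(mags, truth, pred, edges)
print(completeness)   # [1.0, None, 0.5]
print(contamination)  # [0.0, None, 0.5]
```

Returning `None` for empty bins, rather than 0, keeps bins with no objects from being read as perfect or failed classification when the curves are plotted.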