A search for new symbiotic stars in the Milky Way: Using machine learning techniques applied to photometric databases
V. Contreras Rojas, M. Jaque Arancibia, C. E. Ferreira Lopes, N. Monsalves, R. Angeloni, G. J. M. Luna, V. Marels, D. Concha, N. E. Nunez, C. Saffe, M. Flores
TL;DR
Symbiotic stars (SySts) are rare yet informative interacting binaries, and their Galactic census remains incomplete. The authors deploy a supervised machine-learning pipeline that combines Gaia DR3, 2MASS, and WISE photometry with parallaxes and Hα pseudo-equivalent widths to search for S-type SySts, training a Random Forest with SMOTE oversampling on 166 confirmed S-type systems and 1,600 non-symbiotic stars. Applied to about 2.5 million color-selected sources, the model flags 990 candidates with symbiotic probabilities above 70%. Physically motivated cuts then select 12 high-confidence objects from these, all with properties, including UV excess, consistent with S-type SySts. Validation on recently confirmed systems recovers 92.3% of the known S-types, supporting the robustness and generalizability of the approach. Follow-up spectroscopy of the candidates could further refine the Galactic SySt census.
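SMOTE (Synthetic Minority Over-sampling Technique) counters class imbalance by creating synthetic minority-class examples on the line segment between a real minority example and one of its k nearest minority-class neighbors. The standard-library sketch below illustrates the idea under stated assumptions. The feature vectors are generic numeric arrays (for instance, colors built from Gaia, 2MASS, and WISE magnitudes). The function name `smote`, the default k = 5, the target of a fully balanced 1,600:1,600 training set, and the toy data are all illustrative choices, not details taken from the paper. A production pipeline would more likely call an existing implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic synthetic samples interpolated within the minority class.

    minority: list of equal-length numeric feature vectors (one per real object).
    k: number of nearest minority neighbors considered for each seed sample.
    (Hypothetical helper for illustration; not the paper's code.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist

    # Precompute the k nearest minority neighbors (Euclidean) of every sample.
    neighbors = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbors.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a real minority sample
        j = rng.choice(neighbors[i])       # and one of its nearest neighbors
        gap = rng.random()                 # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Toy example (assumed numbers): balance 166 "symbiotic" 2-D vectors against
# 1,600 "non-symbiotic" ones by generating 1,434 synthetic symbiotic vectors.
toy_rng = random.Random(1)
sysst = [[toy_rng.gauss(1.5, 0.3), toy_rng.gauss(0.8, 0.2)] for _ in range(166)]
new = smote(sysst, n_synthetic=1600 - len(sysst))
print(len(sysst) + len(new))  # 1600
```

Because every synthetic point lies between two real symbiotic examples, the oversampled class stays inside the region of feature space already occupied by known systems. As a general precaution, synthetic samples are usually generated from the training split only, so that any reported score is measured on real objects.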
Abstract
Symbiotic stars (SySts) are interacting binaries in which a red giant transfers material to a hot compact star, typically a white dwarf. Although only about 300 systems are confirmed, the Galactic population is estimated at between 1.2 × 10³ and 1.5 × 10⁴, indicating that most remain undiscovered. We identify new SySts using a machine-learning approach that combines Gaia DR3, 2MASS, and WISE photometry, parallaxes, and the pseudo-equivalent width of Hα. A Random Forest model was trained on 166 confirmed S-type SySts and 1600 non-symbiotic stars, applying SMOTE to mitigate class imbalance. The model achieved an F1-score of 89% for the symbiotic class. Applied to 2.5 × 10⁶ color-selected sources, it identified 990 candidates with probabilities above 70%. We further refined the sample using physically motivated cuts on effective temperature, surface gravity, metallicity, and SkyMapper photometry, yielding 12 high-confidence candidates. These objects show cool temperatures, low surface gravities, near-solar metallicity, Hα emission, moderate-to-high luminosities, and UV excess consistent with S-type SySts. Validation on recently confirmed systems recovered 92.3% of them, demonstrating the robustness and generalizability of our method.
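Two of the quantities quoted in the abstract are straightforward to compute once the classifier outputs a per-source probability for the symbiotic class. The F1-score is the harmonic mean of precision and recall for that class, and candidate selection keeps sources whose probability exceeds 0.70. The sketch below assumes such probabilities are already available. The function names `f1_for_class` and `select_candidates` and the toy inputs are hypothetical and do not reproduce the paper's results.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """F1-score for one class: 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_candidates(source_ids, probabilities, threshold=0.70):
    """Keep sources whose predicted symbiotic-class probability exceeds the threshold."""
    return [sid for sid, p in zip(source_ids, probabilities) if p > threshold]


# Toy labels (1 = symbiotic, 0 = not): 3 true positives, 1 false positive,
# 1 false negative, so precision = recall = 0.75.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(f1_for_class(y_true, y_pred))  # 0.75

# "Probabilities above 70%" is read here as a strict inequality.
print(select_candidates(["src1", "src2", "src3"], [0.95, 0.70, 0.71]))  # ['src1', 'src3']
```

Reporting F1 for the symbiotic class alone, rather than overall accuracy, matters here because a classifier that labels every source as non-symbiotic would still score about 90% accuracy on a 166 versus 1,600 training set.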
