Detecting wide binaries using machine learning algorithms
Amoy Ashesh, Harsimran Kaur, Sandeep Aashish
TL;DR
The paper addresses the identification of wide binary stars (WBS) in Gaia DR3, a problem central to tests of gravitational theories at low accelerations. It introduces a supervised learning framework trained on established wide-binary catalogues, with data preprocessing (SMOTE, PCA, correlation analysis) and a two-stage pipeline of clustering and 3D nearest-neighbour pairing to assemble candidate binaries. The Random Forest classifier trained on SMOTE-balanced data achieves high accuracy (≈0.998) and precision/recall (≈0.92), while models trained on the raw, imbalanced data underperform, underscoring the value of addressing class imbalance. A publicly available tool accompanies the work, enabling rapid, scalable generation of WBS catalogues from raw Gaia data and offering pathways to anomaly detection and exploration of exotic gravity signatures.
Abstract
We present a machine learning (ML) framework for the detection of wide binary star systems using Gaia DR3 data. By training supervised ML models on established wide binary catalogues, we efficiently classify wide binaries and employ clustering and nearest neighbour search to pair candidate systems. Our approach incorporates data preprocessing techniques such as SMOTE, correlation analysis, and PCA, and achieves high accuracy and recall in the task of wide binary classification. The resulting publicly available code enables rapid, scalable, and customizable analysis of wide binaries, complementing conventional analyses and providing a valuable resource for future astrophysical studies.
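As an illustration of the pairing step, the following is a minimal sketch of 3D nearest-neighbour pairing, not the released code. It assumes that each star classified as a wide-binary member carries a right ascension, declination, and parallax, that distance can be approximated as the inverse of the parallax, and that two stars form a candidate pair when each is the other's nearest neighbour within a separation limit. The function names (`to_cartesian`, `mutual_nearest_pairs`), the `max_sep_pc` cutoff, and the example values are hypothetical. The clustering stage is omitted.

```python
import math


def to_cartesian(ra_deg, dec_deg, parallax_mas):
    """Convert (RA, Dec, parallax) to Cartesian coordinates in parsecs.

    Assumption: distance = 1000 / parallax[mas], which ignores parallax errors.
    """
    d_pc = 1000.0 / parallax_mas
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (
        d_pc * math.cos(dec) * math.cos(ra),
        d_pc * math.cos(dec) * math.sin(ra),
        d_pc * math.sin(dec),
    )


def mutual_nearest_pairs(stars, max_sep_pc=1.0):
    """Pair stars that are each other's nearest neighbour in 3D.

    stars: list of (source_id, ra_deg, dec_deg, parallax_mas) tuples.
    max_sep_pc: hypothetical upper limit on the 3D separation, in parsecs.
    Returns a list of (id_a, id_b, separation_pc) tuples.
    """
    ids = [s[0] for s in stars]
    xyz = [to_cartesian(*s[1:]) for s in stars]

    # Brute-force O(n^2) search; a k-d tree would be preferable at catalogue scale.
    nearest = []
    for i, p in enumerate(xyz):
        best_j, best_d = None, math.inf
        for j, q in enumerate(xyz):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        nearest.append((best_j, best_d))

    pairs = []
    for i, (j, d) in enumerate(nearest):
        # Keep each mutual pair once (i < j) and only if it is close enough.
        if j is not None and i < j and nearest[j][0] == i and d <= max_sep_pc:
            pairs.append((ids[i], ids[j], d))
    return pairs


# Illustrative values only, not Gaia measurements.
stars = [
    ("A", 10.00, 20.0, 10.0),  # about 100 pc away
    ("B", 10.01, 20.0, 10.0),  # close companion of A
    ("C", 50.00, -5.0, 5.0),   # about 200 pc away
    ("D", 50.50, -5.0, 5.0),   # nearest to C, but beyond the cutoff
]
pairs = mutual_nearest_pairs(stars, max_sep_pc=1.0)
for id_a, id_b, sep in pairs:
    print(f"{id_a}-{id_b}: {sep:.4f} pc")
```

In this toy input only A and B are paired: C and D are mutual nearest neighbours, but their separation exceeds the assumed 1 pc limit. Requiring mutuality keeps each star from appearing in more than one pair.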
