Detecting wide binaries using machine learning algorithms

Amoy Ashesh, Harsimran Kaur, Sandeep Aashish

TL;DR

The paper tackles the identification of wide binary systems (WBS) in Gaia DR3, a problem crucial for tests of gravitational theories at low accelerations. It introduces a supervised learning framework trained on established wide-binary catalogues, with data preprocessing (SMOTE, PCA, correlation analysis), followed by a two-stage pipeline of clustering and 3D nearest-neighbour pairing that assembles candidate binaries. The Random Forest classifier trained on SMOTE-balanced data achieves high accuracy (≈0.998) and precision/recall (≈0.92), while models trained on the raw data underperform, underscoring the value of addressing class imbalance. A publicly available tool accompanies the work, enabling rapid, scalable generation of WBS catalogues from raw Gaia data and offering pathways to anomaly detection and exploration of exotic gravity signatures.
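To make the class-imbalance step concrete, here is a minimal sketch of the core SMOTE idea: synthetic minority samples are interpolated between an existing minority sample and one of its nearest minority neighbours. It is a pure-Python illustration and not the paper's code. The function names `smote_oversample` and `balance`, the choice `k=5`, and the convention that label 1 marks a WBS member are assumptions made for this example; in practice a library implementation would normally be used.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def balance(features, labels, k=5, seed=0):
    """Oversample label 1 (assumed: WBS member) until it matches label 0."""
    minority = [x for x, y in zip(features, labels) if y == 1]
    majority_count = sum(1 for y in labels if y == 0)
    n_new = max(0, majority_count - len(minority))
    new_points = smote_oversample(minority, n_new, k, seed) if n_new else []
    return features + new_points, labels + [1] * len(new_points)
```

Here `features` is a list of equal-length numeric tuples and `labels` a list of 0/1 values. Only the training split should be balanced this way, so that evaluation metrics are computed on the original class proportions.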

Abstract

We present a machine learning (ML) framework for the detection of wide binary star systems using Gaia DR3 data. By training supervised ML models on established wide binary catalogues, we efficiently classify wide binaries and employ clustering and nearest neighbour search to pair candidate systems. Our approach incorporates data preprocessing techniques such as SMOTE, correlation analysis, and PCA, and achieves high accuracy and recall in the task of wide binary classification. The resulting publicly available code enables rapid, scalable, and customizable analysis of wide binaries, complementing conventional analyses and providing a valuable resource for future astrophysical studies.
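The pairing step can be sketched as follows, assuming each candidate star has a right ascension, declination, and positive parallax. Positions are converted to Cartesian coordinates in parsecs, and two stars are paired when each is the other's nearest neighbour in 3D. This is an illustrative brute-force version, not the paper's implementation: the mutual-nearest-neighbour criterion, the function names, and the `max_sep_pc` cut are assumptions, and the clustering stage that the paper uses to partition the data beforehand is omitted.

```python
import math


def to_cartesian(ra_deg, dec_deg, parallax_mas):
    """Convert sky position and parallax to Cartesian coordinates in parsecs."""
    d = 1000.0 / parallax_mas  # distance in pc; requires a positive parallax
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (
        d * math.cos(dec) * math.cos(ra),
        d * math.cos(dec) * math.sin(ra),
        d * math.sin(dec),
    )


def pair_candidates(stars, max_sep_pc=1.0):
    """Pair stars that are each other's nearest 3D neighbour.

    stars: list of (source_id, ra_deg, dec_deg, parallax_mas) tuples.
    max_sep_pc is a hypothetical cut, not a value taken from the paper.
    Returns a list of (id_a, id_b, separation_pc) tuples.
    """
    xyz = [to_cartesian(ra, dec, plx) for _, ra, dec, plx in stars]
    nearest = []
    for i, p in enumerate(xyz):
        # brute-force nearest neighbour; None if there is no other star
        best = min(
            (j for j in range(len(xyz)) if j != i),
            key=lambda j: math.dist(p, xyz[j]),
            default=None,
        )
        nearest.append(best)
    pairs = []
    for i, j in enumerate(nearest):
        # keep each mutual pair once (i < j) and only if close enough
        if j is not None and i < j and nearest[j] == i:
            sep = math.dist(xyz[i], xyz[j])
            if sep <= max_sep_pc:
                pairs.append((stars[i][0], stars[j][0], sep))
    return pairs
```

The brute-force search is quadratic in the number of stars; for a Gaia-scale sample a spatial index such as a k-d tree, applied within each cluster, would be the natural replacement.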

Paper Structure

This paper contains 13 sections, 2 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Comparison of data distribution before and after applying SMOTE. Here 0 denotes an entity that is not part of a WBS and 1 denotes an entity that is part of a WBS.
  • Figure 2: Methodology for predicting WBS
  • Figure 3: Confusion matrices for the raw-filtered dataset predictions and the SMOTE-balanced dataset predictions
  • Figure 4: The distribution of the clusters
  • Figure 5: WBS connected to their respective pairs
  • ...and 2 more figures