Light Curve Classification with DistClassiPy: a new distance-based classifier
Siddharth Chaini, Ashish Mahabal, Ajit Kembhavi, Federica B. Bianco
TL;DR
This work addresses the challenge of scalable, interpretable light-curve classification in time-domain astronomy by introducing DistClassiPy, a distance-metric classifier built on 18 metrics and domain-driven light-curve features. The method reduces 114 features to a compact 31-feature set and uses per-class medians with distance-based scoring, achieving $F_1$ scores comparable to a Random Forest baseline while offering faster computation and enhanced interpretability. Key contributions include a transparent feature-selection pipeline, confidence measures for distance-based decisions, and a publicly available open-source package suitable for large surveys like the Rubin Observatory LSST. The results demonstrate robust performance across multi-class, One-vs-Rest, and binary tasks, with strong scalability and potential for tailoring to specific science goals and datasets beyond astronomy.
Abstract
The rise of synoptic sky surveys has ushered in an era of big data in time-domain astronomy, making data science and machine learning essential tools for studying celestial objects. While tree-based models (e.g. Random Forests) and deep learning models dominate the field, we explore the use of different distance metrics to aid in the classification of astrophysical objects. We developed DistClassiPy, a new distance-metric-based classifier. The direct use of distance metrics is unexplored in time-domain astronomy, but distance-based methods can help make classification more interpretable and decrease computational costs. In particular, we applied DistClassiPy to classify light curves of variable stars, comparing the distances between objects of different classes. Using 18 distance metrics on a catalog of 6,000 variable stars across 10 classes, we demonstrate classification and dimensionality reduction. Our classifier matches state-of-the-art performance while having lower computational requirements and improved interpretability. Additionally, DistClassiPy can be tailored to specific objects by identifying the most effective distance metric for that classification. To facilitate broader applications within and beyond astronomy, we have made DistClassiPy open-source and available at https://pypi.org/project/distclassipy/.
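To make the idea of per-class medians with distance-based scoring concrete, the following is a minimal sketch in plain Python. It is not the DistClassiPy API: the function names (`fit_medians`, `predict`), the three metrics shown, and the toy two-feature data are all assumptions chosen for illustration, and the sketch omits any feature scaling or confidence estimation the package may perform.

```python
from statistics import median


def cityblock(u, v):
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    # Square root of summed squared differences (L2 distance).
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def canberra(u, v):
    # Weighted L1 distance; terms with a zero denominator are skipped.
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


# Hypothetical subset of metrics; the paper uses 18.
METRICS = {"cityblock": cityblock, "euclidean": euclidean, "canberra": canberra}


def fit_medians(X, y):
    """Return one median feature vector per class label."""
    rows_by_class = {}
    for row, label in zip(X, y):
        rows_by_class.setdefault(label, []).append(row)
    return {
        label: [median(column) for column in zip(*rows)]
        for label, rows in rows_by_class.items()
    }


def predict(x, medians, metric):
    """Assign x to the class whose median vector is nearest under `metric`."""
    distances = {label: metric(x, centre) for label, centre in medians.items()}
    return min(distances, key=distances.get), distances


# Invented toy data: two features per object, two classes "A" and "B".
X_train = [[0.5, 1.0], [0.6, 1.2], [0.55, 0.9],
           [0.1, 0.2], [0.08, 0.3], [0.12, 0.25]]
y_train = ["A", "A", "A", "B", "B", "B"]

medians = fit_medians(X_train, y_train)

for name, metric in METRICS.items():
    label, distances = predict([0.5, 1.1], medians, metric)
    print(name, label, distances)
```

Because each prediction only compares a feature vector against one stored vector per class, the cost of classifying an object grows with the number of classes and features rather than with the size of the training set. Looping over `METRICS` as above is also one simple way to compare which metric separates a given pair of classes best.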
