Detecting False Positives With Derived Planetary Parameters: Experimenting with the KEPLER Dataset
Ayan Bin Rafaih, Zachary Murray
TL;DR
The paper investigates whether derived planetary parameters, rather than full light curves, can effectively identify false positives in Kepler transit data. Evaluating Logistic Regression, Random Forest (RF), Support Vector Machines (SVM), and Convolutional Neural Networks (CNNs) on a set of 9 derived parameters, the study finds that these parameters retain most of the information contained in the light curves: RF and CNNs perform best, reaching up to approximately 92% validation accuracy and strong PR-F1 performance. Simpler models can miss subtleties, while CNNs offer the best overall performance at the cost of higher variability. The approach performs particularly well on false positives related to stellar eclipses. This lightweight, parameter-focused strategy enables fast, scalable vetting suitable for large datasets, with potential application to other missions such as TESS.
Abstract
Recent developments in computational power and machine learning techniques motivate their use in many areas of astrophysical research. Consequently, many machine learning models have been trained to classify exoplanet transit signals, typically using time-series light curves as input. In this work, we take a different approach and try to improve the efficiency of these algorithms by training only on derived planetary parameters instead of full time-series light curves. We evaluate four models (Logistic Regression, Random Forest, Support Vector Machines, and Convolutional Neural Networks) on the KEPLER dataset, using the precision-recall trade-off and accuracy as metrics. We show that this approach can identify up to about 90% of false positives, implying that the planetary parameters capture most of the relevant information contained in a light curve. Random Forest and Convolutional Neural Networks produce the highest accuracy and the best precision-recall trade-off. We also find that, when accuracy is broken down by false positive flag, performance is best for the stellar eclipse flag (SS).
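To make the evaluation metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1 can be computed for a binary false-positive classifier, and how precision and recall shift as the decision threshold moves. The labels and scores are toy values invented for illustration, not results from this work, and the function names are hypothetical; here label 1 is assumed to mean "false positive" and 0 "planet candidate".

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def precision_recall_tradeoff(y_true, scores, thresholds):
    """Evaluate precision and recall at each decision threshold."""
    rows = []
    for thr in thresholds:
        # Predict "false positive" when the classifier score reaches the threshold.
        y_pred = [1 if s >= thr else 0 for s in scores]
        m = classification_metrics(y_true, y_pred)
        rows.append((thr, m["precision"], m["recall"]))
    return rows


# Toy data (illustrative only): 1 = false positive, 0 = planet candidate.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.1, 0.3]

for thr, precision, recall in precision_recall_tradeoff(
        y_true, scores, [0.35, 0.5, 0.65]):
    print(f"threshold={thr:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this toy data, lowering the threshold recovers every false positive at the cost of precision, while raising it does the opposite; this is the trade-off that the precision-recall metric summarizes across thresholds.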
