Advancing Machine Learning for Stellar Activity and Exoplanet Period Rotation

Fatemeh Fazel Hesar, Bernard Foing, Ana M. Heras, Mojtaba Raouf, Victoria Foing, Shima Javanmardi, Fons J. Verbeek

TL;DR

The paper addresses the challenge of accurately estimating stellar rotation periods from noisy Kepler light curves. It develops a pipeline that blends physics-based initial period estimates with a suite of ML models (DT, RF, KNN, GB) and a Voting Ensemble, complemented by Gaussian Process baselines. The Best-Model Voting Ensemble achieves substantially lower RMSE than individual models and often rivals or exceeds GP performance, highlighting the robustness of ensemble methods for disentangling stellar activity from transit signals. The work improves exoplanet transit characterization and gyrochronology by delivering more reliable rotation-period measurements, with potential impact for future missions and large-scale time-series analyses in stellar astrophysics.

Abstract

This study applied machine learning models to estimate stellar rotation periods from corrected light curve data obtained by the NASA Kepler mission. Traditional methods often struggle to estimate rotation periods accurately due to noise and variability in the light curve data. The workflow involved using initial period estimates from the LS-Periodogram and Transit Least Squares techniques, followed by splitting the data into training, validation, and testing sets. We employed several machine learning algorithms, including Decision Tree, Random Forest, K-Nearest Neighbors, and Gradient Boosting, and also utilized a Voting Ensemble approach to improve prediction accuracy and robustness. The analysis included data from multiple Kepler IDs, providing detailed metrics on orbital periods and planet radii. Performance evaluation showed that the Voting Ensemble model yielded the most accurate results, with an RMSE approximately 50% lower than the Decision Tree model and 17% better than the K-Nearest Neighbors model. The Random Forest model performed comparably to the Voting Ensemble, indicating high accuracy. In contrast, the Gradient Boosting model exhibited a worse RMSE compared to the other approaches. Comparisons of the predicted rotation periods to the photometric reference periods showed close alignment, suggesting the machine learning models achieved high prediction accuracy. The results indicate that machine learning, particularly ensemble methods, can effectively solve the problem of accurately estimating stellar rotation periods, with significant implications for advancing the study of exoplanets and stellar astrophysics.
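
As a rough illustration of the comparison reported in the abstract, the sketch below averages the predictions of several regressors (an unweighted vote) and measures the RMSE reduction relative to each one. It is a minimal pure-Python sketch under stated assumptions, not the authors' pipeline: the four "models" are simulated as the true period plus independent Gaussian noise, and every name (`rmse`, `voting_average`, `relative_rmse_reduction`, `model_noise`) and every number is hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def voting_average(predictions):
    """Unweighted vote: average the per-sample predictions of several models."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def relative_rmse_reduction(rmse_ref, rmse_new):
    """Fractional reduction of rmse_new relative to rmse_ref (0.5 means 50% lower)."""
    return (rmse_ref - rmse_new) / rmse_ref


# Synthetic "true" rotation periods in days (illustrative values, not Kepler data).
rng = random.Random(0)
true_periods = [rng.uniform(5.0, 30.0) for _ in range(500)]

# Hypothetical base models, each simulated as truth plus independent Gaussian noise.
model_noise = {"model_a": 1.0, "model_b": 0.8, "model_c": 0.9, "model_d": 1.2}
predictions = {
    name: [p + rng.gauss(0.0, sigma) for p in true_periods]
    for name, sigma in model_noise.items()
}

individual_rmse = {name: rmse(true_periods, pred) for name, pred in predictions.items()}
ensemble_pred = voting_average(list(predictions.values()))
ensemble_rmse = rmse(true_periods, ensemble_pred)

for name, value in individual_rmse.items():
    reduction = relative_rmse_reduction(value, ensemble_rmse)
    print(f"{name}: RMSE={value:.3f}, ensemble is {100 * reduction:.1f}% lower")
print(f"ensemble: RMSE={ensemble_rmse:.3f}")
```

With independent errors, averaging cancels part of the noise, which is why the ensemble RMSE falls below that of every simulated model here. Real models trained on the same light curves tend to have correlated errors, so the gain is usually smaller.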

Paper Structure (18 sections, 16 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Workflow diagram outlining the steps of the pipeline: initial estimates from corrected light curves, data splitting, application of machine learning models (Decision Tree, Random Forest, K-Nearest Neighbors, and Gradient Boosting), model validation, testing, and the use of a Voting Ensemble (Best Model) approach. It also includes a comparison with the Gaussian Process approach for estimating rotation periods.
  • Figure 2: Light curve of a star, shown in three panels. The first panel shows the raw light curve, i.e., the star's brightness as a function of time. The second panel shows the stellar activity (rotational) signal, which traces the star's rotation rate. The third panel shows the flattened light curve: the star's brightness as a function of time after the stellar activity signal has been removed.
  • Figure 3: The heatmap displays the precision values achieved for each algorithm, with higher values represented by darker shades. The algorithms include Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), and Gradient Boosting (GB). Each Kepler ID (K107b, K155b, K17b, K39b, K43b, K45b, K78b, K75b, K96b) corresponds to a specific stellar activity. The values are rounded to three decimal places and indicate the precision of the respective algorithm on each Kepler ID.
  • Figure 4: Bar plot illustrating the Best model performance metrics for each Kepler ID. The plot showcases the values of Accuracy, Precision, Recall, and F1 Score, which were calculated based on the model's confusion matrix. Each bar represents a specific Kepler ID, allowing for a visual comparison of the performance metrics across different IDs. The color-coded bars provide a clear distinction between the metrics, aiding in the assessment of the model's classification performance for each Kepler ID.
  • Figure 5: The Lomb-Scargle periodograms computed from the photometric light curves of the exoplanet host stars in our dataset. The periodograms identify several statistically significant periodic signals present in the data, with the strongest peak (gray shaded line) in each plot corresponding to the estimated stellar rotation period. To quantify the uncertainty in these period estimates, we fitted a Gaussian function to the primary peak and determined its half-width at half-maximum (HWHM; red dashed line). This HWHM value provides an estimate of the period uncertainty that accounts for the sampling characteristics of the periodogram.
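
The peak-width uncertainty described for Figure 5 can be sketched as follows. This is a minimal pure-Python example on a synthetic light curve, not the authors' code. It evaluates the classical Lomb-Scargle power on a grid of trial periods, then measures the half-width at half-maximum of the strongest peak directly from its half-maximum crossings, a simpler stand-in for the Gaussian fit mentioned in the caption. The signal parameters, the period grid, and the function names are assumptions chosen for illustration.

```python
import math
import random


def lomb_scargle_power(times, flux, period):
    """Classical (unnormalized) Lomb-Scargle power at one trial period."""
    omega = 2.0 * math.pi / period
    mean = sum(flux) / len(flux)
    y = [f - mean for f in flux]
    # Time offset tau makes the sine and cosine terms orthogonal.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    c = [math.cos(omega * (t - tau)) for t in times]
    s = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, c))
    ys = sum(a * b for a, b in zip(y, s))
    return 0.5 * (yc * yc / sum(v * v for v in c) + ys * ys / sum(v * v for v in s))


def peak_and_hwhm(periods, power):
    """Return (period of strongest peak, half-width at half-maximum of that peak)."""
    i_peak = max(range(len(power)), key=power.__getitem__)
    half = 0.5 * power[i_peak]

    def crossing(step):
        # Walk away from the peak until the power drops below half maximum.
        i = i_peak
        while 0 <= i + step < len(power) and power[i + step] >= half:
            i += step
        j = i + step
        if j < 0 or j >= len(power):
            return periods[i]  # peak runs off the grid; fall back to the edge
        # Linear interpolation between grid point i (>= half) and j (< half).
        frac = (power[i] - half) / (power[i] - power[j])
        return periods[i] + frac * (periods[j] - periods[i])

    return periods[i_peak], 0.5 * (crossing(+1) - crossing(-1))


# Synthetic light curve: 90 days, unevenly sampled, 12-day rotation signal plus noise.
rng = random.Random(1)
true_period = 12.0  # days (assumed for this demo)
times = [0.5 * i + rng.uniform(-0.1, 0.1) for i in range(180)]
flux = [
    1.0 + 0.01 * math.sin(2.0 * math.pi * t / true_period) + rng.gauss(0.0, 0.002)
    for t in times
]

periods = [2.0 + 0.05 * k for k in range(561)]  # trial periods from 2 to 30 days
power = [lomb_scargle_power(times, flux, p) for p in periods]
best_period, hwhm = peak_and_hwhm(periods, power)
print(f"estimated rotation period: {best_period:.2f} +/- {hwhm:.2f} days")
```

The width measured this way is set mainly by the length of the observing window rather than by the photometric noise, which matches the caption's remark that the HWHM reflects the sampling characteristics of the periodogram. In practice a library routine such as astropy's `LombScargle` would likely replace the hand-written loop.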