Machine Learning for Exoplanet Discovery: Validating TESS Candidates and Identifying Planets in the Habitable Zone

Sarah Huang, Chen Jiang

TL;DR

The paper presents a machine learning framework trained on Kepler KOI data to automate exoplanet candidate validation and applies it to TESS TOIs. Random Forest delivers the best cross‑validation and TOI performance, identifying many new high‑confidence candidates and revealing numerous multi‑planet and habitable‑zone systems, while Transformer models show potential with more data. The study demonstrates how data integration (Kepler KOI, TESS TOI, Gaia DR3) and careful feature selection can accelerate vetting in large photometric surveys, with results supporting follow‑up observations of promising habitable‑zone planets. It also provides insights into population differences between Kepler and TESS targets and discusses limitations due to incomplete metadata, underscoring the need for improved cross‑catalog interoperability and future adaptation to PLATO/Earth 2.0.

Abstract

The high-precision photometry from NASA's Kepler and TESS missions has revolutionized exoplanet detection, enabling the discovery of over 5500 confirmed exoplanets via the transit method and around 10000 additional candidates awaiting validation. However, confirming these candidates as true planets demands meticulous vetting and follow-up observations, which slows the discovery of exoplanets in large-scale datasets. To address this challenge, we developed a machine learning framework trained on Kepler's catalog of confirmed exoplanets and false positives to accurately identify true planetary candidates. Our model uses transit properties, planetary characteristics, and host stellar parameters as training features. The optimized model achieved 83.9% accuracy in cross-validation. When applied to 3987 TESS candidates with complete observational data, the model identified 1595 new high-confidence planets and correctly recovered 86% (358/418) of all previously confirmed TESS exoplanets in a blinded validation test. Our analysis revealed 100 previously unrecognized multi-planet systems, including five systems that host habitable-zone exoplanets. Additionally, we identified 15 more planets within the habitable zone of a single system, suggesting strong potential for liquid water stability under conservative planetary albedo assumptions. This work demonstrates that machine learning can accelerate exoplanet validation while maintaining scientific rigor. Our modular design enables direct adaptation to future photometric missions like PLATO or Earth 2.0.
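
The abstract does not state how habitable-zone membership is computed, so the following is only a minimal sketch of one common approach: the blackbody equilibrium temperature with full heat redistribution, T_eq = T_star * sqrt(R_star / (2a)) * (1 - A)^(1/4), where A is the Bond albedo. The function name `equilibrium_temperature` and the sample albedo values are hypothetical illustrations, not the authors' pipeline.

```python
import math

# Physical constants used for the Earth-Sun sanity check.
R_SUN_M = 6.957e8   # nominal solar radius in meters
AU_M = 1.496e11     # astronomical unit in meters (rounded)


def equilibrium_temperature(t_star_k, r_star_m, a_m, albedo):
    """Blackbody equilibrium temperature in kelvin.

    Assumes full heat redistribution and unit emissivity; these are
    illustrative assumptions, not taken from the paper.
    """
    if not 0.0 <= albedo < 1.0:
        raise ValueError("albedo must be in [0, 1)")
    return t_star_k * math.sqrt(r_star_m / (2.0 * a_m)) * (1.0 - albedo) ** 0.25


if __name__ == "__main__":
    # Earth-like case: Sun-like star at 1 au, for a few assumed albedos.
    for albedo in (0.0, 0.3, 0.5):
        t_eq = equilibrium_temperature(5772.0, R_SUN_M, AU_M, albedo)
        print(f"albedo={albedo:.1f}  T_eq={t_eq:.1f} K")
```

A higher assumed albedo lowers the equilibrium temperature, which is why a habitable-zone classification depends on the albedo adopted for each candidate.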

Paper Structure

This paper contains 23 sections, 5 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Histogram of Metallicity from KOI Table.
  • Figure 2: Number of KOI targets classified as false positives and as planet candidates in the training sample.
  • Figure 3: Architecture of the neural network model. The definition and shape of each layer are provided in Table \ref{tb:neural_network_layers}.
  • Figure 4: Impact of hyperparameters on the performance of the four models: a) Decision Tree, b) K Nearest Neighbors, c) Random Forest, and d) Logistic Regression. In each subfigure, the optimal hyperparameter value is marked with a red dot.
  • Figure 5: Feature importance chart for the Random Forest model.
  • ...and 7 more figures