WATSON-Net: Vetting, Validation, and Analysis of Transits from Space Observations with Neural Networks

M. Dévora-Pajares, F. J. Pozuelos, J. C. Suárez, M. González-Penedo, C. Dafonte

TL;DR

WATSON-Net is a multi-branch CNN-based tool for automated vetting of transiting exoplanet signals from Kepler and, by extension, TESS. Trained on Kepler DR25 with 10-fold cross-validation and evaluated across internal and external catalogs, it achieves competitive recall at high precision and strong ranking performance without mission-specific fine-tuning. The study emphasizes calibration (isotonic and Platt) and defines operational thresholds (LP, LN, VP, VN) that translate probabilistic scores into actionable vetting decisions, and it offers an explainability framework based on Branch Dropout. Integration into the SHERLOCK pipeline enables automated, interpretable vetting that can guide follow-up prioritization, and the public dearwatson package supports reproducibility and community adoption. Overall, WATSON-Net contributes an open-source, cross-mission vetting tool that combines data curation, calibration, and interpretable multi-branch deep learning within a working pipeline.
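To make the calibration step concrete, the following is a minimal sketch of isotonic calibration using the pool-adjacent-violators procedure, written in plain Python. It is not the dearwatson implementation: the function names `fit_isotonic` and `calibrate`, the toy scores, and the step-function lookup are all assumptions made for illustration. Platt scaling would instead fit a logistic curve to the same (score, label) pairs.

```python
from bisect import bisect_left
from itertools import groupby


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to label frequencies.

    Hypothetical helper: pool-adjacent-violators over samples sorted by score.
    Returns (uppers, means): the largest score of each block and its calibrated value.
    """
    blocks = []  # each block is [sum of labels, sample count, largest score]
    ordered = sorted(zip(scores, labels))
    for score, group in groupby(ordered, key=lambda pair: pair[0]):
        ys = [y for _, y in group]  # samples with identical scores share a block
        blocks.append([float(sum(ys)), len(ys), score])
        # Merge backwards while the previous block's mean exceeds the newest one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    means = [b[0] / b[1] for b in blocks]
    return uppers, means


def calibrate(score, uppers, means):
    """Look up the calibrated value of the first block whose upper bound is >= score."""
    idx = bisect_left(uppers, score)
    return means[min(idx, len(means) - 1)]


# Toy validation data (invented for this sketch, not taken from the paper).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
true_labels = [0, 0, 1, 0, 1, 1]
uppers, means = fit_isotonic(raw_scores, true_labels)
calibrated = [calibrate(s, uppers, means) for s in raw_scores]
```

In this toy case the out-of-order pair at raw scores 0.3 and 0.4 is pooled into one block, so both receive the same calibrated value and the mapping stays monotonic.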

Abstract

Context. As the number of detected transiting exoplanet candidates continues to grow, the need for robust and scalable automated tools to prioritize or validate them has become increasingly critical. Among the most promising solutions, deep learning models offer the ability to interpret complex diagnostic metrics traditionally used in the vetting process. Aims. In this work, we present WATSON-Net, a new open-source neural network classifier and data preparation package designed to compete with current state-of-the-art tools for vetting and validation of transiting exoplanet signals from space-based missions. Methods. Trained on Kepler Q1-Q17 DR25 data using 10-fold cross-validation, WATSON-Net produces ten independent models, each evaluated on dedicated validation and test sets. The ten models are calibrated, and the input pipeline is standardized so that they can be extended to TESS data, allowing performance to be assessed across different space missions. Results. For Kepler targets, WATSON-Net achieves a recall of 0.903 at a precision of 0.99 (R@P0.99 = 0.903), ranking second; only the ExoMiner network performs better (R@P0.99 = 0.936). For TESS signals, WATSON-Net emerges as the best-performing non-fine-tuned machine learning classifier, achieving a precision of 0.93 and a recall of 0.76 on a test set comprising confirmed planets and false positives. Both the model and its data preparation tools are publicly available in the dearwatson Python package, which is fully open-source and integrated into the vetting engine of the SHERLOCK pipeline.
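The recall-at-precision metric quoted in the results can be illustrated with a short sketch. It assumes the usual reading of R@P0.99 as the highest recall reachable at any score threshold whose precision is at least 0.99; the function name `recall_at_precision` and the toy data are hypothetical and do not come from the paper or from dearwatson.

```python
def recall_at_precision(scores, labels, min_precision=0.99):
    """Return the highest recall over all thresholds with precision >= min_precision.

    scores: predicted probabilities; labels: 1 for planet, 0 for false positive.
    Returns 0.0 if no threshold reaches the requested precision.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        # Consume every sample sharing this score, so ties fall on the same
        # side of the threshold.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        precision = true_pos / (true_pos + false_pos)
        if precision >= min_precision:
            best_recall = max(best_recall, true_pos / total_positives)
    return best_recall


# Toy example (invented values): three planets rank above the first false positive.
toy_scores = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60]
toy_labels = [1, 1, 1, 0, 1, 0]
r_at_p99 = recall_at_precision(toy_scores, toy_labels, min_precision=0.99)
```

With these toy values, three of the four planets are recovered before the first false positive appears, so the recall at a precision of 0.99 is 0.75.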

Paper Structure

This paper contains 29 sections, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Neural network global architecture.
  • Figure 2: Number of signals matching each of the label criteria. Each NTP signal can potentially match different labels. Dashed vertical lines split three different regions: the left contains the NTP-labeled samples, the middle contains the signals considered as candidates, and the right contains signals labeled as TP (see text for details).
  • Figure 3: Isotonic model distribution of the validation set predictions for positive (blue) and negative (red) samples. The top light green area matches the validation threshold, and the cyan area matches the likely planet threshold. The gray area matches the likely negative threshold, and the red area matches the validated negative threshold. The top panel displays the entire combined validation dataset, while the bottom panel plots only the FPs above the LP threshold and the FNs below the LN threshold. The horizontal dotted lines at 0.95 and 0.01 represent the change of scale for the top and bottom of the plot, respectively.
  • Figure 4: Distribution of predictions for positive (blue) samples for the 2021_gpc, exominer2022, and exominer2023 test sets. On the one hand, the light-green area corresponds to the validation threshold, and the cyan area corresponds to the likely planet threshold. On the other hand, the gray area corresponds to the likely negative threshold, and the red area to the validated negative threshold. The top panel shows the distribution of predictions for the 2021_gpc test set, and the bottom panel shows the distribution of predictions for the exominer2022 and exominer2023 combined test set. The horizontal dotted lines at 0.95 and 0.01 represent the change of scale for the top and bottom of the plot, respectively.
  • Figure 5: Distribution of predictions for positive (blue) and negative (red) samples. The light-green area corresponds to the validation threshold, and the cyan area corresponds to the likely planet threshold. The gray area corresponds to the likely negative threshold, and the red area to the validated negative threshold. The top panel shows the distribution of all the predictions for the TESS test set. The bottom panel displays the distribution of incorrect classifications, including false positives (FPs) above the LP threshold and false negatives (FNs) below the LN threshold. The horizontal dotted lines at 0.95 and 0.01 represent the change of scale for the top and bottom of the plot, respectively.
  • ...and 5 more figures
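The captions above describe four threshold bands (validated planet, likely planet, likely negative, validated negative) and bottom panels restricted to false positives above the LP threshold and false negatives below the LN threshold. A minimal sketch of that bookkeeping follows. The threshold values are placeholders chosen for illustration and are not the paper's values; the names `vetting_label` and `confident_errors`, and the "undecided" band between LN and LP, are likewise assumptions.

```python
# Placeholder thresholds for illustration only; NOT the values used in the paper.
THRESHOLDS = {"VP": 0.99, "LP": 0.80, "LN": 0.20, "VN": 0.01}


def vetting_label(prob, thresholds=THRESHOLDS):
    """Map a calibrated probability to a vetting band (hypothetical helper)."""
    if prob >= thresholds["VP"]:
        return "VP"  # validated planet
    if prob >= thresholds["LP"]:
        return "LP"  # likely planet
    if prob <= thresholds["VN"]:
        return "VN"  # validated negative
    if prob <= thresholds["LN"]:
        return "LN"  # likely negative
    return "undecided"  # assumed middle band between LN and LP


def confident_errors(probs, labels, thresholds=THRESHOLDS):
    """Select false positives above LP and false negatives below LN.

    labels: 1 for true planet, 0 for false positive.
    """
    fps = [p for p, y in zip(probs, labels) if y == 0 and p >= thresholds["LP"]]
    fns = [p for p, y in zip(probs, labels) if y == 1 and p <= thresholds["LN"]]
    return fps, fns


# Toy predictions (invented values).
toy_probs = [0.995, 0.85, 0.50, 0.10, 0.005, 0.90]
toy_labels = [1, 1, 1, 1, 0, 0]
toy_bands = [vetting_label(p) for p in toy_probs]
toy_fps, toy_fns = confident_errors(toy_probs, toy_labels)
```

Here the negative scored at 0.90 is the only false positive above the placeholder LP threshold, and the planet scored at 0.10 is the only false negative below the placeholder LN threshold.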