NotPlaNET: Removing False Positives from Planet Hunters TESS with Machine Learning

Valentina Tardugno Poleo, Nora Eisner, David W. Hogg

TL;DR

NotPlaNET tackles the bottleneck of separating real transit events from false positives in TESS data with a six-block one-dimensional convolutional neural network (CNN) that processes light curves without phase folding, together with background flux and centroid information. Trained on light curves identified by Planet Hunters TESS citizen scientists and labeled by the project scientists, the model classifies light curve chunks as likely planets, eclipsing binaries, or other signals, and applies a conservative planet-candidate (PC) score threshold to flag contaminants while preserving planets. It flags a median of 18% of contaminants across the 18 test sectors and discards no planets in 16 of them; the misclassifications in the remaining two sectors involve short-period or low-SNR TOIs, and the flagged contaminants are dominated by eclipsing binaries. The approach substantially reduces the manual vetting burden and could be adapted to earlier stages of vetting or to other science goals, showing the potential of scalable, data-driven contaminant screening in large photometric surveys. NotPlaNET thereby offers a practical path toward more efficient discovery of long-period exoplanets from single-transit events.
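The per-sector numbers quoted above come from comparing the model's flags with the vetted labels in each sector. The sketch below is a minimal illustration of that bookkeeping, not the authors' evaluation code. It assumes the contaminant percentage is measured relative to the true contaminants in a sector; the function names, dictionary keys, and toy labels are hypothetical.

```python
import statistics

def tally_sector(labels, flags):
    """Count chunks in each outcome group for one sector.

    labels: True for a vetted planet candidate, False for a contaminant.
    flags:  True if the model flagged the chunk as a contaminant.
    """
    counts = {"pc_kept": 0, "pc_flagged": 0,
              "contaminant_kept": 0, "contaminant_flagged": 0}
    for is_pc, flagged in zip(labels, flags):
        kind = "pc" if is_pc else "contaminant"
        counts[f"{kind}_{'flagged' if flagged else 'kept'}"] += 1
    return counts

def percent_contaminants_flagged(counts):
    """Assumed metric: flagged contaminants as a percentage of all true contaminants."""
    total = counts["contaminant_kept"] + counts["contaminant_flagged"]
    return 100.0 * counts["contaminant_flagged"] / total if total else 0.0

def summarize(sector_counts):
    """Median, min, and max contaminant percentage across sectors, plus the
    number of sectors in which no planet candidate was flagged."""
    pcts = [percent_contaminants_flagged(c) for c in sector_counts]
    return {
        "median": statistics.median(pcts),
        "min": min(pcts),
        "max": max(pcts),
        "sectors_all_pcs_kept": sum(c["pc_flagged"] == 0 for c in sector_counts),
    }

# Toy example with two sectors (made-up labels, not NotPlaNET results).
s1 = tally_sector([True, False, False, False, False], [False, True, False, False, False])
s2 = tally_sector([True, True, False, False], [True, False, True, False])
print(summarize([s1, s2]))
```

Keeping the planet-candidate count separate from the contaminant percentage mirrors how the results are reported: a sector can have a high contaminant percentage and still fail the "no planets discarded" criterion.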

Abstract

Differentiating between real transit events and false positive signals in photometric time series data is a bottleneck in the identification of transiting exoplanets, particularly long-period planets. This differentiation typically requires visual inspection of a large number of transit-like signals to rule out instrumental and astrophysical false positives that mimic planetary transit signals. We build a one-dimensional convolutional neural network (CNN) to separate eclipsing binaries and other false positives from potential planet candidates, reducing the number of light curves that require human vetting. Our CNN is trained on TESS light curves that Planet Hunters citizen scientists identified as likely containing a transit. In addition to the flux, the network takes the background flux and centroid information as inputs. The light curves are visually inspected and labeled by project scientists and are minimally pre-processed, with only normalization and data augmentation applied before training. The median percentage of contaminants flagged across the test sectors is 18%, ranging from 10% to 37%. Our model keeps 100% of the planet candidates in 16 of the 18 test sectors, incorrectly flagging one planet candidate (0.3%) in one of the remaining sectors and two (0.6%) in the other. Our method shows potential to reduce the number of light curves requiring manual vetting by up to a third with minimal misclassification of planet candidates.
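A minimal sketch of where normalization and augmentation sit in such a pipeline. The specific choices here are assumptions, not taken from the text: four input channels (flux, background flux, and two centroid coordinates), per-channel normalization by subtracting the median and dividing by the standard deviation, and augmentation by time reversal plus small Gaussian noise. The function names are hypothetical.

```python
import random
import statistics

def normalize(channel):
    """Hypothetical per-channel normalization: subtract the median and
    divide by the standard deviation (a flat channel is only centered)."""
    med = statistics.median(channel)
    scale = statistics.pstdev(channel) or 1.0
    return [(v - med) / scale for v in channel]

def augment(channels, rng, noise_sigma=0.05):
    """Hypothetical augmentation: with probability 0.5 reverse time for all
    channels together (keeping them aligned), then add Gaussian noise."""
    flip = rng.random() < 0.5
    out = []
    for ch in channels:
        seq = ch[::-1] if flip else list(ch)
        out.append([v + rng.gauss(0.0, noise_sigma) for v in seq])
    return out

def prepare(flux, background, centroid_x, centroid_y, rng=None):
    """Stack the channels (order is an assumption), normalize each one, and
    augment only when an RNG is supplied, i.e. at training time."""
    channels = [normalize(c) for c in (flux, background, centroid_x, centroid_y)]
    return augment(channels, rng) if rng is not None else channels

flux = [1.0, 1.0, 0.99, 1.0, 1.0]
inputs = prepare(flux, [5.0] * 5, [0.1] * 5, [0.2] * 5)                      # evaluation
train_inputs = prepare(flux, [5.0] * 5, [0.1] * 5, [0.2] * 5, rng=random.Random(0))
```

Reversing all channels with the same coin flip matters: the background and centroid channels are only informative if they stay aligned in time with the flux dip.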

Paper Structure

This paper contains 11 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Convolutional block diagram. The block consists of two 1D convolutional layers with the ReLU activation function, layer normalization, and max pooling.
  • Figure 2: Fraction of planet candidates (top) and true contaminants (bottom) flagged as contaminants as a function of the PC score threshold for the seven validation sectors. For each validation sector, we chose the threshold that maximized the fraction of discarded contaminants while discarding zero planets. We then chose our final threshold (dashed line) by taking a weighted average of the per-sector thresholds, weighting each sector by the inverse square of the number of contaminants it contains, which yields a more conservative cutoff.
  • Figure 3: Selected light curve chunks from sector 50. The rightmost column (purple) contains the light curves flagged as contaminants by our model. The leftmost and middle columns (green) contain light curves classified as 'keep for further vetting.' The left column shows true planet candidates, while the middle one shows true contaminants that were incorrectly classified as 'keep for further vetting.' The y-axis range is displayed in the lower-left corner of each panel, while the lower-right corner displays the TIC ID of the light curve.
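The convolutional block of Figure 1 can be sketched in plain Python to make the data flow concrete. Several details are assumptions rather than statements from the caption: the order conv, ReLU, conv, ReLU, layer normalization, max pooling; "valid" (unpadded) convolutions; a pool size of 2; layer normalization over all channels and time steps without learnable scale or shift; and the hypothetical `random_params` initializer. A real implementation would use a deep-learning framework.

```python
import math
import random

def conv1d(x, weights, bias):
    """Valid 1D convolution. x: [in_ch][length]; weights: [out_ch][in_ch][k]."""
    k = len(weights[0][0])
    length = len(x[0])
    out = []
    for w_o, b_o in zip(weights, bias):
        row = []
        for t in range(length - k + 1):
            s = b_o
            for c, w_c in enumerate(w_o):
                for j in range(k):
                    s += w_c[j] * x[c][t + j]
            row.append(s)
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def layer_norm(x, eps=1e-5):
    """Normalize over all channels and time steps (no learnable parameters)."""
    flat = [v for row in x for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    inv = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * inv for v in row] for row in x]

def max_pool(x, size=2):
    """Non-overlapping max pooling along time; a trailing remainder is dropped."""
    return [[max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]
            for row in x]

def conv_block(x, params):
    """Two conv + ReLU layers, then layer norm and max pooling (assumed order)."""
    (w1, b1), (w2, b2) = params
    h = relu(conv1d(x, w1, b1))
    h = relu(conv1d(h, w2, b2))
    return max_pool(layer_norm(h))

def random_params(in_ch, out_ch, kernel, rng):
    """Hypothetical Gaussian initialization for the block's two conv layers."""
    def layer(cin):
        w = [[[rng.gauss(0.0, 0.1) for _ in range(kernel)] for _ in range(cin)]
             for _ in range(out_ch)]
        return w, [0.0] * out_ch
    return [layer(in_ch), layer(out_ch)]

rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(4)]  # 4 channels, 32 cadences
y = conv_block(x, random_params(4, 8, 3, rng))  # 8 channels, (32 - 2 - 2) // 2 = 14 steps
```

Each block roughly halves the time axis, so stacking several such blocks lets later layers respond to features spanning many cadences, such as a full transit and its out-of-transit baseline.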
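The threshold selection described in the Figure 2 caption can be written as a short procedure. This sketch assumes a chunk is flagged when its PC score is strictly below the threshold. Under that rule, the largest threshold that flags no planet candidate in a sector is the lowest PC score among its planet candidates, and because the flagged-contaminant fraction never decreases as the threshold rises, that value also maximizes it. Breaking ties toward the largest admissible value is our assumption, as are the function names and toy scores; each validation sector is assumed to contain at least one planet candidate and one contaminant.

```python
def fraction_flagged(scores, threshold):
    """Fraction of chunks flagged as contaminants (PC score strictly below threshold)."""
    return sum(s < threshold for s in scores) / len(scores) if scores else 0.0

def sector_threshold(planet_scores):
    """Largest threshold that flags zero planet candidates under the '<' rule."""
    return min(planet_scores)

def combined_threshold(sectors):
    """Weighted average of per-sector thresholds, weight = 1 / N_contaminants**2.

    sectors: list of (planet_scores, contaminant_scores) pairs.
    """
    num = den = 0.0
    for planet_scores, contaminant_scores in sectors:
        w = 1.0 / len(contaminant_scores) ** 2
        num += w * sector_threshold(planet_scores)
        den += w
    return num / den

# Toy validation sectors (made-up PC scores, not NotPlaNET outputs).
sectors = [
    ([0.30, 0.80, 0.95], [0.01, 0.05, 0.20, 0.40]),  # threshold 0.30, weight 1/16
    ([0.10, 0.70], [0.02, 0.08]),                    # threshold 0.10, weight 1/4
]
t = combined_threshold(sectors)
print(round(t, 6))
```

Sweeping `fraction_flagged` over a grid of thresholds for the planet and contaminant scores of each sector reproduces the kind of curves plotted in the two panels of Figure 2.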