The GPU Phase Folding and Deep Learning Method for Detecting Exoplanet Transits

Kaitlyn Wang, Jian Ge, Kevin Willis, Kevin Wang, Yinan Zhao

TL;DR

The paper addresses the challenge of detecting shallow exoplanet transits in large photometric datasets, especially those of ultra-short-period (USP) planets. It introduces GPFC, a pipeline that combines fast GPU-based phase folding with a convolutional neural network to assign a transit-likelihood score at each period of a dense trial-period grid. The approach is three orders of magnitude faster than traditional BLS while improving detection metrics at low signal-to-noise ratios, and it recovers all confirmed Kepler USP planets in a blind test. Validation on real Kepler data shows that GPFC assigns high scores to known USP planets, which suggests potential for discovering new exoplanets in Kepler data and in data from other missions such as K2, TESS, PLATO, and Earth 2.0.

Abstract

This paper presents GPFC, a novel Graphics Processing Unit (GPU) Phase Folding and Convolutional Neural Network (CNN) system to detect exoplanets using the transit method. We devise a fast folding algorithm parallelized on a GPU to amplify low signal-to-noise ratio transit signals, allowing a search at high precision and speed. A CNN trained on two million synthetic light curves reports a score indicating the likelihood of a planetary signal at each period. While the GPFC method has broad applicability across period ranges, this research specifically focuses on detecting ultra-short-period planets with orbital periods less than one day. GPFC improves on speed by three orders of magnitude over the predominant Box-fitting Least Squares (BLS) method. Our simulation results show GPFC achieves $97\%$ training accuracy, higher true positive rate at the same false positive rate of detection, and higher precision at the same recall rate when compared to BLS. GPFC recovers $100\%$ of known ultra-short-period planets in $\textit{Kepler}$ light curves from a blind search. These results highlight the promise of GPFC as an alternative approach to the traditional BLS algorithm for finding new transiting exoplanets in data taken with $\textit{Kepler}$ and other space transit missions such as K2, TESS and future PLATO and Earth 2.0.

Paper Structure (25 sections, 3 equations, 19 figures, 2 tables)

Figures (19)

  • Figure 1: Fast GPU Phase Folding and CNN (GPFC) Processing Pipeline. The GPFC approach begins by ingesting a raw light curve and detrending it. The light curve is then phase folded over a high-precision grid of trial periods. The folded results are noise-normalized and fed into the CNN, which produces a probability score indicating the likelihood that the light curve contains a transit event.
  • Figure 2: Example outputs of the GPFC and BLS methods on a simulated light curve. The light curve phase folded at the instrumented transit period is shown in the top panel. The score vs. trial period is illustrated for both GPFC (the middle panel) and BLS (the bottom panel), showing peak scores at the correct period.
  • Figure 3: The Data Preprocessing Module: (A) a raw Kepler light curve before preprocessing, (B) preprocessing steps conducted: masking of known transits, segmenting the light curve to multiple data sections, and cropping edges, (C) splitting by flux gap criteria and continuum fitting in each segmented section, (D) the final normalized light curve after preprocessing, where data points in transit windows are plotted in green.
  • Figure 4: The GPU Phase Folding Module. The GPU phase folding algorithm consists of 5 steps, each of which is optimized through parallel computing. First, (a) the timestamps of the light curve data points are taken modulo the trial period. Then, (b) the full time span of the trial period is split into 4096 equal-width bins, and the data points are mapped into the bins based on the value of their time residuals from (a). Next, (c) the 4096 bins are rebinned into 256 bins, such that each of the 256 bins contains an equal number of data points. The flux values of the data points in each bin are averaged in (d) and written into the final output array of 100,000 × 256 data points in (e). The initial mapping to the 4096 bins in (b) serves as a near-perfect emulation of sorting, and the subsequent mapping to the 256 bins and averaging in (c) and (d) make the noise in the binned data homogeneous, which minimizes false signals and reduces noise. The algorithm takes advantage of the GPU's block and thread structure so that multiple ($p$) trial periods are folded at the same time. ($p$ was 16 with the GPU we used, but it can be larger on a more advanced GPU with additional memory and thread parallelism.) In other words, for each of the steps from (a) to (e), $p$ instances of that step execute concurrently.
  • Figure 5: The CNN Module. The CNN takes as input a single noise-normalized, 256-point folded light curve and outputs a confidence score that the folded input contains a transit signal.
  • ...and 14 more figures
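
The folding steps (a) to (e) in the Figure 4 caption can be illustrated with a small CPU sketch in NumPy. This is not the authors' GPU kernel: the function names `fold_light_curve` and `fold_over_periods` are hypothetical, the equal-count rebinning is approximated by ordering points by their fine-bin index and splitting them into 256 nearly equal groups, and trial periods are processed one after another rather than $p$ at a time. It also assumes the light curve has at least 256 data points.

```python
import numpy as np


def fold_light_curve(time, flux, period, n_fine=4096, n_out=256):
    """Fold one light curve at one trial period into n_out equal-count bins.

    Hypothetical helper for illustration; not the GPFC GPU implementation.
    """
    time = np.asarray(time, dtype=np.float64)
    flux = np.asarray(flux, dtype=np.float64)
    if time.size < n_out:
        raise ValueError("need at least n_out data points")

    # (a) Time residual of each point within the trial period.
    residual = np.mod(time, period)

    # (b) Index of the equal-width fine bin each point falls into.
    fine_bin = (residual / period * n_fine).astype(np.int64)
    fine_bin = np.minimum(fine_bin, n_fine - 1)  # guard against rounding up

    # Ordering points by fine-bin index approximates a full sort by phase.
    order = np.argsort(fine_bin, kind="stable")

    # (c) Split the ordered points into n_out groups of nearly equal size.
    groups = np.array_split(flux[order], n_out)

    # (d) Average the flux in each group.
    return np.array([group.mean() for group in groups])


def fold_over_periods(time, flux, periods, n_out=256):
    """(e) Stack the folds for all trial periods into (n_periods, n_out)."""
    return np.stack(
        [fold_light_curve(time, flux, p, n_out=n_out) for p in periods]
    )
```

Calling `fold_over_periods(time, flux, periods)` with a grid of trial periods returns one 256-point fold per period, the same shape of input that the caption describes being passed on to the CNN after noise normalization. The sketch sorts on the CPU for brevity; the caption's point is that the 4096-bin mapping lets the GPU avoid an explicit sort.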