Segmented-Polynomial-fitting Least Squares (SPLS): An optimized algorithm to find Earth twins

Shuyue Zheng, Fabo Feng, Yicheng Rui

TL;DR

Segmented-Polynomial-fitting Least Squares (SPLS) introduces a joint model for transits and background trends using a segmented double-polynomial framework to enhance detection of weak, long-period Earth-analog transits. The method employs Bayes-factor–driven trend-order selection and a periodogram significance metric (SDE) derived from a log-likelihood difference, with a three-step approximation to accelerate computation. Injection–recovery tests on Kepler data show SPLS achieving higher true-positive rates and lower false positives than standard detrending-detection pipelines, including a 97% recovery rate for Kepler single-planet systems. SPLS thereby improves sensitivity to Earth twins in current and upcoming missions (Kepler, TESS, PLATO, Earth 2.0), albeit with higher computational costs that could be mitigated by GPU implementation and further methodological refinements.

Abstract

Detecting Earth twins remains challenging because their shallow, long-period transits are difficult to distinguish from background noise. Motivated by this challenge, we developed Segmented-Polynomial-fitting Least Squares (SPLS), a new algorithm that simultaneously fits planetary transits and background trends using a segmented double-polynomial model. Prior to signal detection, the optimal polynomial order for the trend component is selected using Bayes factor-based model comparison. During the periodogram search, the Signal Detection Efficiency (SDE) metric is used to assess signal significance. The algorithm is accelerated by a three-step approximation and nonlinear parameter sampling tailored to SPLS. We compare the performance of SPLS with traditional detrending-detection approaches across different orbital periods, signal-to-noise ratios (SNR), planet radii, and stellar noise levels in an injection-recovery test. When detecting signals with periods between 10 and 480 days and SNRs below 9, SPLS achieves at least a 22.6% higher true positive rate than other methods at the same 10% false positive rate. Using the threshold determined from the Receiver Operating Characteristic curve analysis, our method also recovers the most true signals while yielding the fewest false positives among all injected samples, and reaches a 97% recovery fraction in Kepler confirmed single-planet systems. The tests demonstrate that SPLS improves the detection of transiting planets, particularly for low-SNR, long-period signals. It offers the potential for finding Earth twins in future applications to data from Kepler, TESS, and the upcoming PLATO and Earth 2.0 missions.
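The core idea of the abstract, fitting the trend and the transit together and scoring the fit by a log-likelihood difference, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the transit basis (a polynomial forced to vanish at the transit edges), the assumption of Gaussian noise with known `sigma`, and the conventional periodogram definition SDE = (peak - mean) / std are all assumptions made for illustration.

```python
import numpy as np

def delta_lnl_window(t, flux, sigma, t_mid, duration, trend_order=1):
    """Log-likelihood gain of a trend+transit fit over a trend-only fit.

    Hypothetical simplification: Gaussian noise with known sigma, and a
    transit basis that is zero outside the transit and at its edges.
    """
    x = t - t_mid
    u = 2.0 * x / duration              # u = -1 and +1 at ingress and egress
    inside = np.abs(u) < 1.0
    # Trend basis over the whole window: 1, x, ..., x**trend_order
    trend = np.vander(x, trend_order + 1, increasing=True)
    # Transit basis: even polynomials in u, nonzero only inside the transit
    bump = np.where(inside, 1.0 - u**2, 0.0)
    transit = np.column_stack([bump, bump * u**2])
    full = np.hstack([trend, transit])

    def chi2(design):
        # Linear least squares for all polynomial coefficients at once
        coef, *_ = np.linalg.lstsq(design, flux, rcond=None)
        return np.sum(((flux - design @ coef) / sigma) ** 2)

    # For fixed sigma, the log-likelihood difference is half the chi2 drop
    return 0.5 * (chi2(trend) - chi2(full))

def sde(power):
    """Conventional Signal Detection Efficiency of a periodogram."""
    power = np.asarray(power, dtype=float)
    return (power.max() - power.mean()) / power.std()

# Synthetic window: linear trend, white noise, and a shallow dip at t = 0
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 200)
flux = 1.0 + 1e-3 * t + rng.normal(0.0, 1e-4, t.size)
flux[np.abs(t) < 0.075] -= 5e-4

gain_on = delta_lnl_window(t, flux, 1e-4, t_mid=0.0, duration=0.15)
gain_off = delta_lnl_window(t, flux, 1e-4, t_mid=0.6, duration=0.15)
```

Because the trend-only model is nested inside the joint model, the gain is never negative, and it is much larger when the trial mid-transit time coincides with the injected dip (`gain_on`) than when it does not (`gain_off`). Solving for trend and transit coefficients in one linear step is what distinguishes this kind of joint fit from detrending first and searching afterwards.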

Paper Structure

This paper contains 21 sections, 12 equations, 19 figures, 2 tables.

Figures (19)

  • Figure 1: Illustration of the transit parameters using the Kepler light curve of Kepler-572. The left panel shows the 29.4-min cadence light curve in the time range [475, 525] days of Quarter 5, after Kepler Pre-search Data Conditioning processing, removal of points with non-zero data-quality flags, and normalization. For the best-fit parameters ($P$ = 17.205 d, $d$ = 3.6211 h, $t_{m0}$ = 2455008.058 d; 2016ApJ...822...86M), black and gray points denote the data inside and outside the 1.797-day windows centered on the mid-transit times, respectively. The right panel shows the data points in the first transit window. The orange solid line and the green dashed line are the fits of the periodic model and the baseline model, respectively, using a 4th-order polynomial for the transits and a 1st-order polynomial for the trends.
  • Figure 2: $\Delta\ln\mathcal{L}$ distribution in the linear search and the periodic search for the example light curve of Fig. 1. The left panel shows the distribution of the log-likelihood difference on the ($d$, $t_{m}$) grid of the linear search, with the same polynomial orders and window size as in Fig. 1. Fifteen durations were sampled uniformly in logarithmic space, and 28,708 mid-transit times were sampled uniformly. The number 28,708 is determined by the minimum period, set to 1.797 days (see the principle of parameter setting in Section \ref{subsub: Sampling of nonlinear parameters}). The right panel shows the distribution of $\Delta \ln\mathcal{L}(d,\ t_{m0})$ obtained by reshaping the $\Delta\ln\mathcal{L}(d,\ t_m)$ of the left panel and reducing its dimension at the best period (17.205 d).
  • Figure 3: Period resolution ($\delta P$) and the number of sampled mid-transit times as functions of the trial period. The lines follow Eq. (\ref{eq: dP}) (left) and Eq. (\ref{eq: n_tm}) (right) for four light-curve time spans; all are linear in logarithmic space. The pink circle marks the 10-day minimum period for the full Kepler light curve. The pink triangle marks the default parameter setting for the example light curve of Kepler-572 ($S=48.7\ {\rm d}$).
  • Figure 4: Relation between orbital period and transit duration for confirmed transiting exoplanets from the NASA Exoplanet Archive. EPIC 248847494 b, with a 3650-day period, is not plotted because only a single transit was observed. The orange region is the sampled range of period and duration for the example light curve of Kepler-572 (left; default sampling) and for a full long-cadence Kepler light curve observed for 17 quarters with a 0.8-day $d_{\rm max}$ (right). The region enclosed by the solid orange line represents the sampling in the linear search, and the shaded area represents the further constrained sampling in the periodic search.
  • Figure 5: Frequency histogram of the best trend orders of all segments of the example light curve of Kepler-572. The red line marks the 90th percentile of the distribution; the corresponding trend order, 1st order, is chosen as the optimal trend order for the target.
  • ...and 14 more figures
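The trend-order choice described in the Figure 5 caption (take the 90th percentile of the per-segment best orders) can be sketched as follows. This is an illustration only: the function name, the nearest-rank percentile convention, and the example list of per-segment orders are assumptions, and the per-segment orders themselves would come from the Bayes-factor comparison, which is not reproduced here.

```python
import math

def select_trend_order(best_orders, percentile=90.0):
    """Pick one trend order for a target from per-segment best orders.

    Uses the nearest-rank percentile (an assumed convention), so the
    result is always one of the integer orders that actually occurred.
    """
    ordered = sorted(best_orders)
    # Nearest-rank: smallest value with at least `percentile` % of the
    # segments at or below it
    rank = math.ceil(percentile * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]

# Hypothetical per-segment results: most segments prefer order 0 or 1
orders = [0] * 6 + [1] * 12 + [2] * 2
chosen = select_trend_order(orders)
```

With these hypothetical inputs, 18 of the 20 segments prefer order 1 or lower, so `chosen` is 1. Taking a high percentile rather than the mode keeps the trend model flexible enough for most segments while ignoring a few outlying segments that favor higher orders.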