Machine Learning for Radial Velocity Analysis I: Vision Transformers as a Robust Alternative for Detecting Planetary Candidates

Anoop Gavankar, Tanish Mittal, Joe Ninan, Shravan Hanasoge

TL;DR

This work demonstrates that Vision Transformer–based models can robustly detect low-amplitude planetary signals in radial velocity time series affected by stellar activity, using NEID solar data with injected Keplerian motions. By converting spectra into concatenated cross-correlation function (CCCF) inputs and employing a two-stage training regime (shuffle-based pretraining followed by cadence-aware finetuning), the approach surpasses Lomb-Scargle in low-SNR, long-period regimes and provides probabilistic period/amplitude classifications. The method leverages full spectral information, handles irregular sampling effectively, and shows promise for early candidate identification and follow-up prioritization in EPRV surveys, though it faces challenges with real No-Planet cases and rotation-blindness at shorter periods. The results motivate further development toward transfer to other stars, integration of multi-CCF inputs, and binary classification of strong planetary signals, aiming to consolidate ViPer-RV as a community-grade tool for RV exoplanet searches.

Abstract

Extreme precision radial velocity (EPRV) surveys usually require extensive observational baselines to confirm planetary candidates, making them resource-intensive. Traditionally, periodograms are used to identify promising candidate signals before further observational investment, but stellar jitter often limits their effectiveness for low-amplitude signals. In this work, we develop a machine learning (ML) framework based on a Transformer architecture that aims to detect the presence and likely period of planetary signals in time-series spectra, even in the presence of stellar activity. The model is trained to classify whether a planetary signal exists and to assign it to one of several discrete period and amplitude bins. Injection-recovery tests on randomly selected 100-epoch subsets of NEID solar observations (2020-2022) show that for low-amplitude systems ($<$1 m s$^{-1}$), our model improves planetary candidate identification by a factor of two compared to the traditional Lomb-Scargle periodogram. Our ML model is built on a Vision Transformer (ViT) architecture that processes reduced representations of solar spectrum observations to predict the period and semi-amplitude of planetary signal candidates. By analyzing multi-epoch spectra, the model reliably detects planetary signals with semi-amplitudes as low as 65 cm s$^{-1}$. Even under real solar noise and irregular sampling, it identifies signals down to 35 cm s$^{-1}$. Comparisons with the Lomb-Scargle periodogram demonstrate a significant improvement in detecting low-amplitude planetary candidates, particularly for longer orbital periods. These results underscore the potential of machine learning to identify planetary candidates early in EPRV surveys, even from a limited number of observations.
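To make the Lomb-Scargle baseline in the injection-recovery tests concrete, the following is a minimal sketch of one such test. It is not the authors' pipeline. It assumes a circular orbit, white Gaussian noise in place of real solar activity, uniformly random observation times, and a hypothetical log-spaced grid of trial periods; every function name and default value is illustrative.

```python
import math
import random


def circular_rv(times, period, semi_amp, phase=0.0):
    """Radial velocity of a circular Keplerian orbit (eccentricity assumed 0)."""
    return [semi_amp * math.sin(2.0 * math.pi * t / period + phase) for t in times]


def lomb_scargle_power(times, values, period):
    """Classical (unnormalized) Lomb-Scargle power at one trial period."""
    omega = 2.0 * math.pi / period
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    # The time offset tau makes the sine and cosine terms orthogonal
    # for irregularly sampled data.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, cos_terms))
    ys = sum(a * b for a, b in zip(y, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return 0.5 * (yc * yc / cc + ys * ys / ss)


def best_period(times, values, trial_periods):
    """Trial period with the highest Lomb-Scargle power."""
    return max(trial_periods, key=lambda p: lomb_scargle_power(times, values, p))


def injection_recovery(period, semi_amp, noise_sigma,
                       n_epochs=100, baseline=800.0, seed=0):
    """Inject a circular signal into synthetic noise and return the recovered period.

    Assumptions (not from the paper): white Gaussian noise, observation times
    drawn uniformly over `baseline` days, and a 2% log-spaced period grid
    spanning roughly 10 to 500 days.
    """
    rng = random.Random(seed)
    times = sorted(rng.uniform(0.0, baseline) for _ in range(n_epochs))
    signal = circular_rv(times, period, semi_amp)
    rv = [s + rng.gauss(0.0, noise_sigma) for s in signal]
    trial_periods = [10.0 * (1.02 ** k) for k in range(200)]
    return best_period(times, rv, trial_periods)
```

Calling `injection_recovery(period=100.0, semi_amp=1.0, noise_sigma=0.3)` returns the grid period with the highest power. Counting how often that period falls close to the injected one, as `semi_amp` decreases relative to `noise_sigma`, gives a recovery rate of the kind a classifier could be compared against. Real stellar activity is correlated rather than white, so this sketch understates the difficulty the abstract describes.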

Paper Structure

This paper contains 47 sections, 31 figures, 2 tables.

Figures (31)

  • Figure 1: (a) Irradiance profile for a typical clear-sky day, showing smooth temporal variation with sharp transitions at dawn and dusk. The sudden flux drop at dusk is due to the shadow of the telescope building. (b) Irradiance profile for a cloudy day, exhibiting pronounced fluctuations in solar radiation due to varying atmospheric conditions.
  • Figure 2: (a) Irradiance profile for a clear day compared with its monthly template from November, showing similar characteristics. (b) Histogram of the rolling standard deviation for 30,000 randomly selected FITS file windows, displaying a pseudo-Gaussian distribution with a pronounced long tail. The chosen clear-sky day cutoff at 3 W m$^{-2}$ is marked by the vertical dashed line.
  • Figure 3: (a) Rolling standard deviation of the irradiance profile for a clear-sky day, showing consistently low values with a spike at dusk due to the sharp decline in irradiance. (b) Rolling standard deviation of the irradiance profile for a cloudy day, where higher values indicate significant fluctuations in solar irradiance.
  • Figure 4: Gaussian profile fit to a spectral line, as discussed in Section \ref{subsec:3.3}.
  • Figure 5: Sample 1D-CCCF vector (see Section \ref{subsec:3.3}) constructed by concatenating 10 CCFs, with the activity-sensitive spectral lines listed in Table \ref{tab:1} stitched together. Activity lines are normalized following the NEID DRP approach \ref{neid:Activity}. Each spectrum is represented in this compact form to balance vector size against the information loss from averaging.
  • ...and 26 more figures
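
The clear-sky selection described in the captions of Figures 2 and 3 can be sketched as a rolling standard deviation compared against the 3 W m$^{-2}$ cutoff. This is an illustrative sketch only: the window length, the use of a population standard deviation, and the function names are assumptions, and the captions do not say how per-window flags are combined into a per-day or per-observation decision.

```python
import statistics

CLEAR_SKY_CUTOFF = 3.0  # W m^-2, the cutoff quoted in the Figure 2 caption


def rolling_std(irradiance, window):
    """Population standard deviation over each full sliding window."""
    return [
        statistics.pstdev(irradiance[i:i + window])
        for i in range(len(irradiance) - window + 1)
    ]


def clear_sky_flags(irradiance, window=5, cutoff=CLEAR_SKY_CUTOFF):
    """One boolean per window: True where the rolling std is below the cutoff.

    The window length of 5 samples is a hypothetical choice.
    """
    return [s < cutoff for s in rolling_std(irradiance, window)]
```

A smoothly varying profile yields flags that are all `True`, while a profile with cloud-induced fluctuations yields `False` for the affected windows. As Figure 3(a) notes, the sharp decline at dusk also produces a spike, so a practical selection would need to treat that interval separately.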