Machine Learning for Radial Velocity Analysis I: Vision Transformers as a Robust Alternative for Detecting Planetary Candidates
Anoop Gavankar, Tanish Mittal, Joe Ninan, Shravan Hanasoge
TL;DR
This work demonstrates that Vision Transformer-based models can robustly detect low-amplitude planetary signals in radial velocity time series affected by stellar activity, using NEID solar data with injected Keplerian motions. By converting spectra into concatenated cross-correlation function (CCCF) inputs and employing a two-stage training regime (shuffle-based pretraining followed by cadence-aware fine-tuning), the approach surpasses Lomb-Scargle in low-SNR, long-period regimes and provides probabilistic period/amplitude classifications. The method leverages full spectral information, handles irregular sampling effectively, and shows promise for early candidate identification and follow-up prioritization in EPRV surveys, though it struggles with real No-Planet cases and with rotation-blindness at shorter periods. The results motivate further development toward transfer to other stars, integration of multi-CCF inputs, and binary classification of strong planetary signals, with the aim of consolidating ViPer-RV as a community-grade tool for RV exoplanet searches.
Abstract
Extreme precision radial velocity (EPRV) surveys usually require extensive observational baselines to confirm planetary candidates, making them resource-intensive. Traditionally, periodograms are used to identify promising candidate signals before further observational investment, but their effectiveness is often limited for low-amplitude signals due to stellar jitter. In this work, we develop a machine learning (ML) framework based on a Transformer architecture that aims to detect the presence and likely period of planetary signals in time-series spectra, even in the presence of stellar activity. The model is trained to classify whether a planetary signal exists and to assign it to one of several discrete period and amplitude bins. Injection-recovery tests on randomly selected 100-epoch subsets of NEID solar observations (2020-2022) show that for low-amplitude systems ($<1$ m s$^{-1}$), our model improves planetary candidate identification by a factor of two compared to the traditional Lomb-Scargle periodogram. Our ML model is built on a Vision Transformer (ViT) architecture that processes reduced representations of solar spectrum observations to predict the period and semi-amplitude of planetary signal candidates. By analyzing multi-epoch spectra, the model reliably detects planetary signals with semi-amplitudes as low as 65 cm s$^{-1}$. Even under real solar noise and irregular sampling, it identifies signals down to 35 cm s$^{-1}$. Comparisons with the Lomb-Scargle periodogram demonstrate a significant improvement in detecting low-amplitude planetary candidates, particularly for longer orbital periods. These results underscore the potential of machine learning to identify planetary candidates early in EPRV surveys, even from limited observational counts.
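For readers unfamiliar with the Lomb-Scargle baseline, the following is a minimal, dependency-free sketch of an injection-recovery test of the kind described above. It is not the pipeline used in this work: the function names, the white Gaussian noise model, the 730-day baseline, the trial-period grid, and the injected parameters ($P = 50$ d, $K = 1$ m s$^{-1}$) are all illustrative assumptions. Real solar RVs contain correlated activity signals, which is what makes the low-amplitude regime hard for periodograms.

```python
import math
import random


def inject_circular_signal(times, rvs, period, semi_amplitude, phase=0.0):
    """Add a circular-orbit (zero-eccentricity) Keplerian signal to an RV series."""
    return [
        rv + semi_amplitude * math.sin(2.0 * math.pi * t / period + phase)
        for t, rv in zip(times, rvs)
    ]


def lomb_scargle_power(times, rvs, periods):
    """Classical Lomb-Scargle power, normalized by the sample variance."""
    mean = sum(rvs) / len(rvs)
    y = [v - mean for v in rvs]
    var = sum(v * v for v in y) / (len(y) - 1)
    powers = []
    for p in periods:
        w = 2.0 * math.pi / p
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(
            sum(math.sin(2.0 * w * t) for t in times),
            sum(math.cos(2.0 * w * t) for t in times),
        ) / (2.0 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, c))
        ys = sum(a * b for a, b in zip(y, s))
        cc = sum(a * a for a in c)
        ss = sum(a * a for a in s)
        powers.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return powers


def run_injection_recovery(seed=0, n_epochs=100, baseline_days=730.0,
                           period=50.0, semi_amplitude=1.0, noise_sigma=0.5):
    """Inject one planet into synthetic noise and return the peak LS period (days)."""
    rng = random.Random(seed)
    # Irregular sampling: epochs drawn uniformly at random over the baseline.
    times = sorted(rng.uniform(0.0, baseline_days) for _ in range(n_epochs))
    # ASSUMPTION: white Gaussian noise (m/s) stands in for real stellar jitter.
    noise = [rng.gauss(0.0, noise_sigma) for _ in range(n_epochs)]
    rvs = inject_circular_signal(times, noise, period, semi_amplitude)
    # Log-spaced trial periods between 2 and 300 days (illustrative grid).
    n_grid = 2000
    periods = [2.0 * (150.0 ** (i / (n_grid - 1))) for i in range(n_grid)]
    powers = lomb_scargle_power(times, rvs, periods)
    best_index = max(range(n_grid), key=powers.__getitem__)
    return periods[best_index]


if __name__ == "__main__":
    print(f"Recovered period: {run_injection_recovery():.2f} d (injected 50.00 d)")
```

Repeating such a run over many random epoch subsets and injected amplitudes, and counting how often the peak falls near the injected period, gives a recovery rate that can be compared between detection methods.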
