New Metrics for Identifying Variables and Transients in Large Astronomical Surveys
Shih Ching Fu, Arash Bahramian, Aloke Phatak, James C. A. Miller-Jones, Suman Rakshit, Alexander Andersson, Robert Fender, Patrick A. Woudt
TL;DR
This paper tackles the challenge of identifying variable and transient sources in large astronomical surveys by proposing a Gaussian process (GP) regression framework that avoids assuming light-curve shapes. It models light curves with a three-term kernel (squared exponential, Matérn 3/2, and periodic) and uses the seven GP hyperparameters, especially the amplitude terms $\sigma_{SE}$ and $\sigma_{M32}$, as direct descriptors of variability. Compared with the traditional metrics $\eta_\nu$ and $V_\nu$, the GP-based amplitude space provides better discrimination and interpretable clustering of light curves, validated on 6394 ThunderKAT radio light curves and citizen-science labels. The approach also demonstrates a practical, GP-driven screening workflow for transient candidates and includes Python and R notebooks to facilitate deployment across surveys. Overall, the work presents a generalisable, principled method for scalable variability characterisation in time-domain astronomy, with strong potential for improving transient discovery pipelines.
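To make the three-term kernel concrete, the following is a minimal NumPy sketch of one way such a covariance function could be written. It assumes the three terms are summed and uses textbook parameterisations of each term; the function name `three_term_kernel` and the length-scale and period argument names are hypothetical, and the paper's own notebooks may use a different parameterisation or GP library.

```python
import numpy as np


def three_term_kernel(t1, t2, sigma_se, ell_se, sigma_m32, ell_m32,
                      sigma_per, ell_per, period):
    """Covariance between observation times t1 and t2 (1-D arrays).

    ASSUMPTION: additive kernel with textbook forms for each term.
    Seven hyperparameters: three amplitudes, three length scales, one period.
    """
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    # Absolute time lag between every pair of observations.
    tau = np.abs(t1[:, None] - t2[None, :])

    # Squared exponential: smooth, slowly varying trends.
    k_se = sigma_se**2 * np.exp(-0.5 * (tau / ell_se) ** 2)

    # Matern 3/2: rougher, shorter-timescale variability.
    r = np.sqrt(3.0) * tau / ell_m32
    k_m32 = sigma_m32**2 * (1.0 + r) * np.exp(-r)

    # Periodic: repeating structure with the given period.
    k_per = sigma_per**2 * np.exp(
        -2.0 * np.sin(np.pi * tau / period) ** 2 / ell_per**2
    )

    return k_se + k_m32 + k_per


# Irregularly sampled observation times, as is typical of survey light curves.
times = np.array([0.0, 1.3, 2.1, 7.4, 9.0])
K = three_term_kernel(times, times, 1.0, 2.0, 0.5, 1.0, 0.3, 1.0, 5.0)
```

In this sketch the diagonal of `K` equals the sum of the three squared amplitudes, which is why the fitted amplitudes can be read as a decomposition of the light curve's total variance.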
Abstract
A key science goal of large sky surveys such as those conducted by the Vera C. Rubin Observatory and precursors to the Square Kilometre Array is the identification of variable and transient objects. One approach is the statistical analysis of the time series of the changing brightness of sources, that is, their light curves. However, finding adequate statistical representations of light curves is challenging because of data quality issues such as sparsity of observations, irregular sampling, and other nuisance factors inherent in astronomical data collection. The wide diversity of objects that a large-scale survey will observe also means that making parametric assumptions about the shape of light curves is problematic. We present a Gaussian process (GP) regression approach for characterising light curve variability that addresses these challenges. Our approach makes no assumptions about the shape of a light curve and, therefore, is general enough to detect a range of variable source types. In particular, we propose using the joint distribution of GP amplitude hyperparameters to distinguish variable and transient candidates from nominally stable ones and apply this approach to 6394 radio light curves from the ThunderKAT survey. We compare our results with two variability metrics commonly used in radio astronomy, namely $\eta_\nu$ and $V_\nu$, and show that our approach has better discriminatory power and interpretability. Finally, we conduct a rudimentary search for transient sources in the ThunderKAT dataset to demonstrate how our approach might be used as an initial screening tool. Computational notebooks in Python and R are available to help facilitate the deployment of this framework to other surveys.
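For readers unfamiliar with the two baseline metrics, the sketch below computes them for a single light curve. The abstract does not define $\eta_\nu$ and $V_\nu$, so this assumes the definitions commonly used in radio transient pipelines: $\eta_\nu$ as the reduced chi-squared of the fluxes about their inverse-variance weighted mean, and $V_\nu$ as the sample standard deviation divided by the mean flux. The function name `variability_metrics` is hypothetical.

```python
import numpy as np


def variability_metrics(flux, flux_err):
    """Return (eta, V) for one light curve.

    ASSUMPTION: eta is the reduced chi-squared about the weighted mean flux,
    and V is the coefficient of variation (sample std / mean).
    """
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    n = flux.size
    if n < 2:
        raise ValueError("at least two flux measurements are required")

    # Inverse-variance weights from the measurement uncertainties.
    weights = 1.0 / flux_err**2
    weighted_mean = np.sum(weights * flux) / np.sum(weights)

    # eta: how poorly a constant-flux model fits, given the uncertainties.
    eta = np.sum(weights * (flux - weighted_mean) ** 2) / (n - 1)

    # V: fractional scatter of the fluxes, ignoring the uncertainties.
    v = np.std(flux, ddof=1) / np.mean(flux)
    return eta, v


eta, v = variability_metrics([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Both metrics reduce a light curve to a single number each and ignore the time ordering of the observations, which is the kind of limitation a GP-based description is intended to address.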
