New Metrics for Identifying Variables and Transients in Large Astronomical Surveys

Shih Ching Fu, Arash Bahramian, Aloke Phatak, James C. A. Miller-Jones, Suman Rakshit, Alexander Andersson, Robert Fender, Patrick A. Woudt

TL;DR

This paper tackles the challenge of identifying variable and transient sources in large astronomical surveys by proposing a Gaussian process (GP) regression framework that avoids assuming light-curve shapes. It models light curves with a three-term kernel (squared exponential, Matern 3/2, and periodic) and uses the seven GP hyperparameters, especially the amplitude terms $\sigma_{SE}$ and $\sigma_{M32}$, as direct descriptors for variability. Compared with traditional metrics $\eta_\nu$ and $V_\nu$, the GP-based amplitude space provides superior discrimination and interpretable clustering of light curves, validated on 6394 ThunderKAT radio light curves and citizen-science labels. The approach also demonstrates a practical, GP-driven screening workflow for transient candidates and includes Python/R notebooks to facilitate deployment across surveys. Overall, the work presents a generalizable, principled method for scalable variability characterisation in time-domain astronomy with strong potential for improving transient discovery pipelines.
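
As a rough illustration of the three-term kernel described above, the following sketch builds a covariance matrix from the sum of a squared exponential, a Matern 3/2, and a periodic term. The functional forms are the standard textbook ones and the periodic term uses the common exp-sine-squared form; the paper's exact parameterisation is not given in this summary, so the dictionary keys (`sigma_SE`, `ell_SE`, and so on) and the example values are assumptions made for illustration only.

```python
import numpy as np

def se_kernel(dt, sigma, ell):
    """Squared exponential kernel: smooth, long-term variation."""
    return sigma**2 * np.exp(-0.5 * (dt / ell) ** 2)

def m32_kernel(dt, sigma, ell):
    """Matern 3/2 kernel: rougher, shorter-term variation."""
    r = np.sqrt(3.0) * np.abs(dt) / ell
    return sigma**2 * (1.0 + r) * np.exp(-r)

def periodic_kernel(dt, sigma, ell, period):
    """Periodic (exp-sine-squared) kernel; this form is an assumption."""
    return sigma**2 * np.exp(-2.0 * np.sin(np.pi * np.abs(dt) / period) ** 2 / ell**2)

def composite_kernel(t, params):
    """Sum of the three kernel terms evaluated on observation times t."""
    dt = t[:, None] - t[None, :]  # pairwise time differences
    return (
        se_kernel(dt, params["sigma_SE"], params["ell_SE"])
        + m32_kernel(dt, params["sigma_M32"], params["ell_M32"])
        + periodic_kernel(dt, params["sigma_P"], params["ell_P"], params["T"])
    )

# Hypothetical, irregularly sampled observation times (days) and
# illustrative hyperparameter values; none of these come from the paper.
t = np.array([0.0, 1.5, 4.0, 9.0])
params = {
    "sigma_SE": 0.5, "ell_SE": 10.0,    # SE amplitude and length scale
    "sigma_M32": 0.3, "ell_M32": 2.0,   # Matern 3/2 amplitude and length scale
    "sigma_P": 0.2, "ell_P": 1.0, "T": 7.0,  # periodic amplitude, length scale, period
}
K = composite_kernel(t, params)
```

The seven entries of `params` correspond to the seven hyperparameters mentioned above. Each diagonal element of `K` is the sum of the three squared amplitudes, which is why the amplitudes can be read as how much variance each term contributes.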

Abstract

A key science goal of large sky surveys such as those conducted by the Vera C. Rubin Observatory and precursors to the Square Kilometre Array is the identification of variable and transient objects. One approach is the statistical analysis of the time series of the changing brightness of sources, that is, their light curves. However, finding adequate statistical representations of light curves is challenging because of data quality issues such as sparsity of observations, irregular sampling, and other nuisance factors inherent in astronomical data collection. The wide diversity of objects that a large-scale survey will observe also means that making parametric assumptions about the shape of light curves is problematic. We present a Gaussian process (GP) regression approach for characterising light curve variability that addresses these challenges. Our approach makes no assumptions about the shape of a light curve and, therefore, is general enough to detect a range of variable source types. In particular, we propose using the joint distribution of GP amplitude hyperparameters to distinguish variable and transient candidates from nominally stable ones and apply this approach to 6394 radio light curves from the ThunderKAT survey. We compare our results with two variability metrics commonly used in radio astronomy, namely $\eta_\nu$ and $V_\nu$, and show that our approach has better discriminatory power and interpretability. Finally, we conduct a rudimentary search for transient sources in the ThunderKAT dataset to demonstrate how our approach might be used as an initial screening tool. Computational notebooks in Python and R are available to help facilitate the deployment of this framework to other surveys.
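
For readers unfamiliar with the two baseline metrics, the sketch below computes them in the forms commonly used in radio transient pipelines: $\eta_\nu$ as the reduced chi-squared of a light curve about its inverse-variance weighted mean, and $V_\nu$ as the ratio of the sample standard deviation to the mean flux density. This summary does not reproduce the paper's own definitions, so treat these formulas, the function name `variability_metrics`, and the example values as assumptions.

```python
import numpy as np

def variability_metrics(flux, flux_err):
    """Return (eta, V) for one light curve.

    eta: reduced chi-squared about the inverse-variance weighted mean.
    V:   sample standard deviation divided by the mean flux density.
    """
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    n = flux.size
    weights = 1.0 / flux_err**2
    weighted_mean = np.sum(weights * flux) / np.sum(weights)
    eta = np.sum(weights * (flux - weighted_mean) ** 2) / (n - 1)
    v = np.std(flux, ddof=1) / np.mean(flux)
    return eta, v

# Hypothetical flux densities and uncertainties (arbitrary units).
eta_example, v_example = variability_metrics([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
eta_flat, v_flat = variability_metrics([2.0, 2.0, 2.0], [0.1, 0.2, 0.3])
```

Both metrics reduce a light curve to a single number that ignores the ordering of observations in time, whereas the GP hyperparameters also encode the time scales of variation.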

Paper Structure

This paper contains 39 sections, 13 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Projection onto the Galactic plane of the ThunderKAT fields of observation included in this work. The colour bar indicates the number of sources for which we have light curves in each field (andersson_bursts_2023). Fields are labelled according to their observation target. Background image from the Gaia mission (A. Moitinho; ESA/Gaia/DPAC).
  • Figure 2: Left: Posterior predictive samples of light curves in the fields around (A) J1848G, (C) 4U1543, (E) J1858, and (G) GRS1915. Shaded regions are 68% and 90% quantile intervals, respectively. Standardised results have been transformed back to their original flux density scale. The inconsistent size of uncertainties is due to poor data quality in certain epochs. Right: Power spectral density (PSD) estimates of the posterior predictive curves in the left panel. Note that in D and H, numerical approximation artefacts misleadingly show the SE kernel exceeding the total PSD. See Table \ref{tab:post_medians} for statistics of each source.
  • Figure 3: Posterior medians of our GP regression model's hyperparameters from each kernel term. A: Squared exponential (SE) kernel, B: Matern 3/2 (M32) kernel, C and D: Periodic (P) kernel. Each point corresponds to one light curve. Amplitudes, $\sigma$, are in standardised units, and length scales $\ell$ and period $T$ are in days on a logarithmic scale. Colours indicate the field in which the light curve was observed. Notice the distinct clustering associated with the field of observation.
  • Figure 4: Left: Scatter plot of the posterior medians of the period ($T$) and length scale ($\ell_\textrm{P}$) hyperparameters of the periodic kernel. These two quantities have a strong positive correlation ($R^2 = 0.996$). Right: Scatter plot of the posterior median of $T$ against the total duration of each light curve.
  • Figure 5: Scatter plot of the posterior medians of the amplitude hyperparameters of the squared exponential ($\sigma_\textrm{SE}$) and Matern 3/2 ($\sigma_\textrm{M32}$) kernels for each fitted light curve. Amplitudes are in standardised units, and colours indicate the field where the light curve was observed. The dashed line indicates the line of equality between the two hyperparameters.
  • ...and 8 more figures