StarEmbed: Benchmarking Time Series Foundation Models on Astronomical Observations of Variable Stars
Weijian Li, Hong-Yu Chen, Qinjie Lin, Nabeel Rehemtulla, Ved G. Shah, Dennis Wu, Adam A. Miller, Han Liu
TL;DR
Time-domain astronomy faces a data deluge of irregularly sampled, multivariate light curves that challenge traditional pipelines. StarEmbed is the first public benchmark for evaluating time series foundation models (TSFMs) on Zwicky Transient Facility (ZTF) light curves across seven astrophysical classes. It covers three tasks, all performed on zero-shot embeddings: unsupervised clustering, supervised classification, and out-of-distribution (OOD) detection. Chronos-based TSFMs generalize well to astronomical data and achieve state-of-the-art performance on OOD detection, while handcrafted features remain highly competitive for clustering and classification; the domain-specific Astromer models give limited zero-shot gains. The study advocates a paradigm shift toward generic foundation model representations for petascale time series analysis in upcoming surveys such as LSST, and it releases embeddings, datasets, and code to enable community-driven progress.
Abstract
Time series foundation models (TSFMs) are increasingly being adopted as highly capable general-purpose time series representation learners. Although their training corpora are vast, they exclude astronomical time series data. Observations of stars produce petascale time series with unique challenges, including irregular sampling and heteroskedasticity. We introduce StarEmbed, the first public benchmark for rigorous and standardized evaluation of state-of-the-art TSFMs on stellar time series observations ("light curves"). We benchmark on three scientifically motivated downstream tasks: unsupervised clustering, supervised classification, and out-of-distribution source detection. StarEmbed integrates a catalog of expert-vetted labels with multivariate light curves from the Zwicky Transient Facility, yielding ~40k hand-labeled light curves spread across seven astrophysical classes. We evaluate the zero-shot representation capabilities of three TSFMs (MOIRAI, Chronos, Chronos-Bolt) and a domain-specific transformer (Astromer) against handcrafted feature extraction, the long-standing baseline in the astrophysics literature. Our results demonstrate that these TSFMs, especially the Chronos models, which are trained on data completely unlike the astronomical observations, can outperform established astrophysics-specific baselines in some tasks and generalize effectively to entirely new data. In particular, TSFMs deliver state-of-the-art performance on our out-of-distribution source detection benchmark. With the first benchmark of TSFMs on astronomical time series data, we test the limits of their generalization and motivate a paradigm shift in time-domain astronomy, away from task-specific, fully supervised pipelines and toward generic foundation model representations for the analysis of petascale datasets from forthcoming observatories.
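
To make the evaluation setting concrete, the following is a minimal sketch of how frozen (zero-shot) embeddings can feed two of the downstream tasks named above, supervised classification and out-of-distribution detection. It is an illustration under stated assumptions, not the benchmark's actual pipeline: the embeddings are synthetic Gaussian blobs standing in for TSFM outputs, and the k-nearest-neighbor classifier and k-th-neighbor distance score are generic choices made here for simplicity. All function names, class labels, and parameters are hypothetical.

```python
import math
import random


def nearest_neighbors(query, reference, k):
    """Return the k nearest (distance, index) pairs from query to reference rows."""
    pairs = sorted((math.dist(query, ref), i) for i, ref in enumerate(reference))
    return pairs[:k]


def knn_classify(query, reference, labels, k=5):
    """Majority vote over the labels of the k nearest reference embeddings."""
    votes = {}
    for _, i in nearest_neighbors(query, reference, k):
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)


def ood_score(query, reference, k=5):
    """Distance to the k-th nearest reference embedding; larger means more anomalous."""
    return nearest_neighbors(query, reference, k)[-1][0]


def make_blob(rng, center, n, scale=0.5):
    """Synthetic stand-in for embeddings of one class (assumption, not real data)."""
    return [[c + rng.gauss(0.0, scale) for c in center] for _ in range(n)]


rng = random.Random(0)

# "Training" embeddings for two hypothetical in-distribution classes.
train = make_blob(rng, [0, 0, 0, 0], 50) + make_blob(rng, [5, 5, 5, 5], 50)
train_labels = ["class_a"] * 50 + ["class_b"] * 50

# Held-out sources: one known class and one class never seen in the reference set.
in_dist = make_blob(rng, [5, 5, 5, 5], 20)
out_dist = make_blob(rng, [-10, 10, -10, 10], 20)

# Supervised classification on frozen embeddings.
preds = [knn_classify(x, train, train_labels) for x in in_dist]
accuracy = sum(p == "class_b" for p in preds) / len(preds)

# Out-of-distribution detection: unseen sources should receive larger scores.
in_scores = [ood_score(x, train) for x in in_dist]
out_scores = [ood_score(x, train) for x in out_dist]

print(f"accuracy on held-out known class: {accuracy:.2f}")
print(f"max in-distribution score: {max(in_scores):.2f}")
print(f"min out-of-distribution score: {min(out_scores):.2f}")
```

The point of the sketch is that the representation model is never fine-tuned: only lightweight downstream estimators are fit on top of fixed embeddings, so the quality of the embeddings themselves is what is being measured.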
