Anomaly Hunter for Alerts (AHA): Anomaly Detection in the ZTF Transient Alert Stream

Leyla Iskandarli, Chris J. Lintott, Steve Croft, Heloise Stevance, Joshua Weston

TL;DR

The paper addresses the problem of filtering very large time-domain alert streams by introducing Anomaly Hunter for Alerts (AHA), an unsupervised, multi-modality anomaly-detection pipeline for ZTF alerts accessed through the Lasair broker. AHA trains three independent autoencoders, one each on object features, triplet image cutouts, and light curves, to flag exotic transients and anomalous supernovae. It is evaluated by the recall of exotic objects and the purity of all anomalies, on both held-out data and the live alert stream. In both settings the three models recover largely distinct, complementary anomaly populations. Over 25 days of live operation they yield 87 rank-3 candidates for follow-up, and training requires only a few thousand examples. The framework is designed to transfer to upcoming Rubin data, where modality-specific anomaly detectors can select rare transients for targeted follow-up in real time.
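The core scoring idea works in two steps. First, objects are ranked by how poorly an autoencoder trained on normal transients reconstructs them. Second, the recall of exotic objects and the purity of all anomalies are measured among the ten highest-scoring candidates. The sketch below is a minimal illustration, not the authors' code, and it rests on several assumptions. It uses a closed-form linear autoencoder (PCA via truncated SVD) as a stand-in for the paper's trained neural autoencoders. The toy data are synthetic, and the helper names `fit_linear_autoencoder`, `anomaly_score`, and `top_k_metrics` are hypothetical. The live-stream threshold is taken to be the score of the tenth-ranked test object, which is an assumption about how a threshold "defined from test set behavior" might be chosen.

```python
import numpy as np


def fit_linear_autoencoder(x_train: np.ndarray, n_latent: int):
    """Closed-form linear autoencoder (PCA stand-in for a trained network).

    With a linear encoder/decoder and squared-error loss, the optimal
    reconstruction projects onto the top principal directions, so a
    truncated SVD of the centred training data gives the solution.
    """
    mean = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[:n_latent]  # rows of vt[:n_latent] span the bottleneck


def anomaly_score(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Per-object mean squared reconstruction error (higher = more anomalous)."""
    z = (x - mean) @ components.T  # encode into the latent bottleneck
    x_hat = z @ components + mean  # decode back to input space
    return np.mean((x - x_hat) ** 2, axis=1)


def top_k_metrics(scores, is_exotic, is_anomalous, k=10):
    """Recall of exotic objects and purity of all anomalies among the top k."""
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    recall = is_exotic[top].sum() / max(is_exotic.sum(), 1)
    purity = is_anomalous[top].sum() / k
    threshold = scores[top[-1]]  # score of the k-th ranked candidate (assumed rule)
    return float(recall), float(purity), float(threshold)


# Hypothetical toy data: "normal" objects lie near a 2-D plane in a 5-D
# feature space; "exotic" objects are scattered far from that plane.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 5))
normal = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))
exotic = rng.normal(scale=5.0, size=(5, 5))

mean, components = fit_linear_autoencoder(normal[:300], n_latent=2)
x_test = np.vstack([normal[300:], exotic])
is_exotic = np.r_[np.zeros(200, bool), np.ones(5, bool)]
scores = anomaly_score(x_test, mean, components)
# In this toy set the only anomalies are the exotic objects.
recall, purity, threshold = top_k_metrics(scores, is_exotic, is_exotic, k=10)

# Reuse the test-set threshold to flag objects in a (toy) live stream.
fresh = rng.normal(size=(50, 2)) @ basis + 0.05 * rng.normal(size=(50, 5))
stream = np.vstack([fresh, exotic[:2]])
flagged = anomaly_score(stream, mean, components) >= threshold
```

A linear model keeps the example short and deterministic. A nonlinear autoencoder would replace `fit_linear_autoencoder` and the encode/decode steps, while the ranking, metric, and thresholding code would stay the same.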

Abstract

Modern time-domain surveys produce alert streams at a scale that makes exhaustive manual inspection infeasible, requiring automated methods to identify unusual transients for follow-up. In this work, we present an unsupervised anomaly detection pipeline applied to the ZTF alert stream using the Lasair broker. We define normal objects as SN Ia, SN II, and SN Ib/c. Anomalous objects include (i) more exotic transients (AGN, TDEs, SLSNe, CVs, and nuclear transients) and (ii) objects labeled as supernovae, either spectroscopically or by Lasair, that have anomalous properties, such as incorrect or absent host associations or light curves unlike those of supernovae. Our pipeline consists of three independently trained simple autoencoders, each operating on a distinct alert stream data product: object features, triplet image cutouts, or light curves. Each model is trained on predominantly normal transients. Performance is assessed using the recall of exotic objects and the purity of all anomalous objects, on both a spectroscopically classified held-out test set and the live alert stream. In the test set, performance is evaluated at a fixed rank, over the ten highest-scoring candidates; in the alert stream, it is evaluated using an anomaly threshold derived from test set behavior. In both settings, the algorithms consistently recover exotic transients and anomalous supernovae among their top-ranked candidates. Over 25 days of live alert stream application, we identify 87 unusual supernova candidates for follow-up. The anomalies flagged by different autoencoders do not overlap at all in the test set and overlap only slightly in the alert stream, where any two algorithms share at most 11 objects. The framework is data-efficient, requiring only a few thousand training examples, which makes it well suited for early and ongoing application to the Rubin Observatory alert stream.
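The overlap comparison above, and the $2\times2$ agreement matrices of Figure 5, reduce to set operations on the objects each detector flags. The sketch below is illustrative only. The detector names and object IDs are hypothetical placeholders, and `pairwise_overlap` and `agreement_matrix` are hypothetical helpers, not functions from the paper.

```python
from itertools import combinations


def pairwise_overlap(flagged):
    """Number of objects shared by each pair of detectors."""
    return {
        (a, b): len(flagged[a] & flagged[b])
        for a, b in combinations(sorted(flagged), 2)
    }


def agreement_matrix(candidates, flagged_a, flagged_b):
    """2x2 counts of follow-up decisions made by two detectors.

    Rows index detector A (0 = No, 1 = Yes); columns index detector B.
    Cell [1][1] (lower right) counts objects that both detectors flag.
    """
    matrix = [[0, 0], [0, 0]]
    for obj in candidates:
        matrix[obj in flagged_a][obj in flagged_b] += 1  # bools index as 0/1
    return matrix


# Hypothetical detector outputs keyed by input modality; IDs are placeholders.
flagged = {
    "features": {"obj1", "obj2", "obj3"},
    "images": {"obj3", "obj4"},
    "lightcurves": {"obj1", "obj5", "obj6"},
}
candidates = {f"obj{i}" for i in range(1, 9)}  # every manually inspected object

overlaps = pairwise_overlap(flagged)
shared_by_all = set.intersection(*flagged.values())
features_vs_images = agreement_matrix(candidates, flagged["features"], flagged["images"])
```

In this toy example, each pair of detectors shares at most one object and no object is flagged by all three. This mirrors the kind of result the abstract reports, although the numbers here are invented.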

Paper Structure

This paper contains 24 sections, 3 equations, and 5 figures.

Figures (5)

  • Figure 1: Latent space projections of the autoencoder bottleneck dimensions. Axes are arbitrary latent dimensions. Normal objects are shown in blue and exotic objects in orange. Exotic objects overlap with normal objects in latent space, forming localized substructure within the broader distribution rather than a clearly separated population.
  • Figure 2: Three-dimensional principal component projections of the full dataset. Blue points represent the full object population, while red points indicate the top 1% of objects ranked by autoencoder reconstruction error. Each panel shows the same data from a different viewing angle to illustrate the spatial distribution of high-reconstruction-error objects relative to the PCA manifold.
  • Figure 3: Input triplet images (science, template, difference) and corresponding autoencoder reconstructions for well-reconstructed objects, illustrating the model’s ability to reproduce typical alert cutout morphology.
  • Figure 4: Example light curves from the training set used by the light curve autoencoder, shown after model training. All panels display normal SN Ia objects. Green points and curves correspond to the $g$ band, and purple points and curves to the $r$ band. Discrete points with error bars show the raw ZTF difference-flux photometry retrieved from Lasair. The dashed, semi-transparent curves indicate the kernel-regressed light curves resampled onto a fixed temporal grid, which form the inputs to the autoencoder. Solid curves show the corresponding autoencoder reconstructions.
  • Figure 5: Agreement between anomaly detection algorithms for supernova-labeled objects. Each panel shows a $2\times2$ matrix comparing follow-up decisions for objects classified as supernovae by sherlock and manually inspected by us. For each pair of autoencoder algorithms, the axes indicate whether each method considered an object interesting enough for follow-up (rank 3 or rank 2) or not interesting. The number in each cell gives the count of objects in that category. Objects in the lower-right cells are identified as interesting by both algorithms, while off-diagonal cells correspond to objects selected by only one algorithm. No object is identified as interesting by all three autoencoders. The small number of objects in the shared “Yes-Yes” cells indicates that the different data modalities highlight distinct sets of interesting supernova candidates.
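Figure 4 describes resampling each band's photometry onto a fixed temporal grid by kernel regression before it enters the light curve autoencoder. The sketch below makes several assumptions that may differ from the paper's actual kernel, bandwidth, grid, and normalization:

- the estimator is Nadaraya-Watson with a Gaussian kernel in time;
- each point also carries an inverse-variance weight;
- the bandwidth is 5 days;
- the concatenated g- and r-band curves are peak-normalized.

The function names `kernel_regress` and `build_input` and the toy light curve are hypothetical.

```python
import numpy as np


def kernel_regress(t_obs, flux, flux_err, t_grid, bandwidth=5.0):
    """Nadaraya-Watson estimate of flux on a fixed time grid.

    Each observation gets a Gaussian weight in time (bandwidth in days)
    multiplied by an inverse-variance weight 1 / flux_err**2 (assumed).
    Grid points with zero total weight are returned as NaN.
    """
    dt = (t_grid[:, None] - t_obs[None, :]) / bandwidth  # shape (n_grid, n_obs)
    w = np.exp(-0.5 * dt**2) / flux_err[None, :] ** 2
    total = w.sum(axis=1)
    return np.divide(w @ flux, total, out=np.full(t_grid.size, np.nan), where=total > 0)


def build_input(lc, t_grid, bandwidth=5.0):
    """Concatenate resampled g- and r-band curves into one model input."""
    parts = [kernel_regress(*lc[band], t_grid, bandwidth) for band in ("g", "r")]
    vec = np.concatenate(parts)
    return vec / np.nanmax(np.abs(vec))  # assumed peak normalization


# Hypothetical toy light curve: a smooth rise and decline, sampled every 3 days.
t_grid = np.linspace(0.0, 60.0, 31)  # fixed temporal grid (assumed)
t_obs = np.arange(0.0, 60.0, 3.0)
profile = np.exp(-0.5 * ((t_obs - 20.0) / 8.0) ** 2)
lc = {
    "g": (t_obs, 100.0 * profile, np.full(t_obs.size, 5.0)),
    "r": (t_obs + 1.0, 80.0 * profile, np.full(t_obs.size, 5.0)),
}
x = build_input(lc, t_grid)  # 31 g-band values followed by 31 r-band values
```

Resampling onto a shared grid gives every object an input vector of the same length, whatever its cadence. A fixed-size input of this kind is what a simple dense autoencoder needs.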