A Classifier-Based Approach to Multi-Class Anomaly Detection for Astronomical Transients

Rithwik Gupta; Daniel Muthukrishna; Michelle Lochner

TL;DR

This work addresses real-time anomaly detection for the growing volume of time-domain survey alerts by repurposing the latent space of a GRU-based light-curve classifier. The authors introduce Multi-Class Isolation Forests (MCIF), a method that trains one isolation forest per known class and takes the minimum anomaly score across classes, operating on a 100-dimensional latent space. On simulated ZTF-like data with 12,040 common transients and 54 anomalies, MCIF recovers about 75% of the anomalies within roughly the top 15% of ranked candidates, with substantial robustness observed when anomalous calcium-rich transients are excluded. The input method avoids the need for interpolation and lets the network use inter-passband information, and the approach shows promise for early, real-time identification of rare transients in next-generation surveys such as LSST.
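The per-class scoring idea can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes scikit-learn's `IsolationForest`, uses two synthetic Gaussian clusters as a stand-in for the classifier's latent vectors, and the helper names `fit_mcif` and `mcif_score` are hypothetical. Because scikit-learn's `score_samples` is higher for inliers, the sketch negates it so that a higher value means more anomalous before taking the minimum across classes.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def fit_mcif(latents, labels, n_estimators=100, seed=0):
    """Fit one isolation forest per known class on that class's latent vectors."""
    forests = {}
    for cls in np.unique(labels):
        forest = IsolationForest(n_estimators=n_estimators, random_state=seed)
        forests[cls] = forest.fit(latents[labels == cls])
    return forests


def mcif_score(forests, latents):
    """Return the minimum per-class anomaly score (higher = more anomalous)."""
    # score_samples is higher for inliers, so negate it to get an anomaly score.
    per_class = np.stack([-f.score_samples(latents) for f in forests.values()])
    # An object is flagged only if it looks unusual to every class's forest.
    return per_class.min(axis=0)


# Toy stand-in for the latent space (assumption): two well-separated classes.
rng = np.random.default_rng(0)
dim = 8
train = np.vstack([
    rng.normal(-5.0, 1.0, (300, dim)),
    rng.normal(5.0, 1.0, (300, dim)),
])
labels = np.repeat([0, 1], 300)
forests = fit_mcif(train, labels)

# Held-out objects from the known classes, plus objects lying between them.
common = np.vstack([
    rng.normal(-5.0, 1.0, (50, dim)),
    rng.normal(5.0, 1.0, (50, dim)),
])
odd = rng.normal(0.0, 1.0, (50, dim))
common_scores = mcif_score(forests, common)
odd_scores = mcif_score(forests, odd)
print(np.median(common_scores), np.median(odd_scores))
```

Taking the minimum means a member of any known class receives a low score from its own class's forest, while an object that fits no class scores high against all of them.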

Abstract

Automating real-time anomaly detection is essential for identifying rare transients, with modern survey telescopes generating tens of thousands of alerts per night, and future telescopes, such as the Vera C. Rubin Observatory, projected to increase this number dramatically. Currently, most anomaly detection algorithms for astronomical transients rely either on hand-crafted features extracted from light curves or on features generated through unsupervised representation learning, coupled with standard anomaly detection algorithms. In this work, we introduce an alternative approach: using the penultimate layer of a neural network classifier as the latent space for anomaly detection. We then propose a novel method, Multi-Class Isolation Forests (\texttt{MCIF}), which trains separate isolation forests for each class to derive an anomaly score for a light curve from its latent space representation. This approach significantly outperforms a standard isolation forest. We also use a simpler input method for real-time transient classifiers which circumvents the need for interpolation and helps the neural network handle irregular sampling and model inter-passband relationships. Our anomaly detection pipeline identifies rare classes including kilonovae, pair-instability supernovae, and intermediate luminosity transients shortly after trigger on simulated Zwicky Transient Facility light curves. Using a sample of our simulations matching the population of anomalies expected in nature (54 anomalies and 12,040 common transients), our method discovered $41\pm3$ anomalies (~75% recall) after following up the top 2000 (~15%) ranked transients. Our novel method shows that classifiers can be effectively repurposed for real-time anomaly detection.
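The recall figure quoted above is a "recall after following up the top k candidates" metric. The sketch below shows one way such a number could be computed from a ranked list; the function name `recall_at_k` and the six-object toy arrays are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def recall_at_k(scores, is_anomaly, k):
    """Fraction of true anomalies found among the k highest-scoring objects."""
    scores = np.asarray(scores, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    # Rank objects from most to least anomalous and keep the first k.
    top_k = np.argsort(-scores, kind="stable")[:k]
    return is_anomaly[top_k].sum() / is_anomaly.sum()


# Toy example (assumption): six objects, two of which are true anomalies.
scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7, 0.2])
is_anomaly = np.array([True, False, False, False, True, False])
recall_top2 = recall_at_k(scores, is_anomaly, k=2)
recall_top3 = recall_at_k(scores, is_anomaly, k=3)
print(recall_top2, recall_top3)
```

In the abstract's terms, k is 2000 and the reported recall corresponds to 41 of 54 anomalies, which is about 0.76.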

Paper Structure (23 sections, 8 equations, 17 figures, 2 tables)

Introduction
Data
Simulated Data
Preprocessing
Methods
Overview and Rationale
Classifier
Multi-Class Isolation Forests
Limitations of Evaluating Anomaly Detection Methods
Results and Analysis
Classifier
Latent Representation
Anomaly Detection
Anomaly Precision and Recall
Detection Rates in a Representative Population
...and 8 more sections

Figures (17)

Figure 1: A visual summary of the architecture described in this work. Our approach first trains a classifier, then repurposes it as an encoder, and finally applies Multi-Class Isolation Forests (MCIF), proposed in this work, for anomaly detection. Colors in the plot have changed.
Figure 2: A visualization of the neural network classifier used in this work. Our model has two input streams, one for real-time light curve data and the other for contextual information. The light curve data (first input stream) goes through multiple GRU layers and then a dense layer. The contextual information (second input stream) feeds through a dense layer. The final dense layers from both input streams are merged in a concatenate layer. We feed that to a 100-neuron dense layer that serves as the latent space of the encoder. Finally, this dense layer feeds into the output layer, which provides classification scores. Colors in the plot have changed.
Figure 3: The normalized confusion matrix [left] and ROC curve [right] of the 12 common transient classes used for training given full light curve data. Each cell in the confusion matrix gives the fraction of transients from each True Class that was classified into the Predicted Class. The ROC curve shows the True Positive Rate against the False Positive Rate across various threshold probabilities for each class, with the Area Under the ROC curve (AUROC) in parentheses. The model is evaluated on the test set consisting of 10% of the data from the common classes.
Figure 4: The UMAP reduction of the latent space derived from the test set, which includes 10% of the common transients reserved for testing the classifier [left] and randomly sampled anomalous transients from the unseen anomaly dataset [right]. Despite not being trained on this data, the learned features still exhibit clear visual structure and anomalous transients form distinct clusters separate from the common classes. It is important to note that the UMAP reduction is used only for visualization purposes, and the actual anomaly detection (as seen in Figure \ref{fig:MCIFAverageScore} and the remaining plots) is performed on the 100-dimensional latent space.
Figure 5: The median anomaly score (rounded to two decimal places) for each class extracted from the latent representations of full light curves. The scores come from the full, unseen anomalous dataset for anomalous classes and the 10% test dataset for common classes. The five classes on the right (in bold) are anomalous. The separation between the scores of anomalous classes and common classes is evident, and the anomaly scores for the common classes are consistently low, signifying that they are not erroneously marked anomalous. For further analysis, Figure \ref{fig:Distribution} shows the full anomaly score distribution for each class.
...and 12 more figures
