A Classifier-Based Approach to Multi-Class Anomaly Detection Applied to Astronomical Time-Series
Rithwik Gupta, Daniel Muthukrishna, Michelle Lochner
TL;DR
The paper tackles automated anomaly detection in time-domain astronomy by leveraging the latent space of a classifier. It introduces Multi-Class Isolation Forests (MCIF), which train a separate isolation forest for each known class and score each object by the minimum of its class-specific anomaly scores. MCIF is applied to the 100-dimensional latent representation of a GRU-based light-curve classifier. On simulated ZTF-like data, MCIF recovers about 85% of the anomalies after following up the top 2,000 ranked objects and achieves AUROC competitive with state-of-the-art approaches. The analysis shows that MCIF gains most over a standard isolation forest when the latent space contains distinct clusters. The approach is suited to real-time detection and demonstrates that repurposing classifiers as anomaly detectors can scale to the data deluge expected from LSST, with code publicly available for reuse.
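The per-class scoring rule can be illustrated with a minimal sketch. It assumes scikit-learn's `IsolationForest` as the per-class detector and uses small synthetic Gaussian clusters in place of the classifier's 100-dimensional latent space. The class name `MultiClassIsolationForest`, its parameters, and the toy data are hypothetical and are not taken from the AstroMCAD package.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


class MultiClassIsolationForest:
    """Hypothetical MCIF sketch: one isolation forest per known class."""

    def __init__(self, n_estimators=100, random_state=0):
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.forests = {}

    def fit(self, latents, labels):
        latents = np.asarray(latents)
        labels = np.asarray(labels)
        for cls in np.unique(labels):
            forest = IsolationForest(n_estimators=self.n_estimators,
                                     random_state=self.random_state)
            # Train each forest only on latent vectors of its own class.
            forest.fit(latents[labels == cls])
            self.forests[cls] = forest
        return self

    def score(self, latents):
        latents = np.asarray(latents)
        # score_samples is higher for normal points, so negate it to get
        # a score that is higher for more anomalous points.
        per_class = np.stack([-forest.score_samples(latents)
                              for forest in self.forests.values()], axis=1)
        # Minimum over classes: fitting any one class well gives a low score.
        return per_class.min(axis=1)


# Toy usage: two well-separated synthetic clusters stand in for two classes.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=-5.0, scale=1.0, size=(500, 4))
class_b = rng.normal(loc=5.0, scale=1.0, size=(500, 4))
latents = np.vstack([class_a, class_b])
labels = np.array([0] * 500 + [1] * 500)

mcif = MultiClassIsolationForest().fit(latents, labels)
# The last point lies between the clusters, so it is unusual for both classes.
test_points = np.array([[-5.0] * 4, [5.0] * 4, [0.0] * 4])
print(mcif.score(test_points))
```

Taking the minimum means an object that fits any one known class well receives a low score, so only objects that look unusual relative to every class rise to the top of the ranking.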
Abstract
Automating anomaly detection is an open problem in many scientific fields, particularly in time-domain astronomy, where modern telescopes generate millions of alerts per night. Currently, most anomaly detection algorithms for astronomical time-series rely either on hand-crafted features or on features generated through unsupervised representation learning, coupled with standard anomaly detection algorithms. In this work, we introduce a novel approach that leverages the latent space of a neural network classifier for anomaly detection. We then propose a new method called Multi-Class Isolation Forests (MCIF), which trains separate isolation forests for each class to derive an anomaly score for an object based on its latent space representation. This approach significantly outperforms a standard isolation forest when distinct clusters exist in the latent space. Using a simulated dataset emulating the Zwicky Transient Facility (54 anomalies and 12,040 common objects), our anomaly detection pipeline discovered $46\pm3$ anomalies ($\sim 85\%$ recall) after following up the top 2,000 ($\sim 15\%$) ranked objects. Furthermore, our classifier-based approach outperforms or approaches the performance of other state-of-the-art anomaly detection pipelines. Our novel method demonstrates that existing and new classifiers can be effectively repurposed for real-time anomaly detection. The code used in this work, including a Python package, is publicly available at https://github.com/Rithwik-G/AstroMCAD.
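The recall figure quoted above is recall at a fixed follow-up budget: rank all objects by anomaly score, follow up the top $k$, and count how many true anomalies are among them. A minimal sketch of that metric follows; the function name `recall_at_k` and the toy scores are hypothetical, and ties in score are broken arbitrarily by the sort.

```python
def recall_at_k(scores, is_anomaly, k):
    """Return (anomalies found, recall) among the k highest-scoring objects."""
    # Rank object indices from most to least anomalous.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = sum(1 for i in ranked[:k] if is_anomaly[i])
    total = sum(1 for flag in is_anomaly if flag)
    return found, found / total


# Toy example: three true anomalies, follow-up budget of three objects.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
is_anomaly = [True, False, False, False, True, True]
found, recall = recall_at_k(scores, is_anomaly, k=3)
print(found, recall)

# The abstract's headline figure: 46 of 54 anomalies found.
print(round(46 / 54, 3))
```

With the toy data, the top three objects (indices 0, 2, 4) contain two of the three anomalies, giving a recall of 2/3; the same arithmetic applied to 46 of 54 anomalies gives the quoted $\sim 85\%$.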
