Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling

Daesoo Lee; Sara Malacarne; Erlend Aune

TL;DR

The paper addresses time series anomaly detection that is both accurate and explainable. It introduces TimeVQVAE-AD, which repurposes TimeVQVAE's masked generative prior, learned in a time-frequency latent space, to compute anomaly scores as $a = -\log p_\theta(\text{s}|\text{s}_M)$ over sliding latent windows whose size is controlled by $\alpha$. This enables frequency-band-specific diagnostics and counterfactual sampling. The approach combines LF-HF latent space merging with a dimensionality-preserving encoder to maintain temporal and spectral semantics, and it provides explainable sampling to visualize likely normal realizations of anomalous segments. On the UCR-TSA archive, TimeVQVAE-AD outperforms existing methods in top-1/top-k accuracy and explains its detections through frequency-resolved anomaly scores and counterfactual samples. Code and visualizations are publicly available.
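To make the score $a = -\log p_\theta(\text{s}|\text{s}_M)$ concrete, the following is a minimal sketch, not the authors' implementation. It assumes a prior model has already produced unnormalized scores (logits) over a codebook of size K for every position of a token grid indexed by frequency band and latent time step. The function names, the nested-list layout, and the toy values are all hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def masked_anomaly_scores(logits, tokens, mask):
    """Return a[f][t] = -log p(observed token) at masked positions.

    logits[f][t]: K unnormalized scores from a (hypothetical) prior model
    tokens[f][t]: observed codebook index at frequency band f, latent step t
    mask[f][t]:   True where the token was hidden from the prior
    Unmasked positions receive 0.0 because they are not evaluated here.
    """
    scores = []
    for f, row in enumerate(tokens):
        scores.append([
            -log_softmax(logits[f][t])[tok] if mask[f][t] else 0.0
            for t, tok in enumerate(row)
        ])
    return scores


# Toy grid: 2 frequency bands x 3 latent time steps, codebook size K = 4.
tokens = [[0, 1, 2],
          [3, 0, 1]]
mask = [[False, True, False],
        [False, True, False]]  # mask the middle latent time step

uniform = [0.0, 0.0, 0.0, 0.0]    # prior has no preference among tokens
confident = [5.0, 0.0, 0.0, 0.0]  # prior strongly expects token 0
logits = [[uniform, uniform, uniform],
          [uniform, confident, uniform]]

scores = masked_anomaly_scores(logits, tokens, mask)
for band, row in enumerate(scores):
    print(f"band {band}:", [round(v, 3) for v in row])
```

In this toy case the masked token in band 1 matches what the prior expects, so its score is lower than the score in band 0, where the prior is uniform. Keeping one score per frequency band, rather than summing them immediately, is what allows a frequency-resolved reading of an anomaly.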

Abstract

We present a novel time series anomaly detection method that achieves excellent detection accuracy while offering a superior level of explainability. Our proposed method, TimeVQVAE-AD, leverages masked generative modeling adapted from the cutting-edge time series generation method known as TimeVQVAE. The prior model is trained on the discrete latent space of a time-frequency domain. Notably, the dimensional semantics of the time-frequency domain are preserved in the latent space, enabling us to compute anomaly scores across different frequency bands, which provides a better insight into the detected anomalies. Additionally, the generative nature of the prior model allows for sampling likely normal states for detected anomalies, enhancing the explainability of the detected anomalies through counterfactuals. Our experimental evaluation on the UCR Time Series Anomaly archive demonstrates that TimeVQVAE-AD significantly surpasses the existing methods in terms of detection accuracy and explainability. We provide our implementation on GitHub: https://github.com/ML4ITS/TimeVQVAE-AnomalyDetection.
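The counterfactual idea in the abstract can be sketched as resampling the masked tokens from the prior's categorical distribution while leaving all other tokens untouched. This is a simplified, single-pass illustration under stated assumptions: the per-position probabilities are taken as given, the function name and data layout are hypothetical, and the decoding of tokens back to a time series is omitted. It should not be read as the sampling procedure used in the paper.

```python
import random


def resample_masked_tokens(probs, tokens, mask, rng):
    """Return a copy of `tokens` with masked positions redrawn from `probs`.

    probs[f][t]:  K probabilities from a (hypothetical) prior model
    tokens[f][t]: observed codebook index at frequency band f, latent step t
    mask[f][t]:   True where the token should be replaced
    rng:          a random.Random instance, passed in for reproducibility
    """
    out = [list(row) for row in tokens]  # copy so the input is not mutated
    for f, row in enumerate(tokens):
        for t in range(len(row)):
            if mask[f][t]:
                k = len(probs[f][t])
                out[f][t] = rng.choices(range(k), weights=probs[f][t])[0]
    return out


# Toy grid: 2 frequency bands x 3 latent time steps, codebook size K = 4.
tokens = [[0, 1, 2],
          [3, 0, 1]]
mask = [[False, True, False],
        [False, True, False]]  # treat the middle step as anomalous

uniform = [0.25, 0.25, 0.25, 0.25]
probs = [[uniform, [0.0, 0.0, 1.0, 0.0], uniform],  # prior is certain of token 2
         [uniform, uniform, uniform]]

counterfactual = resample_masked_tokens(probs, tokens, mask, random.Random(0))
print(counterfactual)
```

Passing the random generator explicitly keeps the sketch reproducible. Drawing several counterfactuals with different seeds would give a set of likely normal token grids for the same masked segment.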

Paper Structure (46 sections, 4 equations, 12 figures, 2 tables, 1 algorithm)

Introduction
State of Time Series Anomaly Detection
Related Work
Existing Anomaly Detection Methods
Non-Deep Learning-based TSAD Methods
Deep Learning-based TSAD Methods
Limitation of Reconstruction or Forecasting-based TSAD Methods
Limitation of Adversarial Learning-based TSAD Methods
Limitation of Existing Density Estimation-based TSAD Methods
TimeVQVAE: A Powerful Time Series Generation Method
Existing Approaches for XAI for TSAD
Method
Training
Architecture
LF-HF Latent Space Merge
...and 31 more sections

Figures (12)

Figure 1: Overview of the inference process of our proposed method, TimeVQVAE-AD. In the figure, the time series $\textit{x}$ exhibits a high-frequency anomaly. Initially, $\textit{x}$ is processed by Short Time Fourier Transform (STFT), followed by processing through an encoder and a vector quantizer, resulting in $\textit{s}$. Here, $\textit{s}$ represents a set of tokens derived from $\textit{x}$, with color similarity indicating the Euclidean similarity between these tokens. The two axes of $\textit{s}$ correspond to time and frequency, respectively. Subsequently, a segment of $\textit{s}$ is masked to enable the prior model to sample likely normal tokens from the masked tokens (explainable sampling), and to compute the anomaly scores (anomaly detection) where low and high anomaly scores are depicted in blue and red, respectively. A noteworthy aspect of our method is its capability to address a broad spectrum of anomalies, due to the robust generative prior model that utilizes a learned prior to evaluate the likelihood of $\textit{s}$ representing a normal state.
Figure 2: Example of the two perspectives of explainability: 1) factorization of anomalies in terms of anomaly types via frequency decomposition, 2) presentation of a corresponding normal state. The two subfigures show time series with different anomaly types: LF and HF anomalies, respectively. In each subfigure, the first figure presents a time series (black) with anomaly labels (orange), the second figure presents predicted anomaly scores with respect to different frequency bands using our proposed method, where the bottom row and the top row represent the lowest and highest frequency band, respectively (blue: low anomaly score, red: high anomaly score), and the third figure shows the likely normal states of the time series, in which the anomalous segments are resampled using our learned prior model. Note that the predicted anomaly scores are high in the low frequency band in (a), the scores are high in the high frequency band in (b), and the likely normal states are highly convincing when observed by human eyes.
Figure 3: Examples of inevitable failure cases for reconstruction or forecasting-based TSAD methods. In each subfigure, the second figure presents predicted anomaly scores (pink). A reconstruction or forecasting error $\| x_\mathrm{train} - \hat{x} \|$ tends to be larger on timesteps with large amplitudes, as they are more challenging to predict. This results in higher anomaly scores at peaks. In contrast, anomalies with small amplitudes inevitably yield low predicted anomaly scores due to their nature, as observed in (a) and (b).
Figure 4: Overview of the first stage (stage 1) and the second stage (stage 2). The snowflakes indicate that the models are set to be untrainable, mask denotes random-masking, and the bidirectional transformer corresponds to the prior model.
Figure 5: Examples of the visual results showcasing predicted anomaly scores by TimeVQVAE-AD on different datasets from the UCR-TSA archive. The first row shows a test time series (black) with the corresponding labels (orange), the second row presents the anomaly scores with the height representing the frequency dimension $\textit{a}_s^*$ clipped by the threshold, the third row presents the final anomaly scores $\textbf{a}_{\text{final}}$, and the last row presents the likely normal states achieved through explainable sampling. It should be noted that the scaling of the likely normal states depends on the scaling factors of the corresponding original time series segment. For instance, if an original time series segment has a low mean value, the likely normal states also have a low mean value. The same principle applies to the scaling of the variance.
...and 7 more figures
