Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy

Jiehui Xu, Haixu Wu, Jianmin Wang, Mingsheng Long

TL;DR

This work addresses unsupervised time series anomaly detection with the Anomaly Transformer, which learns time-point associations through a new Anomaly-Attention mechanism. It defines the Association Discrepancy as the symmetrized KL divergence between the prior-association and the series-association, and uses a minimax learning strategy to amplify the normal-abnormal distinction while reconstructing the signal. The approach achieves state-of-the-art results on six benchmarks spanning service monitoring, space & earth exploration, and water treatment, with ablations supporting the contribution of the association-based criterion and the minimax optimization. By combining an adjacent-concentration prior with learnable series associations, the method is designed to handle diverse anomaly types in real-world monitoring systems.

Abstract

Unsupervised detection of anomaly points in time series is a challenging problem that requires the model to derive a distinguishable criterion. Previous methods tackle the problem mainly by learning pointwise representations or pairwise associations; however, neither is sufficient to reason about the intricate dynamics. Recently, Transformers have shown great power in unified modeling of pointwise representation and pairwise association, and we find that the self-attention weight distribution of each time point can embody rich association with the whole series. Our key observation is that, due to the rarity of anomalies, it is extremely difficult to build nontrivial associations from abnormal points to the whole series; the anomalies' associations therefore concentrate mainly on their adjacent time points. This adjacent-concentration bias implies an association-based criterion that is inherently distinguishable between normal and abnormal points, which we highlight through the Association Discrepancy. Technically, we propose the Anomaly Transformer with a new Anomaly-Attention mechanism to compute the association discrepancy. A minimax strategy is devised to amplify the normal-abnormal distinguishability of the association discrepancy. The Anomaly Transformer achieves state-of-the-art results on six unsupervised time series anomaly detection benchmarks across three applications: service monitoring, space & earth exploration, and water treatment.
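
To make the association-based criterion concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention layer, a prior-association built from a row-normalized Gaussian kernel over the distance |i - j|, and a series-association given by row-wise softmax attention weights. The function names (gaussian_prior, softmax_rows, association_discrepancy), the eps smoothing term, and the random scores are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_prior(length, sigma):
    """Prior-association sketch: Gaussian kernel over |i - j|, rows normalized to sum to 1."""
    idx = np.arange(length)
    dist = np.abs(idx[:, None] - idx[None, :])
    # sigma may be a scalar or one scale per time point (assumption for this sketch)
    sigma = np.asarray(sigma, dtype=float).reshape(-1, 1)
    kernel = np.exp(-dist**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return kernel / kernel.sum(axis=1, keepdims=True)

def softmax_rows(scores):
    """Series-association sketch: numerically stable row-wise softmax."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def association_discrepancy(prior, series, eps=1e-12):
    """Symmetrized KL divergence between matching rows; one value per time point."""
    p = prior + eps   # eps avoids log(0); it is an illustrative choice
    s = series + eps
    kl_ps = np.sum(p * np.log(p / s), axis=1)
    kl_sp = np.sum(s * np.log(s / p), axis=1)
    return kl_ps + kl_sp

# Hypothetical usage on random attention scores for a window of 8 time points.
rng = np.random.default_rng(0)
n_points = 8
prior = gaussian_prior(n_points, sigma=1.0)
series = softmax_rows(rng.normal(size=(n_points, n_points)))
discrepancy = association_discrepancy(prior, series)
print(discrepancy.shape)
```

In this sketch, a series-association that concentrates on adjacent points stays close to a narrow Gaussian prior and gives a small discrepancy, while an association spread over the whole window gives a larger one. That is the direction of the contrast the abstract describes between abnormal and normal points.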

Paper Structure

This paper contains 44 sections, 7 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Anomaly Transformer architecture. Anomaly-Attention (left) models the prior-association and series-association simultaneously. In addition to the reconstruction loss, our model is also optimized by the minimax strategy with a specially designed stop-gradient mechanism (gray arrows) that constrains the prior- and series-associations to yield a more distinguishable association discrepancy.
  • Figure 2: Minimax association learning. In the minimize phase, the prior-association minimizes the Association Discrepancy within the distribution family derived from the Gaussian kernel. In the maximize phase, the series-association maximizes the Association Discrepancy under the reconstruction loss.
  • Figure 3: ROC curves (horizontal axis: false-positive rate; vertical axis: true-positive rate) for the five corresponding datasets. A higher AUC value (area under the ROC curve) indicates better performance. The predefined threshold proportion $r$ is in $\{0.5\%,1.0\%,1.5\%,2.0\%,10\%,20\%,30\%\}$.
  • Figure 4: Results for NeurIPS-TS.
  • Figure 5: Visualization of different anomaly categories (Lai et al., 2021). We plot the raw series (first row) from the NeurIPS-TS dataset, as well as their corresponding reconstructions (second row) and association-based criteria (third row). The point-wise anomalies are marked by red circles and the pattern-wise anomalies by red segments. The wrongly detected cases are bounded by red boxes.
  • ...and 6 more figures
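
The threshold proportion $r$ in the Figure 3 caption can be illustrated with a small sketch. It assumes that the threshold is chosen so that roughly a fraction $r$ of the anomaly scores is flagged, and that each value of $r$ then gives one (false-positive rate, true-positive rate) point. The helper names, the synthetic scores, and the synthetic labels are hypothetical; none of the numbers come from the paper.

```python
import numpy as np

def threshold_by_proportion(scores, r):
    """Threshold such that roughly a fraction r of the scores lies above it."""
    return np.quantile(scores, 1.0 - r)

def roc_point(scores, labels, threshold):
    """False-positive and true-positive rate when flagging scores above the threshold."""
    predicted = scores > threshold
    labels = labels.astype(bool)
    true_pos = np.sum(predicted & labels)
    false_pos = np.sum(predicted & ~labels)
    tpr = true_pos / max(labels.sum(), 1)
    fpr = false_pos / max((~labels).sum(), 1)
    return fpr, tpr

# Synthetic data (assumption): 2% of points are anomalous and tend to score higher.
rng = np.random.default_rng(0)
labels = rng.random(5000) < 0.02
scores = rng.normal(size=5000) + 3.0 * labels

# The proportions listed in the Figure 3 caption.
proportions = [0.005, 0.01, 0.015, 0.02, 0.10, 0.20, 0.30]
points = [roc_point(scores, labels, threshold_by_proportion(scores, r))
          for r in proportions]
for r, (fpr, tpr) in zip(proportions, points):
    print(f"r={r:.3f}  FPR={fpr:.3f}  TPR={tpr:.3f}")
```

Sweeping $r$ from small to large lowers the threshold, so both rates can only stay the same or grow, which is what traces a curve from the lower left to the upper right of a ROC plot.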