Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods

Wenzhen Yue; Xianghua Ying; Ruohao Guo; DongDong Chen; Ji Shi; Bowei Xing; Yuqing Zhu; Taiyan Chen

TL;DR

The Sub-Adjacent Transformer treats time-series anomaly detection as a reconstruction task in which attention is steered toward sub-adjacent neighborhoods rather than a point's immediate vicinity. It uses linear attention with a learnable mapping function, and its loss couples reconstruction error with the sub-adjacent attention contribution, which makes rare anomalies harder to reconstruct and therefore easier to detect. The anomaly score combines reconstruction error with a calibrated attention-based term, and a dynamic Gaussian variant of the score is also provided. The method achieves state-of-the-art performance on six real-world datasets and a synthetic benchmark, and ablations support the value of the sub-adjacent attention design, linear attention, and dynamic scoring. The approach offers a simple yet effective baseline for unsupervised time-series anomaly detection, with practical efficiency and broad applicability.

Abstract

In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our method restricts the attention to regions not immediately adjacent to the target points, termed sub-adjacent neighborhoods. Our key observation is that owing to the rarity of anomalies, they typically exhibit more pronounced differences from their sub-adjacent neighborhoods than from their immediate vicinities. By focusing the attention on the sub-adjacent areas, we make the reconstruction of anomalies more challenging, thereby enhancing their detectability. Technically, our approach concentrates attention on the non-diagonal areas of the attention matrix by enlarging the corresponding elements in the training stage. To facilitate the implementation of the desired attention matrix pattern, we adopt linear attention because of its flexibility and adaptability. Moreover, a learnable mapping function is proposed to improve the performance of linear attention. Empirically, the Sub-Adjacent Transformer achieves state-of-the-art performance across six real-world anomaly detection benchmarks, covering diverse fields such as server monitoring, space exploration, and water treatment.
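The sub-adjacent idea described in the abstract can be illustrated with a small sketch. The names `sub_adjacent_mask`, `attention_contribution`, `k_min`, and `k_max` are hypothetical, and the column-wise sum is only one plausible reading of "attention contribution"; the paper's exact definition is not reproduced here. The sketch marks the off-diagonal stripes of an attention matrix and measures how much attention mass falls inside them.

```python
def sub_adjacent_mask(n, k_min, k_max):
    """Return an n x n 0/1 mask with ones where k_min <= |i - j| <= k_max.

    k_min and k_max are illustrative parameters (assumptions): they bound
    the distance of the "sub-adjacent" stripes from the main diagonal.
    """
    return [
        [1 if k_min <= abs(i - j) <= k_max else 0 for j in range(n)]
        for i in range(n)
    ]


def attention_contribution(attn, mask):
    """Sum, for each column j, the attention weights that lie inside the mask.

    Assumption: the contribution of point j is taken column-wise. A training
    objective in the spirit of the abstract would encourage this quantity
    to be large.
    """
    n = len(attn)
    return [sum(attn[i][j] * mask[i][j] for i in range(n)) for j in range(n)]


n = 6
mask = sub_adjacent_mask(n, k_min=2, k_max=3)

# A uniform attention matrix: every row spreads its weight evenly.
uniform = [[1.0 / n] * n for _ in range(n)]
contrib = attention_contribution(uniform, mask)

for row in mask:
    print(row)
print(contrib)
```

With this mask the diagonal and its immediate neighbors are zero, so a model pushed to maximize the contribution has to rely on points some distance away from the target, which is the constraint the abstract argues makes anomalies harder to reconstruct.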

Paper Structure (31 sections, 7 equations, 7 figures, 11 tables)

Introduction
Related Works
Time Series Anomaly Detection.
Linear Attention.
Methods
Problem Formulation
Sub-Adjacent Neighborhoods
Linear Attention.
Loss Function and Anomaly Score
Loss Function.
Anomaly Score.
Dynamic Gaussian Scoring.
Experiments
Datasets
Implementation Details
...and 16 more sections

Figures (7)

Figure 1: Illustration of our method in time domain. The point marked with a red circle, along with its neighbors, represents an anomaly on the sinusoidal signal. (a) Previous works typically utilize the attention across all the points within the window, while (b) our method encourages the use of attention of sub-adjacent neighborhoods (the highlighted area). Such an imposed constraint enlarges the reconstruction challenge for anomalies and thus improves the anomaly detection performance.
Figure 2: Illustration of attention contribution and the desired attention matrix. For clarity, only the main stripes are depicted.
Figure 3: Attention matrices obtained using (a) vanilla self-attention and (b) the linear attention with the proposed mapping function. The SMAP dataset and the proposed sub-adjacent neighborhoods are used.
Figure 4: Illustration of vanilla self-attention and linear attention. Without the direct application of Softmax, the attention matrix $\Phi\left ( \mathbf{Q} \right ) \Phi\left ( \mathbf{K} \right )^T$ of linear attention usually exhibits more flexibility.
Figure 5: Visualization of detection results for different anomaly categories in the NeurIPS-TS benchmark. The anomalous areas are highlighted with red lines/areas. The first and second rows represent point and pattern anomalies, respectively. From left to right, the columns show raw data, reconstruction error (Eq. \ref{loss}), attention contribution (Eq. \ref{SACon}), anomaly score (Eq. \ref{score}) and dynamic Gaussian score (Eq. \ref{score2}).
...and 2 more figures
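The linear attention matrix $\Phi(\mathbf{Q})\Phi(\mathbf{K})^T$ mentioned in the Figure 4 caption can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the feature map `feature_map` below is the common `elu(x) + 1` choice, used as a stand-in assumption for the paper's learnable mapping function, and all function names are hypothetical. The sketch computes the row-normalized attention matrix explicitly and then checks that reordering the products gives the same output without forming that matrix.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def feature_map(m):
    """elu(x) + 1, a positive feature map (assumption, not the paper's mapping)."""
    return [[x + 1.0 if x > 0 else math.exp(x) for x in row] for row in m]


def linear_attention_explicit(q, k, v):
    """Form Phi(Q) Phi(K)^T without softmax, row-normalize, then apply to V."""
    scores = matmul(feature_map(q), transpose(feature_map(k)))
    attn = [[s / sum(row) for s in row] for row in scores]
    return attn, matmul(attn, v)


def linear_attention_reordered(q, k, v):
    """Same output, computed as Phi(Q) (Phi(K)^T V) with a per-row normalizer."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = matmul(transpose(phi_k), v)            # small d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*phi_k)]   # column sums of Phi(K)
    out = []
    for row, numer in zip(phi_q, matmul(phi_q, kv)):
        denom = sum(x * y for x, y in zip(row, k_sum))
        out.append([x / denom for x in numer])
    return out


# Toy inputs: 3 time points, feature dimension 2.
q = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]
k = [[0.1, 0.4], [-0.7, 1.2], [0.9, -0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]

attn, out_explicit = linear_attention_explicit(q, k, v)
out_reordered = linear_attention_reordered(q, k, v)
```

Because the explicit attention matrix is available before normalization, it is straightforward to inspect or reweight individual entries, which is consistent with the caption's remark that linear attention is more flexible than applying Softmax directly.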
