Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods
Wenzhen Yue, Xianghua Ying, Ruohao Guo, DongDong Chen, Ji Shi, Bowei Xing, Yuqing Zhu, Taiyan Chen
TL;DR
The Sub-Adjacent Transformer treats time-series anomaly detection as a reconstruction problem guided by an attention pattern that prioritizes sub-adjacent neighborhoods. It uses linear attention with a learnable mapping function and a loss that couples reconstruction error with the sub-adjacent attention contribution, which makes rare anomalies harder to reconstruct and therefore easier to detect. The anomaly score combines reconstruction error with a calibrated attention-based score, including a dynamic Gaussian variant. The model achieves state-of-the-art performance on six real-world datasets and a synthetic benchmark, and ablations support the value of the sub-adjacent attention design, linear attention, and dynamic scoring. The approach offers a simple yet effective baseline for unsupervised time-series anomaly detection, with practical efficiency and broad applicability.
Abstract
In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood to reconstruct a time point, our method restricts the attention to regions not immediately adjacent to the target point, termed sub-adjacent neighborhoods. Our key observation is that anomalies, owing to their rarity, typically exhibit more pronounced differences from their sub-adjacent neighborhoods than from their immediate vicinities. By focusing the attention on the sub-adjacent areas, we make the reconstruction of anomalies more challenging, thereby enhancing their detectability. Technically, our approach concentrates attention on the non-diagonal areas of the attention matrix by enlarging the corresponding elements in the training stage. To facilitate the implementation of the desired attention matrix pattern, we adopt linear attention because of its flexibility and adaptability. Moreover, a learnable mapping function is proposed to improve the performance of linear attention. Empirically, the Sub-Adjacent Transformer achieves state-of-the-art performance across six real-world anomaly detection benchmarks, covering diverse fields such as server monitoring, space exploration, and water treatment.
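To make the idea of concentrating attention on non-diagonal areas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a band mask over the attention matrix, computes a softmax-free (linear-style) attention matrix, and measures how much attention mass each time point places in the band. The band limits `k_min` and `k_max`, the fixed feature map `elu(x) + 1` (standing in for the learnable mapping function), the row-wise definition of the contribution, and all function names are illustrative assumptions.

```python
import numpy as np

def sub_adjacent_mask(n, k_min, k_max):
    """Boolean (n, n) mask, True where k_min <= |i - j| <= k_max.

    k_min and k_max are hypothetical band limits chosen for illustration.
    """
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return (dist >= k_min) & (dist <= k_max)

def linear_attention_matrix(q, k):
    """Row-normalized attention built from a positive feature map, no softmax.

    The fixed map elu(x) + 1 is an assumed stand-in for a learnable mapping.
    """
    def phi(x):
        # elu(x) + 1: x + 1 for x > 0, exp(x) otherwise; always positive
        return np.where(x > 0, x + 1.0, np.exp(x))

    scores = phi(q) @ phi(k).T  # (n, n), strictly positive entries
    return scores / scores.sum(axis=1, keepdims=True)

def sub_adjacent_contribution(attn, mask):
    """Per-row share of attention mass that falls inside the band."""
    return (attn * mask).sum(axis=1)

# Toy window of n time points with d-dimensional queries and keys.
rng = np.random.default_rng(0)
n, d = 16, 4
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))

mask = sub_adjacent_mask(n, k_min=3, k_max=6)
attn = linear_attention_matrix(q, k)
contrib = sub_adjacent_contribution(attn, mask)
```

Under these assumptions, a training objective could reward larger values of `contrib` alongside a low reconstruction error, so that each point is reconstructed mostly from its sub-adjacent neighborhood. A point that differs strongly from that neighborhood would then be hard to reconstruct, which is the effect the abstract describes.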
