Decomposition-based multi-scale transformer framework for time series anomaly detection

Wenxin Zhang, Cuicui Luo

TL;DR

This work tackles time series anomaly detection in multivariate data under noisy conditions. It introduces TransDe, a decomposition-based multi-scale transformer that first splits series into trend and cyclical components via an HP filter, then learns intra-patch and inter-patch dependencies with a shared patch-based transformer, fused through dimension expansion, and trained with a symmetric KL-divergence contrastive loss using a stop-gradient strategy. The approach yields state-of-the-art F1 scores across five public datasets, validated by extensive ablations that confirm the efficacy of decomposition, patch-based learning, and the contrastive objective. The method offers practical benefits for robust anomaly detection in real-world systems and provides a public codebase for reproducibility.
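The decomposition step mentioned above can be illustrated with a minimal NumPy sketch of a Hodrick-Prescott (HP) filter. This is not taken from the TransDe codebase: the function name `hp_filter` and the default smoothing weight `lam=1600.0` are assumptions chosen for illustration. The trend is the solution of the linear system (I + lam * D^T D) trend = y, where D is the second-difference operator, and the cyclical component is the residual.

```python
import numpy as np

def hp_filter(series, lam=1600.0):
    """Split a (T,) or (T, C) series into (trend, cycle) with an HP filter.

    `lam` is a hypothetical default; the source does not state the value used.
    """
    y = np.asarray(series, dtype=float)
    T = y.shape[0]
    # Second-difference operator D of shape (T-2, T):
    # (D @ tau)[t] = tau[t] - 2 * tau[t+1] + tau[t+2]
    D = np.zeros((T - 2, T))
    rows = np.arange(T - 2)
    D[rows, rows] = 1.0
    D[rows, rows + 1] = -2.0
    D[rows, rows + 2] = 1.0
    # Closed-form minimizer of ||y - tau||^2 + lam * ||D tau||^2
    trend = np.linalg.solve(np.eye(T) + lam * (D.T @ D), y)
    cycle = y - trend
    return trend, cycle

# Usage: decompose a noisy multivariate window of shape (T, C).
rng = np.random.default_rng(0)
window = np.cumsum(rng.normal(size=(100, 3)), axis=0)
trend, cycle = hp_filter(window)
```

The dense solve is fine for short windows; for long series a banded or sparse solver would be the more economical choice.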

Abstract

Time series anomaly detection is crucial for maintaining stable systems. Existing methods face two main challenges. First, it is difficult to directly model the dependencies of diverse and complex patterns within the sequences. Second, many methods that optimize parameters using mean squared error struggle with noise in the time series, which degrades performance. To address these challenges, we propose a transformer-based framework built on decomposition (TransDe) for multivariate time series anomaly detection. The key idea is to combine the strengths of time series decomposition and transformers to effectively learn the complex patterns in normal time series data. A multi-scale patch-based transformer architecture is proposed to exploit the representative dependencies of each decomposed component of the time series. Furthermore, a contrastive learning paradigm based on the patch operation is proposed, which leverages KL divergence to align the positive pairs, namely the pure representations of normal patterns between different patch-level views. A novel asynchronous loss function with a stop-gradient strategy is further introduced, which improves the performance of TransDe while avoiding time-consuming and labor-intensive computation during optimization. Extensive experiments on five public datasets show that TransDe outperforms twelve baselines in terms of F1 score. Our code is available at https://github.com/shaieesss/TransDe.
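To make the KL-based alignment of the two patch-level views concrete, here is a minimal NumPy sketch of a symmetric KL divergence between two representations. It is an illustration under assumptions, not the authors' loss: the helper names (`softmax`, `kl`, `symmetric_kl`), the softmax normalization, and the `eps` constant are hypothetical. NumPy has no automatic differentiation, so the stop-gradient itself is not implemented here; it is only indicated in comments.

```python
import numpy as np

def softmax(z, axis=-1):
    """Turn raw features into a probability distribution along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl(p, q, eps=1e-8):
    """KL(p || q) per time step for distributions of shape (T, d)."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p), one value per time step.

    In an autograd framework, a stop-gradient variant would treat one
    argument of each term as a constant (for example via `.detach()` in
    PyTorch), so each term updates only one view.
    """
    return kl(p, q) + kl(q, p)

# Usage: two hypothetical views of the same window, shape (T, d).
rng = np.random.default_rng(0)
view_a = softmax(rng.normal(size=(100, 16)))
view_b = softmax(rng.normal(size=(100, 16)))
loss = symmetric_kl(view_a, view_b).mean()
```

Because only positive pairs are aligned, a loss of this shape needs no negative samples; the stop-gradient is a common way to keep such an objective from collapsing to a trivial solution.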

Paper Structure

This paper contains 23 sections, 17 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Overview of TransDe. The model consists of five parts: (1) sequence decomposition, (2) patch operation, (3) transformer-based representation learning, (4) dimension expansion and information fusion, and (5) training with contrastive loss and anomaly detection.
  • Figure 2: Attention mechanism for inter-patch and intra-patch views
  • Figure 3: A simple example of dimension expansion for different patch views
  • Figure 4: The ablation experiments of normalization operation
  • Figure 5: The ablation experiments of contrastive paradigms
  • ...and 3 more figures
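
The patch operation and dimension expansion named in the captions of Figures 1 to 3 can be sketched as follows. This is a guess at the general shape of the technique, not the paper's implementation: it assumes non-overlapping patches and a repetition-based expansion, and the function names `make_patches`, `expand_inter`, and `expand_intra` are hypothetical. The point is only to show how per-patch and per-position features can be brought back to the original window length so that the two views are comparable.

```python
import numpy as np

def make_patches(x, patch_size):
    """Reshape a (T, C) window into (N, P, C) non-overlapping patches."""
    T, C = x.shape
    if T % patch_size != 0:
        raise ValueError("window length must be divisible by patch_size")
    return x.reshape(T // patch_size, patch_size, C)

def expand_inter(feat, patch_size):
    """(N, d) per-patch features -> (N*P, d): each row repeated P times."""
    return np.repeat(feat, patch_size, axis=0)

def expand_intra(feat, num_patches):
    """(P, d) per-position features -> (N*P, d): the block tiled N times."""
    return np.tile(feat, (num_patches, 1))

# Usage: a window of length 6 with 2 channels, patch size 3 (so N = 2).
x = np.arange(12, dtype=float).reshape(6, 2)
patches = make_patches(x, patch_size=3)                 # shape (2, 3, 2)
inter = expand_inter(np.array([[1.0], [2.0]]), 3)       # rows: 1 1 1 2 2 2
intra = expand_intra(np.array([[1.0], [2.0], [3.0]]), 2)  # rows: 1 2 3 1 2 3
```

After expansion both views have one row per original time step, which is what allows a per-step comparison between them.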