DTAAD: Dual Tcn-Attention Networks for Anomaly Detection in Multivariate Time Series Data

Lingrui Yu

TL;DR

The paper tackles unsupervised anomaly detection in high-dimensional multivariate time series by proposing DTAAD, a lightweight architecture that combines an autoregressive model with an autoencoder structure, in which dual TCNs feed a Transformer encoder-decoder. It integrates local causal and global dilated convolutions, a residual feedback loop, a two-loss objective, and MAML-based meta-learning to achieve robust detection and per-dimension diagnosis with substantially shorter training time. Across nine public datasets, DTAAD outperforms most baselines in F1 and AUC, with F1 gains of up to $8.38\%$ and training-time reductions of up to $99\%$, which suggests practical potential for industrial and embedded deployments. Thresholds set with the Peaks-Over-Threshold (POT) method from extreme value theory (EVT) enable dynamic, per-dimension anomaly labeling, which supports accurate diagnosis and scalable operation in real-world settings.
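To make the POT step concrete, the following is a minimal sketch of how an EVT-based threshold could be computed for one dimension's anomaly scores. It is not the authors' implementation: the function name `pot_threshold`, the parameters `init_quantile` and `risk`, and the method-of-moments fit of the generalized Pareto distribution are all assumptions chosen to keep the example dependency-free (published POT implementations typically use a maximum-likelihood fit instead).

```python
import math


def pot_threshold(scores, init_quantile=0.98, risk=1e-3):
    """Estimate an anomaly threshold with Peaks-Over-Threshold (POT).

    Hypothetical helper: fits a generalized Pareto distribution (GPD) to the
    excesses over an initial high quantile using the method of moments, then
    extrapolates to the score level exceeded with probability `risk`.
    """
    n = len(scores)
    ordered = sorted(scores)
    # Initial threshold t: an empirical high quantile of the scores.
    t = ordered[min(n - 1, int(init_quantile * n))]
    excesses = [s - t for s in scores if s > t]
    n_t = len(excesses)
    if n_t < 2:
        return t  # too few peaks to fit a tail model

    mean = sum(excesses) / n_t
    var = sum((e - mean) ** 2 for e in excesses) / (n_t - 1)
    if var <= 0.0:
        return t

    # Method-of-moments GPD estimates (valid for shape xi < 1/2).
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)           # shape
    sigma = 0.5 * mean * (ratio + 1.0)  # scale

    # Standard POT quantile formula; the xi -> 0 limit is the exponential tail.
    r = risk * n / n_t
    if abs(xi) < 1e-9:
        return t - sigma * math.log(r)
    return t + (sigma / xi) * (r ** (-xi) - 1.0)
```

For per-dimension labeling, the same function would be applied to each dimension's score series separately, and a timestamp would be flagged in a dimension when its score exceeds that dimension's threshold.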

Abstract

Anomaly detection techniques enable effective detection and diagnosis of anomalies in multivariate time series data, which is of major significance for today's industrial applications. However, building an anomaly detection system that can locate anomalies rapidly and accurately is challenging because of the lack of anomaly labels, the high dimensionality of the data, memory bottlenecks in real hardware, and the need for fast inference. In this paper, we propose an anomaly detection and diagnosis model, DTAAD, based on a Transformer and a Dual Temporal Convolutional Network (TCN). The overall model is an integrated design in which an autoregressive (AR) model is combined with an autoencoder (AE) structure. Scaling methods and feedback mechanisms are introduced to improve prediction accuracy and expand correlation differences. The Dual TCN-Attention Network (DTA) that we construct uses only a single Transformer encoder layer in our baseline experiment, which makes it an ultra-lightweight model. Our extensive experiments on seven public datasets validate that DTAAD exceeds the majority of current state-of-the-art baseline methods in both detection and diagnostic performance. Specifically, DTAAD improved F1 scores by $8.38\%$ and reduced training time by $99\%$ compared to the baseline. The code and training scripts are publicly available on GitHub at https://github.com/Yu-Lingrui/DTAAD.

Paper Structure

This paper contains 19 sections, 38 equations, 8 figures, 7 tables, 2 algorithms.

Figures (8)

  • Figure 1: Dual TCN-Attention Network (DTANet). The model consists mainly of two parts: a TCN-based approximate autoregression layer and a Transformer-based encoder-decoder layer. The TCN generally has a fixed number of layers so that it covers the required receptive field, while the encoding layer can be stacked multiple times. The different attentions are sent to the decoder through residual connections, and the final prediction from the local attention is copied and fed back to the global TCN, overlaid on the original input. Finally, the two losses are weighted according to the proportion $\lambda$.
  • Figure 2: Local TCN (causal convolutions). The structure consists of three sets of hidden layers with convolution kernels of size $k=3$. Each layer pads $k-1$ inputs on the left.
  • Figure 3: Global TCN (dilated convolutions). The structure consists of three sets of hidden layers with convolution kernels of size $k=3$. Each layer pads $b^{n-1}\cdot\left(k-1\right)$ inputs on the left.
  • Figure 4: Transformer (encoder). Each base layer of the encoder contains two sub-layers. The first sub-layer is a multi-head attention mechanism whose inputs $q,k,v$ are taken from the outputs of the global and local temporal convolutions, superposed with the corresponding positional encodings. The second sub-layer is a fully connected feedforward neural network. Residual connections and layer normalization (Ba et al., 2016) are applied to both sub-layers.
  • Figure 5: Overall computational flow. The likelihood $\ell\left(\hat{z}_{i,t}\mid\theta_{i,t-2:t}\right)$ is a distribution whose parameters $\theta_{i,t-2:t}$ are given by the Dual TCN hidden state $\mathrm{h}_{i,t-2:t}$. The decoder network $\phi\left(z_{i,t};\Theta_{d}\right)$ aggregates the encoder inputs into the final output $z_{i,t}$.
  • ...and 3 more figures
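
The padding rules in the captions of Figures 2 and 3 can be illustrated with a small pure-Python sketch. It is not taken from the DTAAD repository: the helper names are hypothetical, and it assumes that $b$ is the dilation base and $n$ the 1-indexed layer number, so that the local TCN corresponds to $b=1$ and the global TCN to some $b>1$ (the value $b=2$ used below is only an example).

```python
def left_pad(k, b, n):
    """Left padding for layer n (1-indexed): b**(n-1) * (k-1)."""
    return b ** (n - 1) * (k - 1)


def causal_conv1d(x, weights, dilation=1):
    """Single-channel causal convolution with zero left padding.

    Output t depends only on x[t], x[t - dilation], ..., so no future
    sample leaks into the result, and len(output) == len(x).
    """
    k = len(weights)
    pad = dilation * (k - 1)
    padded = [0.0] * pad + list(x)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            # padded[t + pad] is x[t]; tap j looks j * dilation steps back.
            acc += w * padded[t + pad - j * dilation]
        out.append(acc)
    return out


def tcn_stack(x, weights, b, n_layers):
    """Apply n_layers causal convolutions with dilation b**(n-1) at layer n."""
    for n in range(1, n_layers + 1):
        x = causal_conv1d(x, weights, dilation=b ** (n - 1))
    return x


def receptive_field(k, b, n_layers):
    """Number of input steps that can influence one output step."""
    return 1 + sum(left_pad(k, b, n) for n in range(1, n_layers + 1))
```

Under these assumptions, three layers with $k=3$ give a receptive field of 7 time steps for the local TCN ($b=1$) and 15 for a global TCN with $b=2$, which is why the dilated branch can cover longer-range context with the same depth.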