Contaminated Multivariate Time-Series Anomaly Detection with Spatio-Temporal Graph Conditional Diffusion Models

Thi Kieu Khanh Ho, Narges Armanfard

TL;DR

The paper tackles practical time-series anomaly detection (TSAD) under contaminated training data by introducing TSAD-C, a fully unsupervised, end-to-end framework with three modules: a Decontaminator (masking plus an S4-based diffusion model), a Long-range Variable Dependency Modeling module (time-then-graph, with S4 embeddings and GIN-based inter-variable graphs), and an Anomaly Scoring module that fuses decontaminated reconstructions with direct reconstruction errors. It reports state-of-the-art performance across four diverse datasets and robustness to varying contamination levels and anomaly types, while staying computationally efficient through single-step decontamination and windowed graphs. The contribution advances real-world anomaly detection by enabling effective learning from noisy data and by modeling long-range intra- and inter-variable dependencies in a scalable, diffusion-assisted framework.
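
The fusion idea behind the Anomaly Scoring module can be illustrated with a minimal sketch. It is not the paper's scoring rule: the function names, the use of mean squared error, the choice to compare the input against both the decontaminated and the reconstructed window, and the `weight` parameter are all assumptions made for illustration.

```python
from typing import List

# A window of multivariate data: window[sensor][timestep].
Window = List[List[float]]


def mean_squared_error(a: Window, b: Window) -> float:
    """Average squared difference over all sensors and timesteps."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for value_a, value_b in zip(row_a, row_b):
            total += (value_a - value_b) ** 2
            count += 1
    return total / count


def anomaly_score(
    x: Window,
    x_decontaminated: Window,
    x_reconstructed: Window,
    weight: float = 0.5,  # hypothetical fusion weight, not from the paper
) -> float:
    """Fuse two error terms into one score (assumed weighted sum)."""
    decontamination_error = mean_squared_error(x, x_decontaminated)
    reconstruction_error = mean_squared_error(x, x_reconstructed)
    return weight * decontamination_error + (1.0 - weight) * reconstruction_error


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    x_rec = [[1.0, 2.0], [3.0, 6.0]]
    # The decontaminated window equals the input here, so only the
    # reconstruction term contributes to the score.
    print(anomaly_score(x, x, x_rec))
```

Under this assumed rule, a window is flagged when its score exceeds a threshold chosen on validation data; how the threshold is actually set is not stated in this summary.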

Abstract

Mainstream unsupervised anomaly detection algorithms often excel on academic datasets, yet their real-world performance is limited because they are developed under controlled experimental conditions with clean training data. The challenge of training with noise, a prevalent issue in practical anomaly detection, is frequently overlooked. In a pioneering endeavor, this study investigates label-level noise in sensory time-series anomaly detection (TSAD). It presents a novel and practical end-to-end unsupervised TSAD approach for the case where the training data is contaminated with anomalies. The approach, called TSAD-C, has no access to abnormality labels during training. TSAD-C comprises three core modules: a Decontaminator that rectifies anomalies (i.e., noise) present during training; a Long-range Variable Dependency Modeling module that captures long-term intra- and inter-variable dependencies within the decontaminated data, which is treated as a surrogate for the pure normal data; and an Anomaly Scoring module that detects anomalies of all types. Extensive experiments on four reliable and diverse datasets demonstrate that TSAD-C surpasses existing methods, establishing a new state of the art in TSAD.

Paper Structure (29 sections, 17 equations, 5 figures, 6 tables, 2 algorithms)

This paper contains 29 sections, 17 equations, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 1: The overall framework of TSAD-C consists of three modules: the Decontaminator integrates masking strategies and an S4-based diffusion model; the Long-Range Variable Dependency Modeling module incorporates Intra- and Inter-variable Modeling components; and the Anomaly Scoring module leverages insights from the preceding modules to detect anomalies.
  • Figure 2: The architecture of the Decontaminator includes two S4 layers in every residual block to ensure that long-range intra-variable dependencies are effectively captured.
  • Figure 3: (Left) F1 score versus the number of anomaly types $\kappa$. (Right) F1 score versus the anomaly ratio $\eta$.
  • Figure 4: Comparison between normal and abnormal cases for the masked segment in DODH. Sen-$k$ denotes the $k$th sensor. Each case includes $\mathbf{x}_{(i)}$, $\hat{\mathbf{x}}_{(i)}^0$ and $\hat{\hat{\mathbf{x}}}_{(i)}$.
  • Figure 5: Comparison between normal and abnormal cases for the masked segment in (a) SMD, (b) ICBEB, and (c) TUSZ. The masking strategy used is BoM. Each case includes the ground truth $\mathbf{x}_{(i)}$, the decontaminated data $\hat{\mathbf{x}}_{(i)}^0$ and the reconstructed data $\hat{\hat{\mathbf{x}}}_{(i)}$.
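
The "masked segment" that these captions refer to can be pictured with a small sketch. It assumes the mask hides one contiguous span of timesteps across every sensor and fills it with a constant; the function name `mask_segment`, its parameters, and the fill value are hypothetical, and the paper's actual masking strategies (including BoM) may be defined differently.

```python
from typing import List, Tuple

# A window of multivariate data: window[sensor][timestep].
Window = List[List[float]]


def mask_segment(
    window: Window,
    start: int,
    length: int,
    fill: float = 0.0,  # assumed placeholder value for hidden samples
) -> Tuple[Window, List[List[int]]]:
    """Hide timesteps [start, start + length) on every sensor.

    Returns a masked copy of the window and a binary mask in which
    1 marks a hidden sample. The input window is left unmodified.
    """
    masked = [list(row) for row in window]
    mask = [[0] * len(row) for row in window]
    for sensor, row in enumerate(window):
        end = min(start + length, len(row))  # clip to the window length
        for t in range(start, end):
            masked[sensor][t] = fill
            mask[sensor][t] = 1
    return masked, mask


if __name__ == "__main__":
    window = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    masked, mask = mask_segment(window, start=1, length=2)
    print(masked)
    print(mask)
```

In a setup like this, a model would be asked to fill in the hidden span from the visible context, and the filled-in values would then be compared with the ground truth on the masked positions only.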