Contaminated Multivariate Time-Series Anomaly Detection with Spatio-Temporal Graph Conditional Diffusion Models

Thi Kieu Khanh Ho; Narges Armanfard

TL;DR

The paper tackles time-series anomaly detection (TSAD) when the training data is contaminated with anomalies. It introduces TSAD-C, a fully unsupervised, end-to-end framework with three modules: a Decontaminator (masking plus an S4-based diffusion model), a Long-range Variable Dependency Modeling module (a time-then-graph design with S4 embeddings and GIN-based inter-variable graphs), and an Anomaly Scoring module that fuses decontaminated reconstructions with direct reconstruction errors. TSAD-C reports state-of-the-art performance on four diverse datasets and is robust to varying contamination levels and anomaly types, while single-step decontamination and windowed graphs keep computation efficient. The contribution is a scalable, diffusion-assisted framework that learns from noisy data and models long-range intra- and inter-variable dependencies.
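To make the score-fusion idea concrete, here is a minimal sketch of how two reconstruction errors for one window could be combined into a single anomaly score. It is an illustration under stated assumptions, not the paper's scoring formula: the function names `rmse` and `anomaly_score`, the `weight` parameter, and the weighted-sum fusion rule are all hypothetical. A window is represented as a list of per-variable lists of values.

```python
from math import sqrt
from typing import Sequence

# A window of multivariate data: one inner sequence per variable (sensor).
Window = Sequence[Sequence[float]]


def rmse(a: Window, b: Window) -> float:
    """Root-mean-square error over all variables and time steps."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for value_a, value_b in zip(row_a, row_b):
            total += (value_a - value_b) ** 2
            count += 1
    return sqrt(total / count)


def anomaly_score(x: Window, x_decon: Window, x_recon: Window,
                  weight: float = 0.5) -> float:
    """Fuse two error terms into one score (hypothetical rule).

    x       : observed window
    x_decon : decontaminated window (output of a decontamination step)
    x_recon : reconstructed window (output of a dependency model)
    weight  : assumed trade-off between the two error terms
    """
    # Assumption: a plain weighted sum; the paper may combine them differently.
    return weight * rmse(x, x_decon) + (1.0 - weight) * rmse(x, x_recon)


if __name__ == "__main__":
    observed = [[3.0, 4.0]]
    clean = [[0.0, 0.0]]
    print(anomaly_score(observed, clean, clean))
```

A window whose decontaminated and reconstructed versions both match the observation scores zero, and the score grows as either version departs from what was observed. In practice a threshold on this score would be chosen on validation data.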

Abstract

Mainstream unsupervised anomaly detection algorithms often excel on academic datasets, yet their real-world performance is limited because they are developed under controlled experimental conditions with clean training data. The challenge of training with noise, a prevalent issue in practical anomaly detection, is frequently overlooked. In a pioneering effort, this study investigates label-level noise in sensory time-series anomaly detection (TSAD). It presents a novel and practical end-to-end unsupervised TSAD method for the setting where the training data is contaminated with anomalies. The proposed approach, called TSAD-C, has no access to abnormality labels during the training phase. TSAD-C comprises three core modules: a Decontaminator that rectifies anomalies (i.e., noise) present during training; a Long-range Variable Dependency Modeling module that captures long-term intra- and inter-variable dependencies within the decontaminated data, which is treated as a surrogate for pure normal data; and an Anomaly Scoring module that detects anomalies of all types. Extensive experiments on four reliable and diverse datasets demonstrate that TSAD-C surpasses existing methods, establishing a new state of the art in TSAD.

Paper Structure (29 sections, 17 equations, 5 figures, 6 tables, 2 algorithms)

Introduction
Proposed Method
Decontaminator
Long-range Variable Dependency Modeling
Anomaly Scoring
Experiments
Experimental Settings
Experimental Results
Comparison with State-of-the-Art
Resilience to Contamination Levels
Visualization of Normal Approximation
Ablation Study
Decontaminator Efficiency Study
Effect of Masking Strategy
Conclusion
...and 14 more sections

Figures (5)

Figure 1: The overall framework of TSAD-C consists of three modules: the Decontaminator integrates masking strategies and an S4-based diffusion model; the Long-range Variable Dependency Modeling module incorporates Intra- and Inter-variable Modeling components; and the Anomaly Scoring module leverages insights from the preceding modules to detect anomalies.
Figure 2: The architecture of the Decontaminator includes two S4 layers in every residual block to ensure that long-range intra-variable dependencies are effectively captured.
Figure 3: (Left) F1 score versus the number of anomaly types $\kappa$. (Right) F1 score versus the anomaly ratio $\eta$.
Figure 4: Comparison between normal and abnormal cases for the masked segment in DODH. Sen-$k$ denotes the $k$th sensor. Each case includes $\mathbf{x}_{(i)}$, $\hat{\mathbf{x}}_{(i)}^0$ and $\hat{\hat{\mathbf{x}}}_{(i)}$.
Figure 5: Comparison between normal and abnormal cases for the masked segment in (a) SMD, (b) ICBEB, and (c) TUSZ. The masking strategy used is BoM. Each case includes the ground truth $\mathbf{x}_{(i)}$, the decontaminated data $\hat{\mathbf{x}}_{(i)}^0$ and the reconstructed data $\hat{\hat{\mathbf{x}}}_{(i)}$.
