Adversarial-Robust Multivariate Time-Series Anomaly Detection via Joint Information Retention

Hadi Hojjati, Narges Armanfard

Abstract

Time-series anomaly detection (TSAD) is a critical component in monitoring complex systems, yet modern deep learning-based detectors are often highly sensitive to localized input corruptions and structured noise. We propose ARTA (Adversarially Robust multivariate Time-series Anomaly detection via joint information retention), a joint training framework that improves detector robustness through a principled min-max optimization objective. ARTA comprises an anomaly detector and a sparsity-constrained mask generator that are trained simultaneously. The generator identifies minimal, task-relevant temporal perturbations that maximally increase the detector's anomaly score, while the detector is optimized to remain stable under these structured perturbations. The resulting masks characterize the detector's sensitivity to adversarial temporal corruptions and can serve as explanatory signals for the detector's decisions. This adversarial training strategy exposes brittle decision pathways and encourages the detector to rely on distributed and stable temporal patterns rather than spurious localized artifacts. We conduct extensive experiments on the TSB-AD benchmark, demonstrating that ARTA consistently improves anomaly detection performance across diverse datasets and exhibits significantly more graceful degradation under increasing noise levels compared to state-of-the-art baselines.
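The joint min-max scheme described in the abstract can be sketched with a toy setup. Everything below is an illustrative simplification, not the paper's architecture: the detector is a linear mean-profile scorer, and the sparsity-constrained generator is a greedy top-k mask chosen by gradient magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 32        # window length
eps = 0.5     # L-infinity budget for the perturbation
k = 4         # sparsity budget: the generator may touch at most k time steps
lr = 0.05     # detector learning rate

# Toy detector: anomaly score = squared deviation from a learned mean profile mu.
mu = np.zeros(T)
X = rng.normal(0.0, 0.1, size=(64, T))   # "normal" training windows

def score(x, mu):
    return np.sum((x - mu) ** 2, axis=-1)

def worst_case_mask(x, mu, k):
    """Greedy sparse mask generator (inner maximization).

    d score / d x_t = 2 (x_t - mu_t), so pushing the k steps with the
    largest gradient magnitude by eps * sign(grad) maximally increases
    the score under the L-infinity and sparsity budgets.
    """
    grad = 2.0 * (x - mu)
    mask = np.zeros_like(x)
    mask[np.argsort(-np.abs(grad))[:k]] = 1.0
    return mask, np.sign(grad)

# Joint training loop: the generator crafts a sparse worst-case perturbation,
# then the detector takes a gradient step against the perturbed window.
for _ in range(200):
    x = X[rng.integers(len(X))]
    mask, direction = worst_case_mask(x, mu, k)
    x_adv = x + mask * eps * direction   # structured temporal perturbation
    mu += lr * 2.0 * (x_adv - mu)        # outer minimization step

clean_score = score(X, mu).mean()
```

The mask produced in the inner step is exactly the kind of object the paper uses as an explanatory signal: it marks the time steps the detector is most sensitive to.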


Paper Structure

This paper contains 46 sections, 3 theorems, 45 equations, 4 figures, 12 tables, 1 algorithm.

Key Result

Theorem 1

Let $\delta \in \mathbb{R}^{N \times T}$ satisfy $\|\delta\|_\infty \le \varepsilon$, and let $M \in [0,1]^T$ be a fixed temporal mask. Define the perturbed masked input as …. Then the anomaly score satisfies the bound ….
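The theorem's displayed equations are not preserved in this extract. Under a Lipschitz assumption on the anomaly score (an assumption introduced here, not confirmed by the source; $s$ and $L_s$ are hypothetical symbols), one form consistent with the stated quantities is:

```latex
% Hypothetical sketch only: s denotes the anomaly score and L_s a Lipschitz
% constant of s with respect to the input; neither is defined in this extract.
\tilde{X} = X + \bigl(\mathbf{1}_N M^{\top}\bigr) \odot \delta, \qquad
\bigl| s(\tilde{X}) - s(X) \bigr| \;\le\; L_s \,\varepsilon\, \|M\|_1 .
```

The key structural point such a bound conveys is that the score's variability is controlled jointly by the perturbation budget $\varepsilon$ and the mask sparsity $\|M\|_1$, matching the titles of Theorems 2 and 3.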

Figures (4)

  • Figure 1: Overview of the proposed method during training and inference.
  • Figure 2: Robustness evaluation of anomaly detection methods under input corruption. Top row: Salt-and-pepper noise with varying corruption probability. Bottom row: Colored noise with varying SNR. Results are shown for three datasets and averaged over five runs.
  • Figure 3: Qualitative examples of generator masks on selected samples. Mask values are thresholded and highlighted in red.
  • Figure 4: Robustness evaluation of anomaly detection methods under additive Gaussian noise with varying signal-to-noise ratios (SNR) across three representative datasets, averaged over five runs. Our method consistently demonstrates slower performance degradation.
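Two of the corruption models referenced in Figures 2 and 4 (salt-and-pepper with a given corruption probability, and additive Gaussian noise at a target SNR) can be reproduced with a short sketch. The helper names below are assumptions, and the colored-noise model from Figure 2 is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def salt_and_pepper(x, p, low=-1.0, high=1.0, rng=rng):
    """Corrupt each sample independently with probability p,
    replacing it by an extreme value (the 'pepper' or 'salt' level)."""
    out = x.copy()
    hit = rng.random(x.shape) < p
    out[hit] = rng.choice([low, high], size=hit.sum())
    return out

def additive_gaussian(x, snr_db, rng=rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio (dB)."""
    sig_power = np.mean(x ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

# Example: sweep SNR levels on a synthetic signal, as in the robustness plots.
x = np.sin(np.linspace(0, 8 * np.pi, 1000))
for snr in (20, 10, 0):
    x_noisy = additive_gaussian(x, snr)
```

Sweeping the corruption probability (salt-and-pepper) or the SNR (additive noise) and re-scoring the detector at each level yields degradation curves of the kind shown in the figures.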

Theorems & Definitions (6)

  • Theorem 1: Stability Under Sparse Baseline-Aware Masking
  • Proof
  • Theorem 2: Bounded Perturbation Capacity Under Sparse Masking
  • Proof
  • Theorem 3: Bounded Anomaly Score Variability
  • Proof