
IMPACT: Influence Modeling for Open-Set Time Series Anomaly Detection

Xiaohui Zhou, Yijie Wang, Hongzuo Xu, Weixuan Liang, Xiaoli Li, Guansong Pang

Abstract

Open-set anomaly detection (OSAD) is an emerging paradigm that utilizes limited labeled data from anomaly classes seen in training to identify both seen and unseen anomalies during testing. Current approaches rely on simple augmentation methods to generate pseudo anomalies that mimic unseen anomalies. Although promising on image data, these methods prove ineffective on time series data because they fail to preserve its sequential nature, producing trivial or unrealistic anomaly patterns. They degrade further when the training data is contaminated with unlabeled anomalies. This work introduces $\textbf{IMPACT}$, a novel framework that leverages $\underline{\textbf{i}}$nfluence $\underline{\textbf{m}}$odeling for o$\underline{\textbf{p}}$en-set time series $\underline{\textbf{a}}$nomaly dete$\underline{\textbf{ct}}$ion, to tackle these challenges. The key insight is to $\textbf{i)}$ learn an influence function that accurately estimates the impact of individual training samples on the model, and then $\textbf{ii)}$ leverage these influence scores to generate semantically divergent yet realistic unseen anomalies for time series, while repurposing highly influential samples as supervised anomalies for anomaly decontamination. Extensive experiments show that IMPACT significantly outperforms existing state-of-the-art methods, achieving superior accuracy under varying OSAD settings and contamination rates.
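To make the abstract's key insight concrete, here is a minimal sketch of classical influence-function scoring (in the style of Koh and Liang) on a tiny ridge-regression model. All names and the toy data are illustrative assumptions; IMPACT's actual influence model operates on time series and a different detector, so this only illustrates the general idea of ranking training samples by their estimated impact on test risk.

```python
import numpy as np

# Toy setup: ridge regression on synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 1e-2
# Loss: (1/2n)||Xw - y||^2 + (lam/2)||w||^2.
# Its Hessian in w is constant: H = X^T X / n + lam * I.
H = X.T @ X / n + lam * np.eye(d)
w = np.linalg.solve(H, X.T @ y / n)  # closed-form ridge solution

# Gradient of the loss at a test point.
x_t = rng.normal(size=d)
y_t = x_t @ w_true
grad_test = (x_t @ w - y_t) * x_t

# Influence of training sample z_i on the test loss:
#   I(z_i) = -grad_test^T  H^{-1}  grad_i
H_inv = np.linalg.inv(H)
grads = (X @ w - y)[:, None] * X      # per-sample gradients, shape (n, d)
influence = -grads @ H_inv @ grad_test

# Samples with the largest |influence| matter most to the test risk;
# these are the kind of candidates one might flip or perturb.
top = np.argsort(-np.abs(influence))[:5]
```

Ranking by `|influence|` is what makes the two downstream uses in the abstract possible: highly influential unlabeled samples are candidates for label flipping (decontamination), and influence scores can guide which features to perturb when synthesizing pseudo anomalies.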

Paper Structure

This paper contains 29 sections, 4 theorems, 41 equations, 15 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Assuming the multi-channel anomaly scores $\bm{s} \in \mathbb{R}^{r}$ follow a latent distribution $S\sim \mathcal{N}(\bm{\mu},\sigma^{2}I)$, the geometric proximity to the isotropic Gaussian prior is equivalent to minimizing the entropy of the latent distribution:
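For context (a standard identity, not reproduced from the paper's omitted equation), the differential entropy of the assumed isotropic Gaussian $S\sim \mathcal{N}(\bm{\mu},\sigma^{2}I)$ over $\mathbb{R}^{r}$ depends only on $\sigma^{2}$:

```latex
\mathcal{H}(S) = \frac{r}{2}\log\!\left(2\pi e\,\sigma^{2}\right)
```

so pulling the score distribution geometrically closer to a tighter isotropic Gaussian (smaller $\sigma^{2}$) directly reduces its entropy, which is the equivalence the theorem asserts.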

Figures (15)

  • Figure 1: (a) Contaminated anomalies and low-quality pseudo anomalies significantly affect the learned boundary. (b) IMPACT addresses these issues by i) accurately flipping the contaminated unlabeled anomalies into labeled ones through influence modeling and ii) the influence-score-guided generation of high-quality pseudo anomalies from the perspective of reducing the test risk.
  • Figure 2: The overall framework of IMPACT. TIS first quantifies the influence of training samples on test risk, then RADG leverages these influence scores from the risk-reduction perspective to decontaminate the training set via label flipping and synthesize realistic unseen anomalies through feature perturbation. A dual-head architecture is finally trained to effectively detect both seen and unseen anomalies.
  • Figure 3: Sensitivity analysis results (AUC performance w.r.t. different hyperparameters) under the hard setting.
  • Figure 4: AUC performance of IMPACT w.r.t. different numbers of labeled anomalies under the general setting.
  • Figure 5: AUC performance of IMPACT w.r.t. different numbers of labeled anomalies under the hard setting.
  • ...and 10 more figures

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Proof
  • Proof
  • Proof
  • Proof