RoCA: Robust Contrastive One-class Time Series Anomaly Detection with Contaminated Data

Xudong Mou, Rui Wang, Bo Li, Tianyu Wo, Jie Sun, Hui Wang, Xudong Liu

TL;DR

RoCA targets robust time-series anomaly detection (TSAD) when the training data is contaminated. It unifies contrastive learning and one-class classification in a single loss, complemented by a sequence-contrast mechanism and an outlier-exposure term that exploits latent anomalies identified during training. A variance term prevents hypersphere collapse, and an iterative training procedure updates anomaly scores to refine the decision boundary. Experiments on AIOps, UCR, SWaT, and WADI show that RoCA achieves the highest average performance on both univariate and multivariate datasets and remains robust to contamination. The source code is publicly available, which supports replication and application to other TSAD tasks.

Abstract

The accumulation of time-series signals and the absence of labels make time-series Anomaly Detection (AD) a self-supervised deep learning task. Methods based on normality assumptions face three limitations: (1) A single assumption can hardly characterize normality as a whole and may lead to deviations. (2) Some assumptions may go against the principle of AD. (3) They assume that the training data is uncontaminated (free of anomalies), which is unrealistic in practice and reduces robustness. This paper proposes a novel robust approach, RoCA, which is, as far as we are aware, the first to address all three challenges. It fuses the separate assumptions of one-class classification and contrastive learning in a single training process to characterize normality more completely. Additionally, it monitors the training data and computes a carefully designed anomaly score throughout the training process. This score helps identify latent anomalies, which are then used to define the classification boundary, inspired by the concept of outlier exposure. Performance on the AIOps datasets improved by 6% compared with COCA, which does not consider contamination. On two large, high-dimensional multivariate datasets, performance increased by 5% to 10%. RoCA achieves the highest average performance on both univariate and multivariate datasets. The source code is available at https://github.com/ruiking04/RoCA.
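To make the idea of fusing a one-class objective with a collapse-preventing variance term more concrete, the following is a minimal sketch and not the paper's actual loss. The function names, the cosine-based invariance term, the hinge-style variance term, and the parameters `target_std`, `eps`, and `weight` are all assumptions chosen for illustration; the outlier-exposure term and the anomaly-score update are omitted.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))

def invariance_term(q, q_aug, center):
    """Assumed form: batch mean of (1 - cos(q_i, c)) + (1 - cos(q_i', c)).

    It is zero when both views of every sample point in the direction
    of the one-class center.
    """
    total = 0.0
    for qi, qi_aug in zip(q, q_aug):
        total += (1.0 - cosine(qi, center)) + (1.0 - cosine(qi_aug, center))
    return total / len(q)

def variance_term(batch, target_std=1.0, eps=1e-4):
    """Assumed form: hinge on the per-dimension standard deviation.

    The penalty grows as the embeddings stop varying across the batch,
    which is the symptom of hypersphere collapse.
    """
    n, dim = len(batch), len(batch[0])
    penalty = 0.0
    for d in range(dim):
        col = [row[d] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # population variance
        penalty += max(0.0, target_std - math.sqrt(var + eps))
    return penalty / dim

def combined_loss(q, q_aug, center, weight=1.0):
    """Invariance term plus a weighted variance term averaged over both views."""
    var = 0.5 * (variance_term(q) + variance_term(q_aug))
    return invariance_term(q, q_aug, center) + weight * var
```

In this sketch, a batch whose embeddings are all identical minimizes the invariance term but is penalized by the variance term, which illustrates why a one-class objective alone can be satisfied by a degenerate encoder.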

Paper Structure

This paper contains 29 sections, 20 equations, 8 figures, 4 tables, 2 algorithms.

Figures (8)

  • Figure 1: Motivation Diagram. In time series data, various types of anomalies exist, including point-wise anomalies (e.g., A1) and pattern-wise anomalies (e.g., A2, A3). Moreover, due to untimely detection in the early stages or data contamination, some signs of anomalies may not be identified and labeled. For instance, the segment preceding A2 may exhibit such characteristics. We refer to these as latent anomalies (L1). Methods based on single assumptions can only detect specific anomalies (a, b). COCA combines multiple assumptions (c) to enhance the model's anomaly detection capability. RoCA (d) identifies and leverages the contaminated data or latent anomalies to optimize the detection boundary.
  • Figure 2: The framework of RoCA.
  • Figure 3: Invariance term schematic. $O$ is the center of the unit hypersphere, $Ce$ is the $\ell_{2}$-normalized one-class center, $q_{i}$ and $q_{i}^{\prime}$ are $\ell_{2}$-normalized projected vectors, $\theta$ is the dihedral angle between plane $CeOq_{i}$ and $CeOq_{i}^{\prime}$, $\alpha$ and $\beta$ are one-class errors, and $\gamma$ is the contrastive error.
  • Figure 4: Loss and the similarity among $q$, $q'$, and the one-class center ($c$) during training on the four datasets. The loss function not only brings $q_{i}$ and $q_{i}^{\prime}$ closer to $Ce$, but also reduces the sequence contrastive error ${\rm sim}(q_{i}, q_{i}^{\prime})$.
  • Figure 5: Experimental metric comparison. RAS is a trivial baseline that generates random anomaly scores. The methods on the left are shallow baselines, and the six on the right are deep ones. (a) is conducted on UCR, which includes pattern-wise anomalies. (b) is conducted on the multivariate SWaT dataset, where SR cannot be applied. AFF and PA yield high scores even for RAS, while PW is overly harsh, rating every method as performing poorly on UCR. The comparison of the baselines across metrics and datasets suggests that the choice of metric and benchmark can distort a fair evaluation.
  • ...and 3 more figures
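
The geometry described in the Figure 3 caption can be explored numerically. The sketch below assumes that $\alpha$, $\beta$, and $\gamma$ can be read as the arc angles between $q_{i}$ and $Ce$, between $q_{i}^{\prime}$ and $Ce$, and between $q_{i}$ and $q_{i}^{\prime}$; the caption calls them errors, so this reading is an assumption. Under it, the spherical law of cosines, $\cos\gamma = \cos\alpha\cos\beta + \sin\alpha\sin\beta\cos\theta$, relates the three angles to the dihedral angle $\theta$. The function names are hypothetical.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def cos_sim(u, v):
    """Cosine similarity, clamped to [-1, 1] to guard against rounding."""
    u, v = normalize(u), normalize(v)
    return max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))

def triangle_on_sphere(ce, q, q_aug):
    """Angles of the spherical triangle with vertices Ce, q, q'.

    alpha: angle between q and Ce, beta: angle between q' and Ce,
    gamma: angle between q and q', cos_theta: cosine of the dihedral
    angle between planes CeOq and CeOq' (spherical law of cosines).
    """
    cos_a, cos_b, cos_g = cos_sim(q, ce), cos_sim(q_aug, ce), cos_sim(q, q_aug)
    sin_a = math.sqrt(1.0 - cos_a ** 2)
    sin_b = math.sqrt(1.0 - cos_b ** 2)
    if sin_a * sin_b == 0:
        raise ValueError("theta is undefined when q or q' is collinear with Ce")
    return {
        "alpha": math.acos(cos_a),
        "beta": math.acos(cos_b),
        "gamma": math.acos(cos_g),
        "cos_theta": (cos_g - cos_a * cos_b) / (sin_a * sin_b),
    }
```

For example, with `ce = [0, 0, 1]`, `q = [1, 0, 1]`, and `q_aug = [0, 1, 1]`, the two planes are perpendicular, so the returned `cos_theta` is zero up to floating-point rounding: both views are equally close to the center yet still differ from each other, which is the case a one-class term alone would not penalize.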