Weakly Augmented Variational Autoencoder in Time Series Anomaly Detection

Zhangkai Wu, Longbing Cao, Qi Zhang, Junxian Zhou, Hui Chen

TL;DR

This work tackles robustness gaps in VAE-based time-series anomaly detection caused by data scarcity and latent holes. It introduces WAVAE, a weakly augmented variational framework that jointly trains on raw and lightly augmented views to align their data likelihoods, using the mutual information (MI) between their latent representations as a guiding signal. The authors develop two MI-approximation pathways, InfoNCE-based contrastive learning and adversarial density-ratio estimation, to maximize alignment between the views, and train end-to-end with weak normalization-based augmentations. Extensive experiments on five public datasets against numerous baselines show that WAVAE, especially its contrastive MI variant, achieves superior ROC-AUC and PR-AUC; ablations clarify the effects of latent dimensionality, SSL components, and time-series processing. The approach offers a scalable, data-efficient strategy for robust TSAD in realistic, data-scarce settings, with practical relevance to unsupervised anomaly detection pipelines.
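The contrastive pathway rests on the InfoNCE estimator, which lower-bounds the MI between paired representations. The following is a minimal, dependency-free sketch of the standard estimator, assuming cosine similarity, a temperature of 0.1, and in-batch negatives; these choices and the names `cosine` and `info_nce` are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two latent vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(z_raw: List[Vector], z_aug: List[Vector],
             temperature: float = 0.1) -> Tuple[float, float]:
    """Return (mean InfoNCE loss, log N - loss), the latter a lower bound on I(z_raw; z_aug).

    Row i treats (z_raw[i], z_aug[i]) as the positive pair and every other
    z_aug[j] in the batch as a negative.
    """
    n = len(z_raw)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_raw[i], z_aug[j]) / temperature for j in range(n)]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the positive as target
    loss = total / n
    return loss, math.log(n) - loss


# Hypothetical latents: each augmented vector is a small perturbation of its raw partner.
z_raw = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
z_aug = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1], [-0.1, -0.9]]
aligned_loss, aligned_bound = info_nce(z_raw, z_aug)
shuffled_loss, _ = info_nce(z_raw, z_aug[1:] + z_aug[:1])  # break the raw/augmented pairing
```

Because each per-row loss is non-negative, the bound can never exceed log N, so contrastive MI estimates of this kind are capped by the batch size.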

Abstract

Due to their unsupervised training and uncertainty estimation, deep Variational Autoencoders (VAEs) have become powerful tools for reconstruction-based Time Series Anomaly Detection (TSAD). Existing VAE-based TSAD methods, either statistical or deep, tune meta-priors to estimate the data likelihood so that it effectively captures spatiotemporal dependencies in the data. However, these methods face the challenge of data scarcity, which is common in anomaly detection tasks. Such scarcity easily leads to latent holes, i.e., discontinuous regions in the latent space, and reconstructions drawn from these regions are not robust. We propose a novel generative framework that combines VAEs with self-supervised learning (SSL) to address this issue.
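For readers new to reconstruction-based TSAD, the sketch below shows the two generic ingredients the abstract relies on: a beta-weighted VAE objective (MSE reconstruction plus a Gaussian KL term) and a reconstruction-error anomaly score. It assumes a diagonal Gaussian encoder and a standard normal prior, uses $\beta = 0.001$ only because Figure 5 reports that value as best, and omits WAVAE's augmented view and MI term; the helper names and the fixed threshold are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared reconstruction error over one window."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def beta_vae_loss(x: Sequence[float], x_hat: Sequence[float], mu: Sequence[float],
                  logvar: Sequence[float], beta: float = 1e-3) -> float:
    """Negative beta-weighted ELBO with an MSE reconstruction term."""
    return mse(x, x_hat) + beta * gaussian_kl(mu, logvar)


def anomaly_flags(windows: List[Sequence[float]], reconstructions: List[Sequence[float]],
                  threshold: float) -> List[bool]:
    """Score each window by reconstruction error and flag scores above the threshold."""
    return [mse(x, x_hat) > threshold for x, x_hat in zip(windows, reconstructions)]


# Hypothetical windows: the second contains a spike the model cannot reproduce.
windows = [[0.1, 0.2, 0.1], [0.1, 2.5, 0.1]]
recons = [[0.1, 0.2, 0.1], [0.1, 0.2, 0.1]]  # the model reproduces only the normal pattern
flags = anomaly_flags(windows, recons, threshold=0.5)
```

Per the Figure 4 caption, WAVAE computes this kind of score only on the raw view, comparing $\boldsymbol{x}_{\mathrm{r}}$ with its reconstruction.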

Paper Structure

This paper contains 26 sections, 17 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparison of the latent hole phenomenon induced by anomalies in non-robust VAE-based TSAD models (upper section) with the robust representation space learned by WAVAE (lower section). The upper part shows how latent holes arise in a non-robust TSAD model and how they affect its robustness. Specifically, anomalous sequences $\boldsymbol{x}_{t}^{r}$ (in the blue window of the upper section), once encoded into the representation space, disrupt the structure of the latent space. Latent holes form primarily because these anomalous sequences lack the spatiotemporal coherence of the normal sequence $\boldsymbol{x}_{1}^{r}$. Consequently, sampling from these discontinuous regions produces a mismatch between the representation (blue dot $\boldsymbol{z}_{t}$) and its generation (blue dot in the likelihood function): as the upper section illustrates, the representation is assigned a disproportionately high likelihood mass. In such cases the TSAD model may erroneously classify an anomaly as normal, compromising its robustness. In contrast, the lower section shows how data augmentation in WAVAE yields a more continuous, smoothly distributed data likelihood (center of the lower panel). Here, the representation $\boldsymbol{z}_{t}$ encoded from anomalous sequences $\boldsymbol{x}_{t}^{a}$, which lies outside the normal latent region, receives a lower likelihood mass, improving the robustness and effectiveness of anomaly detection in TSAD tasks.
  • Figure 2: Graphical model for augmented variational autoencoders. Following plate notation, a white circle denotes a hidden (latent) variable and a gray circle an observed variable. Variables inside the square are local variables, independently repeated $N$ times. Dashed arrows denote conditional dependence, and dotted lines denote parameters. As the plate diagram shows, our method uses two generative models. Their inference parts, $q_{\phi_{\mathrm{r}}}$ and $q_{\phi_{\mathrm{a}}}$, encode the raw input $\boldsymbol{x}_{\mathrm{r}}$ and the augmented input $\boldsymbol{x}_{\mathrm{a}}$ into their respective low-dimensional representations $\boldsymbol{z}_{\mathrm{r}}$ and $\boldsymbol{z}_{\mathrm{a}}$. Their generative parts, $p_{\theta_{\mathrm{r}}}$ and $p_{\theta_{\mathrm{a}}}$, then sample from the latent space to reconstruct the respective inputs. A module parameterized by $\psi$ synchronizes the learning outcomes of the two models.
  • Figure 3: Illustration of adversarial learning for mutual information approximation. In the first stage, the discriminator is frozen while the parameters of the encoders and decoders are updated. In the second stage, the encoder and decoder (generator) parameters are frozen, and the pseudo-labels of positive and negative samples are inverted to train the discriminator.
  • Figure 4: Overall framework of WAVAE. Training begins with the raw data $\boldsymbol{x}_{\mathrm{r}}$ passing through an augmentation algorithm $AUG$, which produces the augmented data $\boldsymbol{x}_{\mathrm{a}}$. A shared-parameter VAE is trained concurrently on each of the two data sets. Evaluation, i.e., anomaly detection, is performed solely by the raw-data model, comparing the raw input $\boldsymbol{x}_{\mathrm{r}}$ with its reconstruction $\hat{\boldsymbol{x}}_{\mathrm{r}}$, which makes the anomaly detector end-to-end.
  • Figure 5: Sensitivity analysis of VAE-related hyperparameters: (a) the dimension of $\boldsymbol{z}$ strongly influences results, with the best performance for dimensions between 14 and 20; (b) $\beta$ has a minimal effect on optimization, with the best result at 0.001; (c) the MSE loss function outperforms the alternatives tested.
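The adversarial MI pathway (Figure 3) builds on the density-ratio trick: a discriminator trained to tell aligned pairs $(\boldsymbol{z}_{\mathrm{r}}, \boldsymbol{z}_{\mathrm{a}})$ from shuffled pairs learns a logit that approximates the log ratio between the joint and the product of marginals. The sketch below illustrates only that discriminator step, assuming a hypothetical bilinear discriminator fitted by plain logistic regression on fixed latents; the discriminator form, the learning rate, and all function names are assumptions, and the encoder/decoder update stage and label inversion of Figure 3 are not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logit(w: List[float], b: float, z_r: Sequence[float], z_a: Sequence[float]) -> float:
    """Hypothetical bilinear discriminator T(z_r, z_a) = sum_k w_k * z_r[k] * z_a[k] + b."""
    return sum(wk * r * a for wk, r, a in zip(w, z_r, z_a)) + b


def train_discriminator(pairs: List[Pair], steps: int = 300, lr: float = 0.5, seed: int = 0):
    """Logistic regression separating aligned pairs (label 1, samples of the joint)
    from shuffled pairs (label 0, samples of the product of marginals)."""
    rng = random.Random(seed)
    dim, n = len(pairs[0][0]), len(pairs)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        perm = list(range(n))
        rng.shuffle(perm)  # fresh negatives: each z_r is paired with a random z_a
        batch = [(zr, za, 1.0) for zr, za in pairs]
        batch += [(pairs[i][0], pairs[perm[i]][1], 0.0) for i in range(n)]
        grad_w, grad_b = [0.0] * dim, 0.0
        for zr, za, y in batch:
            err = sigmoid(logit(w, b, zr, za)) - y  # d(binary cross-entropy)/d(logit)
            for k in range(dim):
                grad_w[k] += err * zr[k] * za[k]
            grad_b += err
        m = len(batch)
        w = [wk - lr * g / m for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def density_ratio_mi(w: List[float], b: float, pairs: List[Pair]) -> float:
    """For an unrestricted discriminator with balanced classes, the optimal logit is
    log p(z_r, z_a) / (p(z_r) p(z_a)); its mean over aligned pairs estimates I(z_r; z_a).
    The bilinear form used here only approximates that optimum."""
    return sum(logit(w, b, zr, za) for zr, za in pairs) / len(pairs)


rng = random.Random(1)
raw = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
aug = [[v + rng.gauss(0.0, 0.1) for v in z] for z in raw]  # weakly perturbed view
other = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]  # unrelated view

aligned, unrelated = list(zip(raw, aug)), list(zip(raw, other))
w_a, b_a = train_discriminator(aligned)
w_u, b_u = train_discriminator(unrelated)
mi_aligned = density_ratio_mi(w_a, b_a, aligned)
mi_unrelated = density_ratio_mi(w_u, b_u, unrelated)
```

In a full adversarial scheme, this discriminator step would alternate with encoder and decoder updates that maximize the estimate; that alternation is deliberately left out of this sketch.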