Learning Multi-Pattern Normalities in the Frequency Domain for Efficient Time Series Anomaly Detection

Feiyi Chen; Yingying Zhang; Zhen Qin; Lunting Fan; Renhe Jiang; Yuxuan Liang; Qingsong Wen; Shuiguang Deng

TL;DR

This paper proposes MACE, a multi-normal-pattern accommodated and efficient frequency-domain method for time series anomaly detection, and proves theoretically and experimentally that using a strategically selected subset of Fourier bases, rather than the complete spectrum, not only reduces computational overhead but also helps distinguish anomalies.
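
The claim that a subset of Fourier bases helps separate anomalies can be illustrated on a toy signal. The sketch below is not MACE's selection strategy, which this summary does not describe; it assumes the simplest rule of keeping the k frequency bins with the largest magnitude, and it uses a naive O(n²) DFT so that it runs with the standard library only (a practical implementation would use an FFT such as `numpy.fft.rfft`). The helper names `dft`, `idft`, and `reconstruct_from_subset` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stdlib only, for illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def reconstruct_from_subset(x, k):
    """Rebuild x from only the k largest-magnitude frequency bins.

    The top-k-by-magnitude rule is an assumption made for this sketch,
    not the selection strategy used by MACE.
    """
    spectrum = dft(x)
    ranked = sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)
    kept = set(ranked[:k])
    sparse = [c if i in kept else 0j for i, c in enumerate(spectrum)]
    return idft(sparse)


if __name__ == "__main__":
    n = 32
    signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]  # periodic "normal" part
    signal[20] += 4.0  # injected short-term anomaly (point spike)
    recon = reconstruct_from_subset(signal, k=2)
    errors = [abs(a - b) for a, b in zip(signal, recon)]
    print(max(range(n), key=errors.__getitem__))  # prints 20
```

With a frequency-3 sine over 32 samples and a spike of height 4 at t = 20, keeping k = 2 bins (the sine's conjugate pair) reproduces the periodic part exactly, while the spike, whose energy is spread across all bins, is mostly discarded. The pointwise reconstruction error is therefore 3.75 at t = 20 and at most 0.25 elsewhere.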

Abstract

Anomaly detection significantly enhances the robustness of cloud systems. While neural network-based methods have recently demonstrated strong advantages, they face practical challenges in cloud environments: maintaining a separate model for each service is impractical, yet a single unified model has limited ability to handle diverse normal patterns; in addition, existing methods struggle to process heavy traffic in real time and are insufficiently sensitive to short-term anomalies. We therefore propose MACE, a multi-normal-pattern accommodated and efficient anomaly detection method that operates in the frequency domain. It has three novel characteristics: (i) a pattern extraction mechanism that excels at handling diverse normal patterns with a unified model, enabling the model to identify anomalies by examining the correlation between a data sample and its service's normal pattern instead of focusing solely on the data sample itself; (ii) a dualistic convolution mechanism that amplifies short-term anomalies in the time domain and hinders the reconstruction of anomalies in the frequency domain, which enlarges the reconstruction error disparity between anomalous and normal data and thus facilitates anomaly detection; and (iii) leveraging the sparsity and parallelism of the frequency domain to improve model efficiency. We prove theoretically and experimentally that using a strategically selected subset of Fourier bases, rather than the complete spectrum, not only reduces computational overhead but also helps distinguish anomalies. Moreover, extensive experiments demonstrate MACE's effectiveness in handling diverse normal patterns with a unified model and show that it achieves state-of-the-art performance with high efficiency.
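
Characteristic (i), together with Figure 1(c), describes scoring a sample by how far it lies from its service's normal pattern subspace. The minimal sketch below assumes each normal pattern is given explicitly as a few basis vectors (here sine and cosine waves, in keeping with the frequency-domain setting) and measures the residual after orthogonal projection; in MACE the pattern is extracted by the model itself, which this sketch does not reproduce. The functions `gram_schmidt`, `projection_error`, and `sinusoid` are hypothetical.

```python
import math


def gram_schmidt(vectors):
    """Orthonormalize a list of vectors, dropping (near-)linearly dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    return basis


def projection_error(sample, pattern_vectors):
    """Distance between a sample and its orthogonal projection onto span(pattern_vectors)."""
    basis = gram_schmidt(pattern_vectors)
    proj = [0.0] * len(sample)
    for b in basis:
        coef = sum(s * bi for s, bi in zip(sample, b))
        proj = [p + coef * bi for p, bi in zip(proj, b)]
    return math.sqrt(sum((s - p) ** 2 for s, p in zip(sample, proj)))


def sinusoid(fn, freq, n):
    """Sample fn(2*pi*freq*t/n) for t = 0..n-1, i.e. one Fourier basis vector."""
    return [fn(2 * math.pi * freq * t / n) for t in range(n)]


if __name__ == "__main__":
    n = 8
    # Hypothetical normal patterns, each spanned by one sine/cosine pair.
    pattern_1 = [sinusoid(math.sin, 1, n), sinusoid(math.cos, 1, n)]
    pattern_2 = [sinusoid(math.sin, 3, n), sinusoid(math.cos, 3, n)]
    sample = [a + 0.1 * b for a, b in zip(sinusoid(math.sin, 1, n), sinusoid(math.cos, 3, n))]
    print(round(projection_error(sample, pattern_1), 3),
          round(projection_error(sample, pattern_2), 3))  # prints 0.2 2.0
```

The sample is mostly a frequency-1 sine, so its residual against the frequency-1 subspace is only 0.2, versus 2.0 against the frequency-3 subspace; as in Figure 1(c), it would be judged normal under the first pattern rather than the second.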

Paper Structure

This paper contains 16 sections, 6 equations, 6 figures, and 9 tables.

Introduction
Related Work
Anomaly Detection
Multi-task Learning
Preliminary
Proposed Method
Overview
Dualistic Convolution
Pattern Extraction
Experiment
Experiment Setup
Prediction Accuracy
Efficiency Analysis
Ablation Study
Hyperparameter Study
(plus 1 further section not listed)

Figures (6)

Figure 1: (a) The normal data of each service is compressed into a two-dimensional vector; the resulting points are scattered randomly. (b) F1 scores of several SOTA methods: DCdetector DBLP:conf/kdd/YangZZW023, AnomalyTransformer DBLP:conf/iclr/XuWWL22, DVGCRN DBLP:conf/icml/ChenTCDDZ22, OmniAnomaly su2019robust, MSCRED zhang2019deep, TranAD DBLP:journals/pvldb/TuliCJ22. (c) In the pattern extraction mechanism, a data sample is projected onto a normal pattern subspace. The closer the sample lies to that subspace, the more easily it can be reconstructed from its projection, with lower reconstruction error. Thus, the data sample is more likely to be inferred as normal under normal pattern 1 than under normal pattern 2.
Figure 2: The model architecture of MACE
Figure 3: (a) Contributions of different time slots within a convolution window to the peak convolution result for different values of $\gamma$. As $\gamma$ grows, the contribution of deviations increases significantly. (b)-(c) The utility of dualistic convolution compared with standard convolution.
Figure 4: (a) Applied to the frequency domain, dualistic convolution effectively picks the most prominent deviation at each compression step. (b) The three channels of the frequency representation in the frequency characterization module: the first channel is the result of the Fourier transform, the second holds the corresponding $\sin$ Fourier bases, and the third holds the corresponding $\cos$ Fourier bases.
Figure 5: (a) We first use kernel density estimation to estimate the distribution of each subset, then compute the KL divergence between each pair of subsets in a training group. The figure shows the distribution of these KL divergences for different datasets. (b) The point anomaly, context anomaly, and normal pattern ratios in each dataset. (c) All methods train one unified model for every ten services; the figure shows their F1 scores across different services.
(plus 1 further figure not listed)
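
Figure 3(a) states that, as $\gamma$ grows, deviating time slots contribute more to the peak convolution result. The dualistic convolution formula itself is not given in this summary, so the sketch below substitutes a power-weighted pooling with weights proportional to $|x_i|^\gamma$. This is an assumed stand-in chosen only because it shows the same qualitative trend; for large $\gamma$ it approaches picking the largest-magnitude value, loosely echoing Figure 4(a). The names `slot_contributions` and `power_weighted_pool` are hypothetical.

```python
def slot_contributions(window, gamma):
    """Share of each time slot, with weights proportional to |x_i| ** gamma.

    Illustrative stand-in only, not MACE's dualistic convolution.
    gamma is assumed to be non-negative.
    """
    powered = [abs(v) ** gamma for v in window]
    total = sum(powered)
    if total == 0:
        # All-zero window with gamma > 0: fall back to uniform weights.
        return [1.0 / len(window)] * len(window)
    return [p / total for p in powered]


def power_weighted_pool(window, gamma):
    """Pool a window into one value using the gamma-dependent slot weights."""
    weights = slot_contributions(window, gamma)
    return sum(w * v for w, v in zip(weights, window))


if __name__ == "__main__":
    window = [1.0, 1.0, 1.0, 3.0, 1.0]  # one short-term deviation at index 3
    for gamma in (0, 1, 2, 4):
        share = slot_contributions(window, gamma)[3]
        print(gamma, round(share, 3), round(power_weighted_pool(window, gamma), 3))
```

For this window the deviation's share rises from 0.2 at $\gamma = 0$ (a plain average, pooled value 1.4) to about 0.95 at $\gamma = 4$ (pooled value about 2.91), so the short-term deviation increasingly dominates the pooled output instead of being averaged away.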
