Towards efficient deep autoencoders for multivariate time series anomaly detection

Marcin Pietroń, Dominik Żurek, Kamil Faber, Roberto Corizzo

TL;DR

The paper addresses the need for real-time anomaly detection in multivariate time series by compressing deep autoencoder models. It introduces a three-stage workflow that combines adaptive pruning with per-layer sparsity and two quantization strategies (linear and non-linear), applied to CNN- and graph-based autoencoders and followed by non-gradient fine-tuning. Empirical results on the SWAT, WADI-2019, MSL, and SMAP benchmarks show compression in the range of 80–95% with dataset-dependent accuracy changes: 16-bit and 8-bit quantization are generally robust, while 4-bit quantization is viable only for some datasets. The work enables more efficient deployment of anomaly detection on edge/IoT hardware and in real-time systems, highlights the trade-off between compression level and detection performance, and points toward retraining-based quantization as a future improvement.

Abstract

Multivariate time series anomaly detection is a crucial problem in many industrial and research applications. Timely detection of anomalies makes it possible, for instance, to prevent defects in manufacturing processes and failures in cyber-physical systems. Deep learning methods are often preferred for their accuracy and robustness in the analysis of complex multivariate data. However, predictions must also be produced in a timely manner to meet the real-time requirements of different applications. For deep learning models, model reduction is extremely important for achieving good results in real-time systems with limited time and memory. In this paper, we address this issue by proposing a novel compression method for deep autoencoders that involves three key factors. First, pruning reduces the number of weights while preventing catastrophic drops in accuracy by means of a fast search process that identifies high sparsity levels. Second, linear and non-linear quantization reduce model complexity by reducing the number of bits used for each weight. The combined contribution of these three aspects allows the model size to be reduced by removing a subset of the weights (pruning) and decreasing the bit-width of the remaining weights (quantization). As a result, the compressed model is faster and easier to deploy in highly constrained hardware environments. Experiments on popular multivariate anomaly detection benchmarks show that our method achieves a significant model compression ratio (between 80% and 95%) without a significant reduction in anomaly detection performance.
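
The two ingredients named in the abstract, pruning and linear quantization, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: plain magnitude pruning stands in for the paper's fast sparsity search, the per-layer sparsity levels are made-up inputs rather than searched values, and the layer names, shapes, and function names (`prune_by_magnitude`, `quantize_linear`, `dequantize`) are hypothetical.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Returns the pruned weights and a boolean mask (True = weight kept).
    """
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to remove
    mask = np.ones(flat.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest-magnitude weights
        drop = np.argpartition(flat, k - 1)[:k]
        mask[drop] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

def quantize_linear(weights, bits):
    """Symmetric uniform quantization to signed integers of width `bits`."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(weights))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return q * scale

# Hypothetical two-layer autoencoder with assumed per-layer sparsity levels.
rng = np.random.default_rng(0)
layers = {"enc": rng.normal(size=(64, 32)), "dec": rng.normal(size=(32, 64))}
sparsity = {"enc": 0.5, "dec": 0.7}

compressed = {}
for name, w in layers.items():
    pruned, mask = prune_by_magnitude(w, sparsity[name])
    q, scale = quantize_linear(pruned, bits=8)
    compressed[name] = {"q": q, "scale": scale, "mask": mask, "pruned": pruned}
```

Pruned weights are exactly zero, so they map to the integer code 0 and stay zero after dequantization. In this sketch the sparsity of each layer is a free parameter, which is where a per-layer search such as the one the paper describes would plug in.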

Paper Structure

This paper contains 7 sections, 16 equations, 2 figures, 4 tables, 1 algorithm.

Figures (2)

  • Figure 1: Proposed compression workflow for deep autoencoder models involving adaptive pruning and linear/nonlinear quantization stages.
  • Figure 2: Overview of our proposed adaptive pruning and quantization approach. The initial model is pruned based on fast lottery ticket search (a), and its weights are quantized to a 4-bit representation (b).
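
Figure 2 mentions a 4-bit weight representation. The sketch below shows one way a non-linear 4-bit code could look, using a quantile-based codebook so that levels are denser where weights are concentrated. The paper's actual non-linear scheme is not described in this section, so the quantile codebook is a stand-in technique, and the function names (`build_codebook`, `encode`, `decode`, `pack_nibbles`) and the weight shape are hypothetical.

```python
import numpy as np

def build_codebook(weights, bits=4):
    """Place 2**bits levels at the centres of equal-probability bins."""
    n_levels = 2 ** bits
    probs = (np.arange(n_levels) + 0.5) / n_levels
    return np.quantile(weights.ravel(), probs)

def encode(weights, codebook):
    """Replace each weight by the index of its nearest codebook level."""
    distances = np.abs(weights[..., None] - codebook)
    return distances.argmin(axis=-1).astype(np.uint8)

def decode(indices, codebook):
    """Look up the floating-point level for each stored index."""
    return codebook[indices]

def pack_nibbles(indices):
    """Store two 4-bit indices per byte (assumes an even number of indices)."""
    flat = indices.ravel()
    return (flat[0::2] << 4) | flat[1::2]

rng = np.random.default_rng(1)
w = rng.normal(size=(32, 16))          # hypothetical layer weights

codebook = build_codebook(w, bits=4)   # 16 floating-point levels
idx = encode(w, codebook)              # one 4-bit index per weight
packed = pack_nibbles(idx)             # half a byte per weight
w_hat = decode(idx, codebook)          # approximate reconstruction
```

Besides the packed indices, only the 16-entry codebook has to be stored per layer, so the storage cost per weight is 4 bits instead of the 32 bits of a single-precision float.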