LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Time Series Anomaly Detection

Feiyi Chen, Zhen Qin, Yingying Zhang, Shuiguang Deng, Yi Xiao, Guansong Pang, Qingsong Wen

TL;DR

This work addresses the challenge of rapidly evolving normal patterns in web-service time series while avoiding costly full retraining of deep VAEs. It introduces LARA, a light, anti-overfitting retraining approach that casts retraining as a convex optimization, uses a ruminate block to exploit historical model knowledge without storing data, and employs simple linear adjustments to latent vectors and reconstructions ($M_z$, $M_x$) with provable optimality under Gaussian assumptions. The method delivers strong anomaly-detection performance with only small amounts of new data (as few as 43 time slots) and exhibits low time and memory overhead, outperforming or matching state-of-the-art baselines. The combination of convex optimization, history-aware guidance, and lightweight adjustment yields a practical, scalable solution for continuous, data-efficient adaptation of time-series anomaly detectors in dynamic environments.

Abstract

Most current anomaly detection models assume that the normal pattern stays the same over time. However, the normal patterns of Web services change dramatically and frequently, and a model trained on old-distribution data is outdated after such changes. Retraining the whole model every time is expensive. Moreover, when a normal pattern has just changed, there are not yet enough observations from the new distribution, and retraining a large neural network on limited data is prone to overfitting. We therefore propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder (VAE) based time series anomaly detection methods. This work aims to make three novel contributions: 1) it formulates the retraining process as a convex problem, which converges quickly and guards against overfitting; 2) it designs a ruminate block that leverages historical data without the need to store them; 3) it proves mathematically that, when fine-tuning the latent vector and the reconstructed data, linear formations achieve the least adjusting error between the ground truths and the fine-tuned values. Extensive experiments verify that retraining LARA with as few as 43 time slots of data from the new distribution yields an F1 score competitive with state-of-the-art anomaly detection models trained with sufficient data. We also verify its light overhead.
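To make the "convex problem" and "linear formations" ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes the encoder is frozen and that a target latent vector is already available for each new sample. In LARA that target would come from the ruminate block, which is not reproduced here and is replaced by synthetic data. The function name `fit_linear_adjuster`, the small ridge penalty, and all of the data are illustrative assumptions. The sketch fits a linear adjustment in the spirit of $M_z$ by regularised least squares, which is convex and has a closed-form solution.

```python
import numpy as np


def fit_linear_adjuster(inputs, targets, ridge=1e-3):
    """Fit targets ~= inputs @ W + b by ridge-regularised least squares.

    The objective is convex in (W, b), so the closed-form solution below
    is its global minimiser. The ridge term is an assumption of this
    sketch, added for numerical stability with very few samples.
    """
    n = inputs.shape[0]
    X = np.hstack([inputs, np.ones((n, 1))])  # append a bias column
    reg = ridge * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # do not penalise the bias term
    params = np.linalg.solve(X.T @ X + reg, X.T @ targets)
    return params[:-1], params[-1]  # W has shape (d_in, d_out), b has shape (d_out,)


# Synthetic stand-ins (hypothetical): 43 new time slots, 4-dimensional latent space.
rng = np.random.default_rng(0)
n_new, latent_dim = 43, 4

# Latent vectors produced by the frozen, outdated encoder on the new samples.
z_old = rng.normal(size=(n_new, latent_dim))

# Stand-in for the latent vectors that the ruminate block would estimate.
shift = rng.normal(size=(latent_dim, latent_dim))
z_target = z_old @ shift + 0.5 + 0.01 * rng.normal(size=(n_new, latent_dim))

# Fit the linear adjustment and compare errors before and after.
W_z, b_z = fit_linear_adjuster(z_old, z_target)
z_adjusted = z_old @ W_z + b_z

err_before = np.mean((z_old - z_target) ** 2)
err_after = np.mean((z_adjusted - z_target) ** 2)
print(f"MSE before adjustment: {err_before:.4f}")
print(f"MSE after adjustment:  {err_after:.4f}")
```

The same helper could in principle be reused for an $M_x$-style adjustment by passing the frozen decoder's reconstructions as `inputs` and the new-distribution samples as `targets`. Because only a few linear parameters are fitted while the network weights stay fixed, the number of free parameters stays small relative to the 43 samples, which is the intuition behind the anti-overfitting claim.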

Paper Structure (27 sections, 11 equations, 3 figures, 8 tables)


Figures (3)

  • Figure 1: LARA vs. the other two approaches. The panels show (a) reconstructed data samples and (b) latent vectors output by three different approaches: the model trained on historical data only (i.e., the outdated model), the model retrained on the whole dataset, and the model retrained by LARA.
  • Figure 2: Overview of LARA. When there is a new distribution shift, LARA retrieves historical data from the latest model and uses them, together with a few newly observed data samples, to estimate the latent vector for each new sample via the ruminate block. Then, LARA uses two adjusting functions, $M_z$ and $M_x$, to adapt the latent vector to the one estimated by the ruminate block and to adapt the reconstructed sample yielded by the latest model to the sample from the new distribution.
  • Figure 3: Due to space constraints, we use the first two letters as the shorthand for each method. (a) As the memory overhead of JumpStarter, AnomalyTransformer, and MSCRED is dramatically larger than that of the others, we divide their memory overhead by 10 so that the memory overhead of the other methods is clearly visible. (b) The ratios of retraining memory and time overhead to training memory and time overhead. (d) How the ratio of loss to iteration count varies with the iteration count. (e) The x-axis is the proportion of retraining data in the new-distribution data. (f) In the legend, pre, rec, and F1 denote precision, recall, and F1 score before retraining, and pre$^*$, rec$^*$, and F1$^*$ denote them after retraining.