LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Time Series Anomaly Detection
Feiyi Chen, Zhen Qin, Yingying Zhang, Shuiguang Deng, Yi Xiao, Guansong Pang, Qingsong Wen
TL;DR
This work addresses the challenge of rapidly evolving normal patterns in web-service time series while avoiding costly full retraining of deep VAEs. It introduces LARA, a light, anti-overfitting retraining approach that casts retraining as a convex optimization, uses a ruminate block to exploit historical model knowledge without storing data, and employs simple linear adjustments to latent vectors and reconstructions ($M_z$, $M_x$) with provable optimality under Gaussian assumptions. The method delivers strong anomaly-detection performance with only small amounts of new data (as few as 43 time slots) and exhibits low time and memory overhead, outperforming or matching state-of-the-art baselines. The combination of convex optimization, history-aware guidance, and lightweight adjustment yields a practical, scalable solution for continuous, data-efficient adaptation of time-series anomaly detectors in dynamic environments.
Abstract
Most current anomaly detection models assume that the normal pattern stays the same over time. However, the normal patterns of Web services change dramatically and frequently, and a model trained on old-distribution data becomes outdated after such changes. Retraining the whole model every time is expensive. Moreover, at the onset of a normal-pattern change, there is not enough observation data from the new distribution, and retraining a large neural network on limited data is prone to overfitting. Thus, we propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder (VAE) based time series anomaly detection methods. This work aims to make three novel contributions: 1) the retraining process is formulated as a convex problem, so it converges at a fast rate and prevents overfitting; 2) a ruminate block is designed that leverages the historical data without the need to store them; 3) we mathematically prove that, when fine-tuning the latent vector and reconstructed data, linear formations achieve the least adjusting errors between the ground truths and the fine-tuned ones. Moreover, extensive experiments verify that retraining LARA with as few as 43 time slots of data from the new distribution yields an F1 score competitive with state-of-the-art anomaly detection models trained with sufficient data. We also verify its light overhead.
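To make the idea of a convex, linear adjustment concrete, the following is a minimal sketch and not the authors' implementation. It fits an affine map on latent vectors by ridge-regularized least squares, which is a convex problem with a closed-form solution. The function name `fit_linear_adjustment`, the `ridge` parameter, the latent dimension of 4, and the synthetic target latents are all assumptions introduced for illustration; the abstract does not specify how LARA obtains its adjustment targets or which loss it uses. Only the sample count of 43 is taken from the abstract.

```python
import numpy as np

def fit_linear_adjustment(z_old, z_target, ridge=1e-3):
    """Fit z_target ~= z_old @ W + b by ridge-regularized least squares.

    Hypothetical helper: the objective is convex, so the normal equations
    give the global minimizer directly, with no iterative training.
    """
    n, d = z_old.shape
    X = np.hstack([z_old, np.ones((n, 1))])  # append a bias column
    reg = ridge * np.eye(d + 1)
    reg[-1, -1] = 0.0                        # leave the bias unpenalized
    params = np.linalg.solve(X.T @ X + reg, X.T @ z_target)
    return params[:-1], params[-1]           # W has shape (d, d), b has shape (d,)

# Synthetic stand-in data (assumption): 43 samples, 4 latent dimensions.
rng = np.random.default_rng(0)
z_old = rng.normal(size=(43, 4))             # latents from the old model
true_W = rng.normal(size=(4, 4))
true_b = rng.normal(size=4)
z_target = z_old @ true_W + true_b + 0.01 * rng.normal(size=(43, 4))

W, b = fit_linear_adjustment(z_old, z_target)
max_err = np.abs(z_old @ W + b - z_target).max()
```

Because only the small matrix `W` and vector `b` are estimated while the large network stays fixed, a few dozen samples can be enough to determine them in this toy setting, which mirrors the low-data motivation stated in the abstract.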
