Dynamic Reward Scaling for Multivariate Time Series Anomaly Detection: A VAE-Enhanced Reinforcement Learning Approach
Bahareh Golchin, Banafsheh Rekabdar
TL;DR
DRSMT tackles multivariate time series anomaly detection under limited labeled data by combining a Variational Autoencoder (VAE), an LSTM-driven DQN, dynamic reward scaling, and an active learning loop. The VAE learns compact latent representations and provides a reconstruction-based intrinsic reward $R_2$ that complements the classification reward $R_1$. A dynamic coefficient $\lambda(t)$ weights the two signals, balancing exploration and exploitation via $R_{total}=R_1+\lambda(t)R_2$. The LSTM-DQN makes sequential binary decisions (normal vs. anomalous) on time series windows, guided by this combined reward and supported by uncertainty-driven labeling. Experiments on SMD and WADI show that DRSMT outperforms existing baselines in F1-score and AU-PR, supporting the practicality of combining generative modeling, reinforcement learning, and selective supervision for scalable industrial anomaly detection. The method shows strong precision on high-dimensional sensor data and demonstrates how adaptive rewards and minimal labeling can reduce supervision while maintaining accuracy.
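As a rough illustration of the combined reward $R_{total}=R_1+\lambda(t)R_2$, the sketch below computes the total reward for a single step. The summary does not specify how $\lambda(t)$ evolves, so the exponential decay in `lambda_schedule`, its parameters (`lam0`, `lam_min`, `decay`), and the function names are all hypothetical placeholders rather than the authors' schedule.

```python
import math


def lambda_schedule(t, lam0=1.0, lam_min=0.1, decay=0.01):
    """Hypothetical schedule for the dynamic coefficient lambda(t).

    Assumption: lambda decays exponentially from lam0 toward lam_min, so the
    intrinsic (reconstruction) reward matters more early in training and the
    classification reward dominates later. The paper's actual rule may differ.
    """
    return lam_min + (lam0 - lam_min) * math.exp(-decay * t)


def total_reward(r1, r2, t):
    """Combine the rewards as R_total = R_1 + lambda(t) * R_2.

    r1: classification reward for the agent's normal/anomalous decision.
    r2: reconstruction-based intrinsic reward supplied by the VAE.
    t:  training step (or episode) index used by the schedule.
    """
    return r1 + lambda_schedule(t) * r2


if __name__ == "__main__":
    # Same raw rewards, evaluated early and late in training.
    for step in (0, 100, 1000):
        print(step, round(total_reward(1.0, 0.5, step), 4))
```

Under this assumed schedule, the same pair $(R_1, R_2)$ yields a larger total reward early in training than late, which is one simple way to shift emphasis from exploration toward exploitation.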
Abstract
Detecting anomalies in multivariate time series is essential for monitoring complex industrial systems, where high dimensionality, limited labeled data, and subtle dependencies between sensors pose significant challenges. This paper presents a deep reinforcement learning framework that addresses these issues by combining a Variational Autoencoder (VAE), an LSTM-based Deep Q-Network (DQN), dynamic reward shaping, and an active learning module. The main contribution is Dynamic Reward Scaling for Multivariate Time Series Anomaly Detection (DRSMT), an implementation that demonstrates how each component enhances the detection process. The VAE captures compact latent representations and reduces noise. The DQN enables adaptive, sequential anomaly classification, and dynamic reward shaping balances exploration and exploitation during training by adjusting the relative importance of the reconstruction and classification signals. In addition, active learning identifies the most uncertain samples for labeling, reducing the need for extensive manual supervision. Experiments on two multivariate benchmarks, the Server Machine Dataset (SMD) and the Water Distribution Testbed (WADI), show that the proposed method outperforms existing baselines in F1-score and AU-PR. These results highlight the effectiveness of combining generative modeling, reinforcement learning, and selective supervision for accurate and scalable anomaly detection in real-world multivariate systems.
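The active learning step can be pictured with a small sketch. The abstract says only that the most uncertain samples are selected for labeling. The margin criterion used here (the absolute gap between the two Q-values of a window) is an assumed, commonly used uncertainty measure, and `select_uncertain` is a hypothetical helper, not a function from the paper.

```python
def select_uncertain(q_values, k):
    """Return the indices of the k windows the agent is least sure about.

    q_values: list of (q_normal, q_anomaly) pairs, one per window.
    Assumption: uncertainty is measured by the margin |q_normal - q_anomaly|;
    a small margin means the two actions look almost equally good.
    """
    margins = [abs(q_normal - q_anomaly) for q_normal, q_anomaly in q_values]
    # Sort window indices by margin, smallest (most uncertain) first.
    ranked = sorted(range(len(margins)), key=lambda i: margins[i])
    return ranked[:k]


if __name__ == "__main__":
    # Illustrative Q-values only; these are not results from the paper.
    q = [(0.9, 0.1), (0.5, 0.48), (0.2, 0.7), (0.31, 0.3)]
    print(select_uncertain(q, 2))  # indices of windows to send for labeling
```

Windows returned by such a selector would be passed to a human annotator, and the resulting labels would feed back into the classification reward, which is how selective supervision can keep the labeling budget small.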
