Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach
Alaa Selim, Yanzhu Ye, Junbo Zhao, Bo Yang
TL;DR
The paper tackles scalable Volt-VAR optimization (VVO) in distribution networks with high DER penetration. It runs RLlib-IMPALA on the RAY platform, which enables distributed, fast training for high-dimensional control tasks. It introduces an optimal placement method for PV and battery DERs on the IEEE 123-bus system. It also introduces an IMPALA-based centralized control framework that handles both continuous and discrete DER actions, with state $s_t=[V_1,\dots,V_N, D_1,\dots,D_M]^\top$ and reward $r(s_t,a_t)=-V_{\text{vio}}$. The lag between actors and learner is corrected off-policy through V-trace and its truncated importance weights $\rho_t$. The results show faster convergence and higher rewards than SAC and PPO, with substantial reductions in computation time. They also highlight practical limits related to core usage on a single machine. The work supports real-time, scalable VVO in modern grids and paves the way for applying DRL to larger networks and more complex DER deployments.
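The state and reward above can be made concrete with a short sketch. Several details are assumptions for illustration, not taken from the paper:

- $D_j$ is taken to be a scalar setpoint or output of DER $j$.
- $V_{\text{vio}}$ is taken to be the summed per-bus deviation outside an acceptable voltage band.
- The band is assumed to be $[0.95, 1.05]$ p.u.

The helper names `build_state`, `voltage_violation`, and `reward` are hypothetical.

```python
from typing import List, Sequence

# Assumed acceptable voltage band in per unit (not specified in the text above).
V_MIN, V_MAX = 0.95, 1.05


def build_state(voltages: Sequence[float], der_outputs: Sequence[float]) -> List[float]:
    """Flatten s_t = [V_1..V_N, D_1..D_M]^T into a single list.

    der_outputs is a hypothetical per-DER scalar (e.g. a setpoint).
    """
    return list(voltages) + list(der_outputs)


def voltage_violation(
    voltages: Sequence[float], v_min: float = V_MIN, v_max: float = V_MAX
) -> float:
    """Assumed V_vio: total magnitude by which bus voltages leave [v_min, v_max]."""
    return sum(max(0.0, v - v_max) + max(0.0, v_min - v) for v in voltages)


def reward(
    voltages: Sequence[float], v_min: float = V_MIN, v_max: float = V_MAX
) -> float:
    """r(s_t, a_t) = -V_vio, evaluated on the post-action bus voltages."""
    return -voltage_violation(voltages, v_min, v_max)


if __name__ == "__main__":
    v = [1.00, 1.07, 0.93]  # one overvoltage and one undervoltage bus
    print(build_state(v, [0.3, 0.8]))
    print(reward(v))  # approximately -0.04
```

With this definition the reward is zero only when every bus is inside the band. Each bus is penalized in proportion to how far it strays, which gives the agent a graded signal rather than a simple count of violating buses.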
Abstract
In the rapidly evolving domain of electrical power systems, Volt-VAR optimization (VVO) is increasingly critical, especially with the growing integration of renewable energy sources. Traditional learning-based approaches to VVO in large, dynamically changing power systems are often hindered by computational complexity. To address this challenge, our research presents a novel framework that harnesses Deep Reinforcement Learning (DRL). It specifically uses the Importance Weighted Actor-Learner Architecture (IMPALA) algorithm, executed on the RAY platform. The framework is built upon RLlib, an industry-standard reinforcement learning library, and capitalizes on the distributed computing and advanced hyperparameter tuning capabilities offered by RAY. This design significantly expedites the exploration and exploitation phases in the VVO solution space. Our empirical results show that our approach achieves higher rewards than existing DRL methods. It also delivers a tenfold reduction in computational requirements. Integrating our DRL agent with the RAY platform yields RLlib-IMPALA, a novel framework that efficiently uses RAY's resources to improve system adaptability and control. RLlib-IMPALA leverages RAY's toolkit to enhance analytical capabilities and makes training more than 10 times faster than other state-of-the-art DRL methods.
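The "importance weighted" part of IMPALA is the V-trace correction. Actors generate trajectories with a behaviour policy $\mu$ that may lag behind the learner's policy $\pi$, so the learner reweights them. Using the state notation $s_t$ from above, the V-trace targets satisfy the backward recursion

$v_t = V(s_t) + \rho_t\big(r_t + \gamma V(s_{t+1}) - V(s_t)\big) + \gamma c_t\big(v_{t+1} - V(s_{t+1})\big)$

The truncated weights in this recursion are

$\rho_t = \min\!\big(\bar\rho, \pi(a_t|s_t)/\mu(a_t|s_t)\big)$ and $c_t = \min\!\big(\bar c, \pi(a_t|s_t)/\mu(a_t|s_t)\big)$

The plain-Python function below is a minimal sketch of that computation for one trajectory segment. It is not RLlib's implementation. The function name `vtrace`, the defaults $\gamma=0.99$ and $\bar\rho=\bar c=1$, and the omission of episode-termination handling are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def vtrace(
    log_rhos: Sequence[float],  # log(pi(a_t|s_t) / mu(a_t|s_t)) for t = 0..T-1
    rewards: Sequence[float],   # r_t, e.g. -V_vio per control step
    values: Sequence[float],    # learner's V(s_t) for t = 0..T-1
    bootstrap_value: float,     # V(s_T) for the state after the segment
    gamma: float = 0.99,        # assumed discount factor
    rho_bar: float = 1.0,       # truncation level for rho_t
    c_bar: float = 1.0,         # truncation level for c_t
) -> Tuple[List[float], List[float]]:
    """Return (V-trace value targets v_t, policy-gradient advantages).

    Sketch only: assumes no episode terminates inside the segment.
    """
    T = len(rewards)
    ratios = [math.exp(lr) for lr in log_rhos]
    rhos = [min(rho_bar, r) for r in ratios]  # truncated importance weights rho_t
    cs = [min(c_bar, r) for r in ratios]      # truncated trace coefficients c_t

    next_values = list(values[1:]) + [bootstrap_value]
    # Importance-weighted TD errors: rho_t * (r_t + gamma V(s_{t+1}) - V(s_t)).
    deltas = [rhos[t] * (rewards[t] + gamma * next_values[t] - values[t]) for t in range(T)]

    # Backward recursion: (v_t - V(s_t)) = delta_t + gamma c_t (v_{t+1} - V(s_{t+1})).
    vs_minus_v = [0.0] * T
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    vs = [v + d for v, d in zip(values, vs_minus_v)]

    # Actor advantages use the V-trace target of the next state.
    next_vs = vs[1:] + [bootstrap_value]
    advantages = [rhos[t] * (rewards[t] + gamma * next_vs[t] - values[t]) for t in range(T)]
    return vs, advantages


if __name__ == "__main__":
    targets, adv = vtrace([0.0, 0.0, 0.0], [-1.0, 0.0, -2.0], [0.5, 0.2, 0.1], 0.0, gamma=0.9)
    print(targets, adv)
```

When actor and learner policies coincide (all `log_rhos` equal to zero) and $\bar\rho=\bar c=1$, the recursion reduces to the ordinary n-step bootstrapped return $v_t = r_t + \gamma v_{t+1}$. Truncating the ratios at 1 bounds the variance that stale actor trajectories can introduce. This is what lets many distributed actors feed a single learner without synchronizing after every update.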
