Make Safe Decisions in Power System: Safe Reinforcement Learning Based Pre-decision Making for Voltage Stability Emergency Control
Congbo Bi, Lipeng Zhu, Di Liu, Chao Lu
TL;DR
This work tackles safe pre-decision making for short-term voltage stability under high renewable penetration by formulating emergency control as a state-constrained MDP (SCMDP) and introducing a safe reinforcement learning (SRL) framework. It integrates a neural security margin estimator with a two-stage decision mechanism: a dueling, state-action joint network learned with active learning (AL), and a gradient projection-based correction that enforces a margin constraint $D_{\theta}(O_t,a_t) \ge \epsilon$. The approach yields theoretical security guarantees, improved training efficiency, and robust performance on the IEEE 39-bus system and a realistic Guangdong grid, achieving fewer violations and competitive control effort compared with baseline methods. The combination of safety-aware learning, margin-based action correction, and AL-enabled scalability offers a practical path toward reliable, data-driven emergency control in modern power systems.
Abstract
The high penetration of renewable energy and power electronic equipment brings significant challenges to the efficient construction of adaptive emergency control strategies against various presumed contingencies in today's power systems. Traditional model-based emergency control methods have difficulty adapting to the various complicated operating conditions encountered in practice. Emerging artificial intelligence-based approaches, i.e., reinforcement learning-enabled solutions, have yet to provide solid safety assurances under strict constraints in practical power systems. To address these research gaps, this paper develops a safe reinforcement learning (SRL)-based pre-decision making framework against short-term voltage collapse. The proposed framework employs neural networks for pre-decision formulation, security margin estimation, and corrective action implementation, without relying on precise system parameters. Leveraging gradient projection, we propose a security projecting correction algorithm that offers theoretical security assurances when amending risky actions. The applicability of the algorithm is further enhanced by incorporating active learning, which expedites the training process and improves security estimation accuracy. Extensive numerical tests on the New England 39-bus system and the realistic Guangdong Provincial Power Grid demonstrate the effectiveness of the proposed framework.
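To make the idea of gradient projection-based correction concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a differentiable margin function standing in for the learned estimator $D_{\theta}(O_t,a_t)$ and repeatedly moves a risky action along the margin gradient by the smallest step that satisfies the linearized constraint $D_{\theta}(O_t,a_t) \ge \epsilon$. The names `toy_margin`, `numerical_grad`, and `project_action`, the `sag` observation field, and the tanh-shaped margin are all hypothetical choices made for illustration.

```python
import numpy as np

def toy_margin(obs, action):
    """Hypothetical stand-in for the learned margin estimator D_theta(O_t, a_t).

    The margin grows (with saturation) as more control effort is applied and
    shrinks with the observed voltage sag. This shape is an assumption.
    """
    return float(np.sum(np.tanh(action)) - obs["sag"])

def numerical_grad(f, a, h=1e-5):
    """Central-difference gradient of a scalar function f at point a."""
    g = np.zeros_like(a)
    for i in range(a.size):
        step = np.zeros_like(a)
        step[i] = h
        g[i] = (f(a + step) - f(a - step)) / (2.0 * h)
    return g

def project_action(obs, action, eps, margin=toy_margin, max_iter=50, tol=1e-8):
    """Correct an action until margin(obs, a) >= eps (up to tol).

    Each iteration linearizes the margin around the current action and takes
    the minimum-norm step that satisfies the linearized constraint:
        a <- a + (eps - D(a)) / ||g||^2 * g,  with g = dD/da.
    Actions that already satisfy the constraint are returned unchanged.
    """
    a = np.asarray(action, dtype=float).copy()
    for _ in range(max_iter):
        d = margin(obs, a)
        if d >= eps - tol:
            break  # constraint satisfied, no further correction needed
        g = numerical_grad(lambda x: margin(obs, x), a)
        gg = float(g @ g)
        if gg < 1e-12:
            break  # flat margin: the gradient gives no correction direction
        a = a + (eps - d) / gg * g
    return a

if __name__ == "__main__":
    obs = {"sag": 1.0}                 # hypothetical observation
    risky = np.array([0.1, 0.2])       # action proposed by the policy
    safe = project_action(obs, risky, eps=0.2)
    print("margin before:", toy_margin(obs, risky))
    print("margin after: ", toy_margin(obs, safe))
```

Because the sketch relies on a local linearization of a toy margin, it carries none of the theoretical security assurances claimed for the paper's algorithm; it only illustrates how a margin constraint can drive a small correction to a proposed action.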
