
Safe Reinforcement Learning for Power System Control: A Review

Peipei Yu, Zhenyi Wang, Hongcai Zhang, Yonghua Song

TL;DR

This review surveys safe reinforcement learning (RL) for power system control, addressing the safety gap that prevents traditional RL from being deployed in real-world grids with high renewable penetration. It classifies safe RL into two main approaches: adding a safe layer that constrains the agent's actions, and transforming policy optimization through constrained Markov decision processes (MDPs). The methods covered include control barrier functions (CBF), model predictive control (MPC), Lagrange multipliers, trust-region methods (TRPO/CPO), and Lyapunov-based techniques. The paper then reviews how these methods are applied to frequency regulation, voltage control, and energy management, detailing state and action spaces, rewards and costs, and safety mechanisms, and discusses challenges in convergence, efficiency, universality, and deployment. It concludes with perspectives on sim-to-real transfer, digital twins, and hybrid model-based/model-free strategies as practical avenues toward safer real-world RL-enabled power systems.

Abstract

The large-scale integration of intermittent renewable energy resources introduces increased uncertainty and volatility to the supply side of power systems, thereby complicating system operation and control. Recently, data-driven approaches, particularly reinforcement learning (RL), have shown significant promise in addressing complex control challenges in power systems, because RL can learn from interactive feedback without needing prior knowledge of the system model. However, the training process of model-free RL methods relies heavily on random decisions for exploration, which may result in "bad" decisions that violate critical safety constraints and lead to catastrophic control outcomes. Due to the inability of RL methods to theoretically ensure decision safety in power systems, directly deploying traditional RL algorithms in the real world is deemed unacceptable. Consequently, the safety issue in RL applications, known as safe RL, has garnered considerable attention in recent years, leading to numerous important developments. This paper provides a comprehensive review of the state-of-the-art safe RL techniques and discusses how these techniques can be applied to power system control problems such as frequency regulation, voltage control, and energy management. We then present discussions on key challenges and future research directions, related to convergence and optimality, training efficiency, universality, and real-world deployment.

Paper Structure

This paper contains 35 sections, 31 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The structure of safe RL methodology.
  • Figure 2: Illustration of a Markov Decision Process.
  • Figure 3: Combination scheme of the safe layer and RL. At (1), the safe layer intercepts unsafe actions to become safe ones. At (2), the safe layer modifies the reward based on the interception degree, returning effective feedback.
  • Figure 4: Concept and differences between the action projection and action replacement. Here, action replacement (a) replaces unsafe actions with self-defined actions from the safe action space. Action projection (b) projects the agent's unsafe actions to the closest action in the safe action space.
  • Figure 5: The safety-guided RL framework for safe explorations.
  • ...and 1 more figure
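
The safe-layer ideas named in the captions of Figures 3 and 4 can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes, purely for illustration, that the safe action space is a box of per-dimension limits, so that projection onto the closest safe action reduces to clipping; in a real power system the safe set is usually defined by network constraints, and projection would require solving an optimization problem. The names `Box`, `project`, `replace`, `shaped_reward`, and `penalty_weight` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# Assumption: the safe action space is a box of per-dimension (low, high) limits.
Box = Sequence[Tuple[float, float]]


def is_safe(action: Sequence[float], box: Box) -> bool:
    """Return True if every action component lies within its limits."""
    return all(lo <= a <= hi for a, (lo, hi) in zip(action, box))


def project(action: Sequence[float], box: Box) -> List[float]:
    """Action projection: map the action to the closest point in the box.

    For a box and the Euclidean distance, the closest point is obtained
    by clipping each component to its limits.
    """
    return [min(max(a, lo), hi) for a, (lo, hi) in zip(action, box)]


def replace(action: Sequence[float], box: Box,
            fallback: Sequence[float]) -> List[float]:
    """Action replacement: keep safe actions, otherwise use a predefined one."""
    if not is_safe(fallback, box):
        raise ValueError("fallback action must itself be safe")
    return list(action) if is_safe(action, box) else list(fallback)


def shaped_reward(reward: float, raw: Sequence[float],
                  safe: Sequence[float], penalty_weight: float = 1.0) -> float:
    """Reduce the reward in proportion to how far the safe layer moved the action."""
    distance = sqrt(sum((r - s) ** 2 for r, s in zip(raw, safe)))
    return reward - penalty_weight * distance


# Hypothetical two-dimensional action with limits [-1, 1] and [0, 2].
box = [(-1.0, 1.0), (0.0, 2.0)]
raw_action = [1.5, 1.0]                              # first component is out of range
projected = project(raw_action, box)                 # clipped to the box
replaced = replace(raw_action, box, [0.0, 0.0])      # swapped for the fallback
penalized = shaped_reward(10.0, raw_action, projected, penalty_weight=2.0)
```

In this sketch, projection changes the agent's action as little as possible, while replacement discards it entirely in favor of a fixed fallback. The reward penalty mirrors step (2) of Figure 3 by telling the agent how strongly its action was corrected.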