Reinforcement Learning for Distributed Transient Frequency Control with Stability and Safety Guarantees

Zhenyi Yuan; Changhong Zhao; Jorge Cortes

TL;DR

The paper tackles transient frequency control in power networks with disturbances, aiming to maintain frequencies within safe bounds while ensuring asymptotic stability. It combines Lyapunov-based safety constraints with reinforcement learning, introducing a distributed dynamic budget mechanism that yields a provably safe and flexible policy search space. Neural networks parameterize class-$\mathcal{K}$ functions and frequency thresholds, and an RNN-based RL algorithm learns the optimal distributed policy (RLb) within this space. Case studies on the IEEE 39-bus network demonstrate guaranteed stability and transient safety, with improved optimality and robustness compared to baselines, highlighting the practical impact for secure, data-driven grid operation.

Abstract

This paper proposes a reinforcement learning-based approach for optimal transient frequency control in power systems with stability and safety guarantees. Building on Lyapunov stability theory and safety-critical control, we derive sufficient conditions on the distributed controller design that ensure the stability and transient frequency safety of the closed-loop system. Our idea of distributed dynamic budget assignment makes these conditions less conservative than those in recent literature, so that they can impose less stringent restrictions on the search space of control policies. We construct neural network controllers that parameterize such control policies and use reinforcement learning to train an optimal one. Simulations on the IEEE 39-bus network illustrate the guaranteed stability and safety properties of the controller along with its significantly improved optimality.
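A class-$\mathcal{K}$ function is continuous, strictly increasing, and zero at the origin. The sketch below shows one way a small parameterized model can stay in that class by construction, so a learning algorithm can adjust the parameters freely without leaving the class. It is an illustration only, not the architecture used in the paper: the functional form and the names `class_k`, `raw_slope`, `raw_gains`, and `raw_rates` are assumptions made for this example.

```python
import numpy as np

def softplus(z):
    # Smooth map from the reals to (0, inf); keeps effective weights positive.
    return np.logaddexp(0.0, z)

def class_k(s, raw_slope, raw_gains, raw_rates):
    """Hypothetical class-K parameterization (not the paper's design).

    alpha(s) = a*s + sum_k c_k * tanh(b_k * s), with a, c_k, b_k > 0.
    Each term is zero at s = 0 and increasing in s, and a > 0 makes the
    sum strictly increasing, so alpha is class-K for any raw parameters.
    """
    s = np.asarray(s, dtype=float)
    a = softplus(raw_slope)
    c = softplus(np.asarray(raw_gains, dtype=float))
    b = softplus(np.asarray(raw_rates, dtype=float))
    # outer(s, b) has shape (len(s), K); the product with c sums over units.
    return a * s + np.tanh(np.multiply.outer(s, b)) @ c

if __name__ == "__main__":
    grid = np.linspace(0.0, 2.0, 5)
    print(class_k(grid, raw_slope=0.0, raw_gains=[0.5, -1.0], raw_rates=[1.0, 2.0]))
```

Passing the unconstrained parameters through `softplus` is the design choice that matters here: gradient updates on the raw values can never produce a negative slope or gain, so monotonicity holds throughout training.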

Paper Structure (19 sections, 8 theorems, 21 equations, 6 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
Notations
Graph theory
Power network dynamics
Problem Formulation
Search Space of Control Policies
Constraint ensuring frequency invariance
Constraint ensuring asymptotic stability
Distributed dynamic budget assignment
Distributed, stable and safe control policies
Synthesis of Distributed Neural Network Controllers
Selecting class-$\mathcal{K}$ functions and frequency thresholds
Neural network controller design
Learning optimal control policy using RNN
...and 4 more sections

Key Result

Lemma 4.1

(Sufficient condition for frequency invariance YZ-JC:19-auto): Assume the solution of eq:newswing exists and is unique for every admissible initial condition. For each $i\in\mathcal{I}$, let $\overline\omega_{i}^{\operatorname{th}}, \underline\omega_{i}^{\operatorname{th}}\in\mathbb{R}$ be such that ... where $q_{i}(x, p) \triangleq D_{i} \omega_{i} + \left[B Y_{b}\right]_{i} \sin \lambda - p_{i}$, ...
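The quantity $q_{i}(x, p)$ defined in the lemma can be evaluated for all buses at once. The sketch below is a minimal illustration under stated assumptions: $B$ is taken to be an $n \times m$ bus-by-line matrix, $Y_{b}$ a diagonal matrix stored as a length-$m$ vector `y_b`, $\lambda$ a length-$m$ vector of angle differences across lines, and $D$ a length-$n$ vector of per-bus coefficients. These shapes, the function name `q`, and the two-bus numbers are not taken from the paper.

```python
import numpy as np

def q(omega, lam, p, D, B, y_b):
    """Evaluate q_i = D_i*omega_i + [B Y_b]_i sin(lambda) - p_i for every bus.

    Assumed shapes (illustrative, not from the paper):
      omega, p, D : (n,)   per-bus quantities
      lam, y_b    : (m,)   per-line quantities
      B           : (n, m) bus-by-line matrix
    """
    # B * y_b scales column j of B by y_b[j], i.e. it equals B @ diag(y_b).
    return D * omega + (B * y_b) @ np.sin(lam) - p

if __name__ == "__main__":
    # Hypothetical two-bus, one-line network.
    B = np.array([[1.0], [-1.0]])
    print(q(omega=np.array([0.1, -0.1]), lam=np.array([np.pi / 2]),
            p=np.array([0.5, -0.5]), D=np.array([1.0, 1.0]),
            B=B, y_b=np.array([2.0])))
```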

Figures (6)

Figure 1: The colored region shows the search space for the controllers satisfying \ref{frequency-controller-constraint}, cf. Theorem \ref{thm:frequency-controller}, which ensures asymptotic stability and transient safety. The orange curve is an instance of a controller in the specified search space. The sign of the budget captures whether bus $i$ (a) violates \ref{eq:simple-stability-condition} or (b) compensates for it up to a certain amount to ensure the overall system stability.
Figure 2: Comparison of average training loss curves and shaded error bars (representing the standard deviations) between the RL-based method in WC-YJ-BZ:22 and the proposed RLb method based on 5 experiments. The latter has a warmer training start. Each method solves a different discrete-time optimization problem, which explains the convergence to different optimal values.
Figure 3: Dynamics of the IEEE 39-bus network under the trained controllers via the RL-based method in WC-YJ-BZ:22 (top), the controllers in YZ-JC:19-auto (middle), and the trained controllers via the RLb method in this paper (bottom).
Figure 4: Dynamics of the IEEE 39-bus network under the controllers trained via the RLb method when only partial information about the sudden change in power injection is known during training. In this case, any of buses 32, 33, 35, and 38 may encounter a sudden change in power injection.
Figure 5: Budget allocation under the proposed dynamic budget assignment mechanism for the RLb method.
...and 1 more figure

Theorems & Definitions (12)

Lemma 4.1
Lemma 4.2
proof
Proposition 4.3
proof
Theorem 4.4
proof
Lemma 5.1
Lemma 5.2
Lemma 5.3
...and 2 more
