Reinforcement Learning Optimizes Power Dispatch in Decentralized Power Grid

Yongsun Lee; Hoyun Choi; Laurent Pagnier; Cook Hyun Kim; Jongshin Lee; Bukyoung Jhun; Heetae Kim; Juergen Kurths; B. Kahng

TL;DR

The paper addresses frequency stability in decentralized power grids with high renewable penetration by introducing GC-PPO, a graph-convolutional proximal policy optimization framework that outputs a distributed dispatch plan to minimize frequency fluctuations modeled by the swing equation. It combines graph neural networks with PPO to produce bus-level dispatch fractions $\delta P_{ji}$ in response to perturbations, and demonstrates superior performance over heuristic methods on SHK-like synthetic grids and a Kron-reduced UK grid, with stability measured by the fluctuation metric $\Xi$. The study highlights the importance of topology-aware, inertia-weighted dispatch in heterogeneous grids and discusses extensions to topology switching and proactive fault handling, offering a scalable path toward robust, decentralized frequency control. The approach provides a practical framework for rapid, topology-adaptive stabilization in modern grids with distributed renewable generation, potentially reducing blackout risk and improving reliability.
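
For orientation, the swing equation is commonly written in the second-order form below. The notation is an assumption made here for illustration ($\theta_i$ for the phase of bus $i$, $m_i$ for its mass or inertia, $\gamma_i$ for its damping coefficient, $P_i$ for its injected or consumed power, $K_{ij}$ for the coupling strength of the line between buses $i$ and $j$); the paper's own equation may differ in symbols or normalization.

$$
m_i \ddot{\theta}_i + \gamma_i \dot{\theta}_i = P_i + \sum_{j} K_{ij} \sin(\theta_j - \theta_i)
$$

In this form, a perturbation at bus $i$ changes $P_i$, and a dispatch plan adjusts the $P_j$ of the other buses so that the frequencies $\dot{\theta}_j$ relax back toward their synchronized value.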

Abstract

Effective frequency control in power grids has become increasingly important as the demand for renewable energy sources grows. Here, we propose a novel strategy for resolving this challenge using graph convolutional proximal policy optimization (GC-PPO). The GC-PPO method can optimally determine how much power individual buses dispatch to reduce frequency fluctuations across a power grid. We demonstrate its efficacy in controlling disturbances by applying GC-PPO to the power grid of the UK, where it outperforms the classical methods. This result highlights the promising role of GC-PPO in enhancing the stability and reliability of power systems by switching lines or decentralizing grid topology.
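
To make the idea of bus-level dispatch concrete, the following is a minimal sketch, not the authors' implementation. It integrates a standard swing equation on a hypothetical 6-bus ring, perturbs one bus, and compares no dispatch against a simple uniform dispatch in which every other bus compensates equally. The ring topology, the parameter values (`K`, `m`, `gamma`), the balanced initial state, and the fluctuation measure used here (the time integral of the summed squared frequency deviations) are all assumptions for illustration; the paper's definition of $\Xi$ and its protocols may differ.

```python
import numpy as np

def swing_rhs(theta, omega, P, A, K=5.0, m=1.0, gamma=1.0):
    """Right-hand side of m*theta'' + gamma*theta' = P + K*sum_j A_ij sin(theta_j - theta_i).

    K, m, and gamma are illustrative constants, identical for every bus.
    """
    coupling = (A * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    return omega, (P - gamma * omega + K * coupling) / m

def simulate(P, A, T=20.0, dt=0.01):
    """Integrate with classical RK4 from rest and return (xi, final frequencies).

    xi is a hypothetical fluctuation measure: the time integral of sum_i omega_i^2.
    """
    theta = np.zeros(len(P))
    omega = np.zeros(len(P))
    xi = 0.0
    for _ in range(int(round(T / dt))):
        k1t, k1w = swing_rhs(theta, omega, P, A)
        k2t, k2w = swing_rhs(theta + 0.5 * dt * k1t, omega + 0.5 * dt * k1w, P, A)
        k3t, k3w = swing_rhs(theta + 0.5 * dt * k2t, omega + 0.5 * dt * k2w, P, A)
        k4t, k4w = swing_rhs(theta + dt * k3t, omega + dt * k3w, P, A)
        theta = theta + dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        omega = omega + dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
        xi += dt * np.sum(omega ** 2)
    return xi, omega

# Hypothetical 6-bus ring: each bus is connected to its two neighbours.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

# Bus 0 suddenly consumes one extra unit of power; the grid starts balanced at rest.
delta = 1.0
P_none = np.zeros(n)
P_none[0] = -delta

# Uniform dispatch (assumed form): every other bus supplies an equal share.
P_uniform = P_none.copy()
P_uniform[1:] += delta / (n - 1)

xi_none, omega_none = simulate(P_none, A)
xi_uniform, omega_uniform = simulate(P_uniform, A)
print(f"fluctuation without dispatch: {xi_none:.4f}")
print(f"fluctuation with uniform dispatch: {xi_uniform:.4f}")
```

In this toy setting, the uncompensated imbalance leaves every bus with a persistent frequency offset, while any dispatch that restores the total power balance lets the frequencies relax back to zero. A learned policy such as GC-PPO would replace the equal shares with topology- and inertia-dependent ones.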

Paper Structure (11 sections, 6 equations, 6 figures, 1 table)

Introduction
Results
Swing equation of oscillators in the power grid
Network models
Bus-based power dispatch
Fluctuation measure
Training of GC-PPO
Discussion
Methods
Heuristics methods
GC-PPO protocol

Figures (6)

Figure 1: (a) High-voltage transmission grid of the UK, coarse-grained by the Kron reduction method. Buses are either generators (red diamonds) with $P_i > 0$ or consumers (blue circles) with $P_i < 0$. (b) Frequency relaxation pattern of each bus after bus 3 (yellow diamond) is perturbed and no power dispatch is applied. The frequencies of the other buses are drawn in gray. (c)-(e) Fluctuation relaxation pattern when the amount of power dispatch is determined by three different protocols: (c) Degree, (d) Fiedler, and (e) GC-PPO. (f)-(h) Bus parameters of the UK grid versus bus index: (f) power, (g) mass (inertia), and (h) damping coefficient. (i)-(k) Amount of power dispatched from bus $j$ when bus $i=3$ is perturbed, i.e., $\delta P_{j, i=3}$, for the three protocols. (l)-(n) Same as (i)-(k), but when bus 23, which is located at the center of the grid, is perturbed.
Figure 2: (a) Topology of the synthetic SHK grid, composed of generators (red diamonds) with $P_i=1$ and consumers (blue circles) with $P_i=-1$. (b) Frequency evolution of the perturbed bus (yellow line; marked by the yellow diamond in (a)) and of the other buses (gray lines) without power dispatch. (c)-(e) Fluctuation relaxation pattern when power dispatch follows three protocols: (c) Degree, (d) Fiedler, and (e) GC-PPO.
Figure 3: Fluctuation measure $\Xi_i$ versus generator index $i$ in the UK grid for three protocols: Degree, Uniform, and GC-PPO. Degree and Uniform protocols perform poorly on three generators (4, 10, and 52), while GC-PPO effectively moderates the fluctuations.
Figure 4: Performance of GC-PPO versus the number of training episodes in different environments for the different power grids and bus types. The solid yellow curves indicate the average performance of GC-PPO, whereas the gray dots represent the fluctuations due to the stochastic training process. The dashed lines are the performances of the other topology-based protocols for reference.
Figure 5: Training of GC-PPO when multiple consumers simultaneously overuse power. Solid and dashed lines have the same meaning as in Fig. 4.
...and 1 more figure
