Graph Reinforcement Learning for Power Grids: A Comprehensive Survey

Mohamed Hassouna; Clara Holzhüter; Pawel Lytaev; Josephine Thomas; Bernhard Sick; Christoph Scholz

TL;DR

This survey reviews the emerging field of Graph Reinforcement Learning (GRL) for power grids, arguing that integrating Graph Neural Networks (GNNs) with reinforcement learning (RL) yields scalable, adaptable controllers for transmission and distribution networks amid growing renewable penetration. It categorizes approaches by use case (transmission topology control and distribution voltage management), RL paradigm (model-free, model-based, hierarchical, multi-agent, and imitation-learning hybrids), and graph representation, emphasizing the role of GNNs as state encoders, world models, or planning facilitators. The review highlights core methodological trends, such as the shift from monolithic agents to hierarchical and multi-agent designs, the prevalence of attention-based GNNs, and the integration of physics-informed losses, while noting the field’s dependence on Grid2Op-based simulations and small test cases. It further discusses open challenges: simulation-to-reality gaps, the lack of standardized evaluation, the need to consolidate heterogeneous GRL techniques, and safety and trust considerations. Progress toward real-world deployment, it argues, will require open data, robust benchmarks, and human-centered, safe decision-support frameworks. Overall, GRL holds promise for real-time, robust grid control in decarbonized power systems, but practical deployment awaits advances in realism, standardization, and trustworthy integration with operator workflows.

Abstract

The increasing share of renewable energy and distributed electricity generation requires the development of deep learning approaches to address the lack of flexibility inherent in traditional power grid methods. In this context, Graph Neural Networks are a promising solution due to their ability to learn from graph-structured data. Combined with Reinforcement Learning, they can be used as control approaches to determine remedial actions. This review analyzes how Graph Reinforcement Learning can improve representation learning and decision-making in power grid applications, particularly in transmission and distribution grids. We analyze the reviewed approaches in terms of the graph structure, the Graph Neural Network architecture, and the Reinforcement Learning approach. Although Graph Reinforcement Learning has demonstrated adaptability to unpredictable events and noisy data, it is currently at a proof-of-concept stage and is not yet deployable in real-world applications. We highlight the open challenges and limitations for real-world applications.

Paper Structure (85 sections, 11 equations, 6 figures, 7 tables)

Introduction
Existing Works
Contribution and Structure
Fundamentals: Power Grids, Graph Neural Networks and Reinforcement Learning
Power Grids
Transmission grids.
Distribution grids.
Grid models.
Graph Neural Networks
Message Passing
Spatial Graph Convolution.
GraphSage.
Graph Attention Network
Spectral Graph Convolution.
Graph Capsule Networks.
...and 70 more sections

Figures (6)

Figure 1: Visualization of the power grid structure with transmission and distribution level.
Figure 2: Left: Visualization of the general message passing scheme in a GNN (modeled after bronstein2021geometric). The target node (orange) receives messages $m_{ui}$ from its neighbors and aggregates them. The messages can be constructed from the information of both the target and neighboring node, depending on the message passing scheme. Right: Illustration of a GNN (modeled after wu2020comprehensive). The graph is input to the GNN layers, which compute node embeddings based on the messages from neighboring nodes. As indicated in orange, this is done for each node in the graph. After all embeddings are computed, an activation function is applied. This is repeated for a given number of layers. In the end, the GNN outputs a graph with new node embeddings from which a prediction can be made.
Figure 3: The agent-environment interaction is a cyclical process where the agent selects actions based on the current state, leading to state transitions and rewards, guided by a policy $\pi$, hence generating a sequence of states, actions, and rewards.
Figure 4: Transformation from the physical power grid to the graph input for the GNN. Each grid component (loads, generators, and both ends of transmission lines) is represented as a node. Edges are defined by the grid’s physical connectivity, linking nodes within substations and across substations via transmission lines.
Figure 5: Illustration of the logical flow of GRL for grid operation: First, the power grid, including relevant information about lines and grid nodes, is modeled as a graph with node and edge features. This graph is then input into a GNN model, which learns a representation of the grid. This representation serves as an observation for the agent. Based on this observation, the agent selects an action from the action space. This step may include simulations or other verification strategies to validate the action. The final action is executed in the environment (i.e., the simulated power grid). The agent receives a reward corresponding to the quality of the action and updates its weights. Depending on the RL algorithm employed, multiple (graph) neural networks may need to be updated, for example the actor and the critic in actor-critic approaches.
...and 1 more figure
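
The message passing scheme described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not taken from the survey: the function name `message_passing_layer`, the toy node names, and the feature values are hypothetical, and the layer uses plain mean aggregation followed by a ReLU with no learned weights, which a real GNN layer would add.

```python
# Toy sketch of one message-passing layer (mean aggregation + ReLU).
# Assumption: no learned weights; messages are the raw neighbor features.
from typing import Dict, List


def message_passing_layer(
    features: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """Return new node embeddings from the node's own and its neighbors' features."""
    updated = {}
    for node, own in features.items():
        # Messages m_{ui}: here simply each neighbor u's feature vector.
        messages = [features[u] for u in neighbors.get(node, [])]
        stacked = [own] + messages
        # Aggregate by averaging each feature dimension.
        mean = [sum(col) / len(stacked) for col in zip(*stacked)]
        # Activation applied after aggregation.
        updated[node] = [max(0.0, x) for x in mean]
    return updated


# Hypothetical grid fragment: a generator and a load attached to one line end.
features = {"gen": [1.0, 0.0], "line_end": [0.0, 2.0], "load": [-1.0, 4.0]}
neighbors = {"gen": ["line_end"], "line_end": ["gen", "load"], "load": ["line_end"]}

embeddings = message_passing_layer(features, neighbors)
```

Stacking several such layers lets information travel further across the graph, which corresponds to the repetition "for a given number of layers" in the caption.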
