Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control

Eloy Anguiano Batanero, Ángela Fernández, Álvaro Barbero

TL;DR

The paper tackles autonomous topology control of power grids under uncertainty using a model-free reinforcement learning framework. It introduces a masked topological action space and graph-based observations, integrated with PPO-style actor-critic training and a GraphTransformer encoder to capture grid topology. Key contributions include the Topological Action Converter (TAC), three observation formulations (Flat, SubstationGraph, ElementGraph) with graph-based encoders, and opponent-based training that improves stability and generalization, evidenced by reduced power losses and strong L2RPN scores across 20 chronics in a 5-substation Grid2Op environment. Agents performed robustly on unseen chronics (17 and 19) and achieved competitive cost savings, indicating the approach's potential as a scalable, foundational framework for autonomous grid management. The work suggests pathways to extend to larger grids, richer reward components (e.g., storage and renewable prioritization), and a directly mappable ElementGraph capable of multi-topology control, moving toward a foundational model for power-grid operation.

Abstract

The increasing complexity of power grid management, driven by the emergence of prosumers and the demand for cleaner energy solutions, has necessitated innovative approaches to ensure stability and efficiency. This paper presents a novel approach within the model-free framework of reinforcement learning, aimed at optimizing power network operations without prior expert knowledge. We introduce a masked topological action space that lets agents explore diverse strategies for cost reduction while maintaining reliable service, using the state logic as a guide for choosing proper actions. Through extensive experimentation across 20 different scenarios in a simulated 5-substation environment, we demonstrate that our approach achieves a consistent reduction in power losses while ensuring grid stability against potential blackouts. The results underscore the effectiveness of combining dynamic observation formalization with opponent-based training, pointing to a viable path toward autonomous management solutions in modern energy systems, or even toward a foundational model for this field.

Paper Structure

This paper contains 27 sections, 13 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Illustrative example of how a power grid can be formalized as a dynamic graph with a variable number of nodes, since each unique connection of each substation is treated as a node of the network. Thus, when certain actions are performed on the network, a substation can correspond to more than one node, because a previously inactive bus comes into play. This figure is extracted from Dorfer2022PowerGC.
  • Figure 2: Illustrative comparison of an observation without formalization (a power grid state) of 5 substations against its equivalent in Element Graph form. Each element always exists in the graph, and every element shares an edge with each element to which it could be connected.
  • Figure 3: Graph Transformer architecture proposed by GraphTransformer, highlighting its integration of neighborhood connectivity, Laplacian eigenvector positional encodings, and edge feature representation to enhance graph-based learning tasks.
  • Figure 4: How the PPO algorithm with masks from Section \ref{subsubsec:ppo} works together with the TAC. As a member of the Actor-Critic family, this algorithm uses the observation vectorizer to optimize both the value function and the agent's policy. The action of the policy $\pi_{\theta}$ is the final action in the image, while the value-function estimate is the output $V_{\phi}$ of the Critic, which tries to estimate the rewards to come from the current step onward. After the agent's policy makes its decision, action masking prevents illegal actions under the logic and rules of the current observation by forcing each masked element to the dimension assigned to doing nothing on that element.
  • Figure 5: Rewards obtained on the training chronics throughout the training of each agent configuration. Each proposed approach was run 5 times, and the bands show the standard deviation of that approach. The left plot shows agents trained without an opponent, while the right plot presents results when the opponent is active during training.
  • ...and 6 more figures
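
The masking step described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the paper's implementation: the per-element logit layout, the `DO_NOTHING = 0` index, the greedy (argmax) selection, and the function names `greedy_choices` and `apply_action_mask` are all assumptions made for illustration. The sketch only shows the idea of overriding the policy's choice with "do nothing" on every element that the current observation marks as illegal to act on.

```python
from typing import List, Sequence

# Assumption: every grid element has its own row of choices and
# index 0 of each row means "do nothing on this element".
DO_NOTHING = 0


def greedy_choices(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring choice for each element (argmax per row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def apply_action_mask(choices: Sequence[int], legal: Sequence[bool]) -> List[int]:
    """Force the choice of every masked (illegal) element to DO_NOTHING."""
    if len(choices) != len(legal):
        raise ValueError("choices and legal must have the same length")
    return [c if ok else DO_NOTHING for c, ok in zip(choices, legal)]


# Hypothetical policy output for three elements with three choices each.
logits = [
    [0.1, 2.0, 0.3],  # element 0 prefers choice 1
    [1.5, 0.2, 0.1],  # element 1 prefers choice 0 (do nothing)
    [0.0, 0.1, 3.0],  # element 2 prefers choice 2
]
# Hypothetical mask: element 2 may not be acted on in the current state.
legal = [True, True, False]

masked_action = apply_action_mask(greedy_choices(logits), legal)
print(masked_action)
```

In this sketch the mask is applied after the policy has chosen, which mirrors the order described in the caption. A common alternative is to mask the logits before sampling so that illegal choices receive zero probability; which variant the paper uses during optimization is not stated in this excerpt.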