Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control
Eloy Anguiano Batanero, Ángela Fernández, Álvaro Barbero
TL;DR
The paper tackles autonomous topology control of power grids under uncertainty using a model-free reinforcement learning framework. It introduces a masked topological action space and graph-based observations, integrated with PPO-style actor-critic training and a GraphTransformer encoder to capture grid topology. Key contributions include the Topological Action Converter (TAC), three observation formulations (Flat, SubstationGraph, ElementGraph) with graph-based encoders, and opponent-based training that improves stability and generalization, as evidenced by reduced power losses and strong L2RPN scores across 20 chronics in a 5-substation Grid2Op environment. Agents performed robustly on unseen chronics (17 and 19) and achieved competitive cost savings, indicating the approach's potential as a scalable framework for autonomous grid management. The work suggests extensions to larger grids, richer reward components (e.g., storage and renewable prioritization), and a directly mappable ElementGraph capable of multi-topology control, moving toward a foundational model for power-grid operation.
Abstract
The increasing complexity of power grid management, driven by the emergence of prosumers and the demand for cleaner energy solutions, has necessitated innovative approaches to ensure stability and efficiency. This paper presents a novel approach within the model-free framework of reinforcement learning, aimed at optimizing power network operations without prior expert knowledge. We introduce a masked topological action space that enables agents to explore diverse strategies for cost reduction while maintaining reliable service, using the state logic as a guide for choosing appropriate actions. Through extensive experimentation across 20 different scenarios in a simulated 5-substation environment, we demonstrate that our approach achieves a consistent reduction in power losses while ensuring grid stability against potential blackouts. The results underscore the effectiveness of combining dynamic observation formalization with opponent-based training, pointing to a viable path toward autonomous management solutions in modern energy systems, or even toward a foundational model for this field.
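The abstract does not spell out how the masked topological action space is implemented. As a minimal sketch of the general idea, assuming a discrete policy that outputs one logit per candidate action and a boolean validity mask derived from the current grid state, masking can be expressed as a softmax restricted to valid actions. The function name `masked_action_probs` and the example values are hypothetical and are not taken from the paper.

```python
import math


def masked_action_probs(logits, valid_mask):
    """Softmax over `logits`, restricted to entries where `valid_mask` is True.

    Illustrative only: the paper's actual masking mechanism may differ.
    """
    if len(logits) != len(valid_mask):
        raise ValueError("logits and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")

    # Subtract the largest valid logit for numerical stability.
    max_logit = max(l for l, ok in zip(logits, valid_mask) if ok)

    # Invalid actions contribute zero mass, so they can never be sampled.
    exps = [math.exp(l - max_logit) if ok else 0.0
            for l, ok in zip(logits, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: four candidate topology actions, the third is invalid
# in the current state even though the policy assigns it the largest logit.
probs = masked_action_probs([1.0, 2.0, 5.0, 0.5], [True, True, False, True])
```

In this sketch the invalid action receives probability zero regardless of its logit, and the remaining probability mass is renormalized over the valid actions, which is what lets an agent explore only actions that are admissible in the current state.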
