Optimizing Power Grid Topologies with Reinforcement Learning: A Survey of Methods and Challenges
Erica van der Sar, Alessandro Zocca, Sandjai Bhulai
TL;DR
The paper surveys reinforcement learning approaches for power grid topology optimization, with a focus on the L2RPN benchmark suite and Grid2Op environments. It systematically categorizes design choices (algorithms, action/observation spaces, rewards, curricula, and rule-based heuristics) and provides a comparative numerical study on a standard 14-bus sandbox, highlighting when action-space reduction, robust rules, and high activation thresholds improve training and performance. Key findings indicate that while PPO-based agents can approach or exceed baselines under carefully designed reductions and curricula, the Greedy baseline often remains a strong competitor, underscoring the need for robust, scalable, and generalizable RL strategies. The authors advocate future work in imitation learning, off-policy methods, graph neural networks, and model-based planning to bridge the gap to real-world grid optimization and larger grids, offering practical guidance for researchers and operators. The work thus lays a foundation for advancing RL-driven power grid optimization by detailing open challenges, benchmarking practices, and concrete recommendations for future experiments.
Abstract
Power grid operation is becoming increasingly complex due to the rising integration of renewable energy sources and the need for more adaptive control strategies. Reinforcement Learning (RL) has emerged as a promising approach to power network control (PNC), offering the potential to enhance decision-making in dynamic and uncertain environments. The Learning To Run a Power Network (L2RPN) competitions have played a key role in accelerating research by providing standardized benchmarks and problem formulations, leading to rapid advancements in RL-based methods. This survey provides a comprehensive and structured overview of RL applications for power grid topology optimization, categorizing existing techniques, highlighting key design choices, and identifying gaps in current research. Additionally, we present a comparative numerical study evaluating the impact of commonly applied RL-based methods, offering insights into their practical effectiveness. By consolidating existing research and outlining open challenges, this survey aims to provide a foundation for future advancements in RL-driven power grid optimization.
