Centrally Coordinated Multi-Agent Reinforcement Learning for Power Grid Topology Control
Barbera de Mol, Davide Barbieri, Jan Viebahn, Davide Grossi
TL;DR
The paper tackles the combinatorial challenge of power grid topology control under rising renewable integration by introducing a centrally coordinated multi-agent reinforcement learning (CCMA) framework that factorizes the action space. Regional agents propose topological changes and a coordinating agent selects the final action, enabling scalable, sample-efficient learning in larger networks. Across 5- and 14-bus grids (with stochastic outages) and preliminary 36-bus experiments, CCMA variants outperform baselines in both learning efficiency and final performance, with Greedy-RL often delivering the best results. The work demonstrates the practical potential of combining hierarchical reinforcement learning (HRL) and multi-agent reinforcement learning (MARL) for robust, scalable power system control, and outlines concrete avenues for scaling to larger networks and incorporating imitation learning and partial observability.
Abstract
Power grid operation is becoming more complex due to the increase in generation of renewable energy. The recent series of Learning To Run a Power Network (L2RPN) competitions has encouraged the use of artificial agents to assist human dispatchers in operating power grids. However, the combinatorial nature of the action space poses a challenge to both conventional optimizers and learned controllers. Action space factorization, which breaks down decision-making into smaller sub-tasks, is one approach to tackle the curse of dimensionality. In this study, we propose a centrally coordinated multi-agent (CCMA) architecture for action space factorization. In this approach, regional agents propose actions and subsequently a coordinating agent selects the final action. We investigate several implementations of the CCMA architecture and benchmark them in different experimental settings against various L2RPN baseline approaches. The CCMA architecture exhibits higher sample efficiency and better final performance than the baseline approaches. The results suggest high potential of the CCMA approach for further application in higher-dimensional L2RPN as well as real-world power grid settings.
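The propose-then-select pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `RegionalAgent`, `Coordinator`, `ccma_step`, and `toy_scorer` are hypothetical, the observation is assumed to be a simple mapping from region name to a stress value, and the scoring function is a hand-written stand-in for whatever learned or rule-based policy an actual agent would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from the paper.
Observation = Dict[str, float]    # assumed: region name -> stress value
Action = Tuple[str, int]          # (region name, local action index)
Scorer = Callable[[Observation, Action], float]


@dataclass
class RegionalAgent:
    """Owns the actions of one region and proposes a single one of them."""
    region: str
    n_local_actions: int
    scorer: Scorer  # stand-in for a learned policy or value estimate

    def propose(self, obs: Observation) -> Action:
        candidates = [(self.region, i) for i in range(self.n_local_actions)]
        return max(candidates, key=lambda a: self.scorer(obs, a))


@dataclass
class Coordinator:
    """Chooses the final action among the regional proposals only."""
    scorer: Scorer

    def select(self, obs: Observation, proposals: List[Action]) -> Action:
        return max(proposals, key=lambda a: self.scorer(obs, a))


def ccma_step(obs: Observation, agents: List[RegionalAgent],
              coordinator: Coordinator) -> Action:
    # Step 1: every regional agent proposes one action from its own sub-space.
    proposals = [agent.propose(obs) for agent in agents]
    # Step 2: the coordinator picks one proposal as the final action.
    return coordinator.select(obs, proposals)


def toy_scorer(obs: Observation, action: Action) -> float:
    # Toy assumption: local action 1 is best in every region, and regions
    # with a higher stress value are more urgent.
    region, idx = action
    return obs[region] - abs(idx - 1)


agents = [RegionalAgent("north", 3, toy_scorer),
          RegionalAgent("south", 3, toy_scorer)]
coordinator = Coordinator(toy_scorer)
obs = {"north": 0.4, "south": 0.9}
chosen = ccma_step(obs, agents, coordinator)
print(chosen)  # ('south', 1)
```

In this sketch the coordinator compares one proposal per region (two candidates) rather than every action of every region (six candidates), which is the sense in which the action space is factorized. The way proposals are scored and selected here is purely illustrative.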
