Multi-agent Deep Reinforcement Learning for Distributed Load Restoration
Linh Vu, Tuyen Vu, Thanh-Long Vu, Anurag Srivastava
TL;DR
This paper addresses load restoration after outages in distribution systems modeled as networked microgrids, proposing a multi-agent deep reinforcement learning framework with invalid action masking. The framework uses a centralized-training, decentralized-execution architecture in which multiple DQN agents each control an individual microgrid and cooperate through a shared reward. The agents maximize the restored load, written as $\sum_i \sum_j n_i P_{ij} w_{ij}$, subject to voltage limits $V^{min}\le V_u\le V^{max}$, generator limits, and line-flow constraints. The key contributions are the first MARL approach to load restoration, an invalid action masking mechanism that ensures safety and keeps large action spaces manageable, and extensive validation on the IEEE 13-, 123-, and 8500-node feeders, showing faster learning and high restoration percentages (up to $98.6\%$, $96.04\%$, and $86.96\%$ of available generation, respectively). The results demonstrate improved learning stability and performance over single-agent baselines. Two limitations are identified: topology changes require retraining, and training time grows with the number of agents. Future work targets generalization and parallel training.
Abstract
This paper addresses the load restoration problem after power outage events. We propose a multi-agent deep reinforcement learning method that optimizes the load restoration process in distribution systems, modeled as networked microgrids, by determining the optimal operating sequence of circuit breakers (switches). An invalid action masking technique is incorporated into the multi-agent method to handle both the physical constraints of the restoration process and the curse of dimensionality, since the action space of operational decisions grows exponentially with the number of circuit breakers. The proposed method features centralized training of multiple agents to overcome the non-stationarity of the environment, decentralized execution to ease deployment, and zero constraint violations to prevent harmful actions. Simulations are performed in OpenDSS and Python on the IEEE 13-, 123-, and 8500-node distribution test feeders to demonstrate the effectiveness of the approach. The results show that the proposed algorithm achieves a significantly better learning curve and greater stability than conventional methods.
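To illustrate the general idea behind invalid action masking for a single DQN agent, the following is a minimal sketch and not the authors' implementation. It assumes that each agent receives a Q-value per discrete switching action together with a Boolean validity mask produced by some external feasibility check (for example, a power-flow test on voltage and line-flow limits). The function names `masked_greedy_action` and `masked_epsilon_greedy` and the example values are hypothetical.

```python
import numpy as np


def masked_greedy_action(q_values, valid_mask):
    """Pick the highest-valued action among those marked valid.

    q_values   : one Q-value per discrete action (hypothetical agent output)
    valid_mask : True where the action is assumed to satisfy the constraints
    """
    q = np.asarray(q_values, dtype=float)
    mask = np.asarray(valid_mask, dtype=bool)
    if not mask.any():
        raise ValueError("no valid action available")
    # Invalid actions are assigned -inf so argmax can never select them.
    masked_q = np.where(mask, q, -np.inf)
    return int(np.argmax(masked_q))


def masked_epsilon_greedy(q_values, valid_mask, epsilon, rng):
    """Epsilon-greedy selection restricted to valid actions only."""
    mask = np.asarray(valid_mask, dtype=bool)
    if rng.random() < epsilon:
        # Exploration also samples only from the valid set.
        return int(rng.choice(np.flatnonzero(mask)))
    return masked_greedy_action(q_values, mask)


# Example: action 0 has the largest Q-value but is masked out as invalid.
rng = np.random.default_rng(0)
q = [5.0, 1.0, 3.0]
mask = [False, True, True]
greedy = masked_greedy_action(q, mask)
explored = masked_epsilon_greedy(q, mask, epsilon=1.0, rng=rng)
```

Because both the greedy and the exploratory branches draw only from the valid set, an agent following this rule never executes a masked action, which is the sense in which masking can yield zero constraint violations while also shrinking the set of actions that has to be explored.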
