Independent policy gradient-based reinforcement learning for economic and reliable energy management of multi-microgrid systems
Junkai Hu, Li Xia
TL;DR
This work tackles long-term economic and reliable energy management in multi-microgrid systems (MMSs) with distributed energy management systems (EMSs) by formulating the problem as a mean-variance team stochastic game. It introduces a fully distributed independent policy gradient algorithm (MV-IPGA) with global convergence guarantees, and extends it to a data-driven deep reinforcement learning (DRL) method (MV-IPPO) for unknown dynamics and larger-scale MMSs. Both methods use a mean-variance objective to balance revenue and reliability, and scenario-based experiments show that the microgrids coordinate effectively without centralized control. The approach enables scalable, distributed optimization of exchange power under uncertainty, supporting robust MMS operation despite renewable variability.
Abstract
Efficiency and reliability are both crucial for energy management, especially in multi-microgrid systems (MMSs) that integrate intermittent and distributed renewable energy sources. This study investigates an economic and reliable energy management problem in MMSs under a distributed scheme, where each microgrid independently updates its own energy management policy while all microgrids collaboratively optimize the long-term system performance. We use the mean and the variance of the exchange power between the MMS and the main grid as indicators of the system's economic performance and reliability, respectively. Accordingly, we formulate the energy management problem as a mean-variance team stochastic game (MV-TSG), for which conventional methods based on maximizing expected cumulative rewards are unsuitable because of the variance metric. To solve MV-TSGs, we propose a fully distributed independent policy gradient algorithm, with rigorous convergence analysis, for scenarios with known model parameters. For large-scale scenarios with unknown model parameters, we further develop a deep reinforcement learning algorithm based on independent policy gradients, enabling data-driven policy optimization. Numerical experiments in two scenarios validate the effectiveness of the proposed methods. Our approaches fully leverage the distributed computational capabilities of MMSs and achieve a well-balanced trade-off between economic performance and operational reliability.
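
As a rough illustration of the mean-variance trade-off described in the abstract, the following minimal sketch scores a sequence of exchange-power samples by its mean minus a weighted variance. It is not the paper's formulation: the function name `mean_variance_objective`, the risk weight `beta`, the sign convention (a larger mean exchange power is treated as better), and the sample values are all assumptions made for illustration.

```python
from statistics import fmean, pvariance


def mean_variance_objective(exchange_power, beta=0.5):
    """Return (mean, variance, mean - beta * variance) for power samples.

    Assumptions (not taken from the paper): a larger mean exchange power
    is better, and beta >= 0 is a hypothetical weight on variance.
    """
    mu = fmean(exchange_power)
    var = pvariance(exchange_power, mu)  # population variance around mu
    return mu, var, mu - beta * var


# Two hypothetical exchange-power profiles with the same mean.
steady = [2.0, 2.0, 2.0, 2.0]
volatile = [0.0, 4.0, 0.0, 4.0]

for name, profile in (("steady", steady), ("volatile", volatile)):
    mu, var, score = mean_variance_objective(profile)
    print(f"{name}: mean={mu}, variance={var}, objective={score}")
```

Both profiles have the same mean, so a criterion based only on the expected value cannot tell them apart. The variance term penalizes the volatile profile, which is the kind of reliability preference the mean-variance objective is meant to capture.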
