Independent policy gradient-based reinforcement learning for economic and reliable energy management of multi-microgrid systems

Junkai Hu, Li Xia

TL;DR

This work tackles long-term economic and reliable energy management in multi-microgrid systems (MMSs) with distributed energy management systems (EMS) by formulating the problem as a mean-variance team stochastic game. It introduces a fully distributed independent policy gradient algorithm (MV-IPGA) with global convergence guarantees, and extends it to a data-driven deep reinforcement learning method (MV-IPPO) for unknown dynamics and larger-scale MMSs. The methods use mean-variance objectives to balance revenue and reliability, and scenario-based experiments show effective coordination among microgrids without centralized control. The approach enables scalable, distributed optimization of exchange power under uncertainty, supporting robust MMS operation amid renewable variability.

Abstract

Efficiency and reliability are both crucial for energy management, especially in multi-microgrid systems (MMSs) integrating intermittent and distributed renewable energy sources. This study investigates an economic and reliable energy management problem in MMSs under a distributed scheme, where each microgrid independently updates its energy management policy in a decentralized manner to optimize the long-term system performance collaboratively. We introduce the mean and variance of the exchange power between the MMS and the main grid as indicators for the economic performance and reliability of the system. Accordingly, we formulate the energy management problem as a mean-variance team stochastic game (MV-TSG), where conventional methods based on the maximization of expected cumulative rewards are unsuitable for variance metrics. To solve MV-TSGs, we propose a fully distributed independent policy gradient algorithm, with rigorous convergence analysis, for scenarios with known model parameters. For large-scale scenarios with unknown model parameters, we further develop a deep reinforcement learning algorithm based on independent policy gradients, enabling data-driven policy optimization. Numerical experiments in two scenarios validate the effectiveness of the proposed methods. Our approaches fully leverage the distributed computational capabilities of MMSs and achieve a well-balanced trade-off between economic performance and operational reliability.
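To make the mean-variance idea concrete, the following minimal Python sketch scores a trajectory of exchange power by its mean minus a weighted variance. The function name `mean_variance_score`, the sign convention (mean minus `beta` times variance), and the toy trajectories are illustrative assumptions, not the paper's exact objective; the paper's own definition should be consulted for the precise form.

```python
from statistics import fmean, pvariance


def mean_variance_score(exchange_power, beta):
    """Score a trajectory as mean - beta * variance.

    Assumption: a larger mean is better and variance is penalized with
    weight beta. This sign convention is illustrative only.
    """
    mean = fmean(exchange_power)
    variance = pvariance(exchange_power, mu=mean)  # population variance
    return mean - beta * variance


# Two hypothetical exchange-power trajectories (arbitrary units).
steady = [2.0, 2.0, 2.0, 2.0]    # lower mean, no fluctuation
volatile = [0.0, 5.0, 0.0, 5.0]  # higher mean, large fluctuation

steady_score = mean_variance_score(steady, beta=1.0)
volatile_score = mean_variance_score(volatile, beta=1.0)
risk_neutral_volatile = mean_variance_score(volatile, beta=0.0)
risk_neutral_steady = mean_variance_score(steady, beta=0.0)
```

With `beta=0.0` the score reduces to the mean, so the volatile trajectory ranks higher; with `beta=1.0` the variance penalty reverses the ranking. The variance term depends on the whole trajectory through its mean, so it cannot be written as a sum of fixed per-step rewards, which is one way to see why a plain expected-cumulative-reward formulation does not cover it.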

Paper Structure

This paper contains 22 sections, 8 theorems, 51 equations, 7 figures, 7 tables, 2 algorithms.

Key Result

Lemma 1

For any two joint policies $\bm\mu, \bm\mu' \in \mathcal{U}$, the mean-variance performance difference is

Figures (7)

  • Figure 1: The illustration of EMS topologies in MMSs (MG denotes the microgrid).
  • Figure 2: Grid-connected MMS under the distributed EMS scheme.
  • Figure 3: Convergence procedure of MV-PG with different coefficients.
  • Figure 4: An episode of exchange power over 72 time steps under the MV-IPGA policy.
  • Figure 5: Trajectory details of states and actions when $\beta=1.0$.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Lemma 1: Performance difference in MV-TSGs
  • Lemma 2: Mean-variance policy gradient
  • Lemma 3: Performance partial derivative
  • Lemma 4
  • Lemma 5
  • Theorem 1
  • Lemma 6: Bubeck (2015), Lemma 3.6
  • Lemma 7: Agarwal et al. (2021), Proposition 37