
Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO Systems

Tianzhang Cai, Qichen Wang, Shuai Zhang, Özlem Tuğfe Demir, Cicek Cavdar

TL;DR

This work tackles energy efficiency in dense 5G networks by jointly controlling advanced sleep modes and antenna switching across multiple cells. It formulates the problem as a decentralized partially observable Markov decision process (DEC-POMDP) and solves it with a MAPPO-based multi-agent reinforcement learning framework, together with a more scalable MAPPO-neighbor variant. Training follows a centralized-training, decentralized-execution (CTDE) scheme with a shared reward, which yields cooperative policies that reduce total base-station power while preserving QoS. Relative to auto sleep mode 1 (Auto-SM1, symbol-level sleeping), the MAPPO-neighbor policy reduces power consumption by approximately $8.7\%$ at low traffic and improves energy efficiency by approximately $19\%$ at high traffic. The approach is shown to scale to larger networks and offers a practical path toward green, adaptive multi-cell massive MIMO deployments. The power model for cell $c$ is $P_c(K_c,M_c,s_c) = \delta_{s_c}\left(M_c P_{\mathrm{PA}}(p_a) + P_{\mathrm{BB}}(K_c,M_c) + P_o\right)$, and the data rate of user $k$ is $r_k = B \log_2\left(1+\text{SINR}_k\right)$.
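To make the power and rate expressions concrete, the Python sketch below evaluates them for a toy two-cell snapshot. It is an illustration under stated assumptions, not the authors' model: the forms chosen for $P_{\mathrm{PA}}$ (transmit power divided by an amplifier efficiency) and $P_{\mathrm{BB}}$ (linear in $K_c M_c$), the sleep-mode factors $\delta_{s_c}$, every numeric constant, the reading of $K_c$, $M_c$, $s_c$ as users, active antennas and sleep-mode index, and the energy-efficiency definition (total rate over total power) are all hypothetical placeholders.

```python
import math

# All constants below are hypothetical placeholders, not values from the paper.
P_O = 10.0                # fixed overhead power P_o in watts (assumed)
PA_EFFICIENCY = 0.3       # assumed amplifier efficiency: P_PA(p_a) = p_a / eta
BB_COST_PER_PAIR = 0.05   # assumed baseband watts per (user, antenna) pair
# Assumed scaling factors delta_s: s = 0 is fully active, larger s = deeper sleep.
SLEEP_FACTOR = {0: 1.0, 1: 0.7, 2: 0.3, 3: 0.1}


def pa_power(p_a: float) -> float:
    """Hypothetical P_PA: per-antenna transmit power divided by PA efficiency."""
    return p_a / PA_EFFICIENCY


def bb_power(k_c: int, m_c: int) -> float:
    """Hypothetical P_BB: linear in the number of (user, antenna) pairs."""
    return BB_COST_PER_PAIR * k_c * m_c


def cell_power(k_c: int, m_c: int, s_c: int, p_a: float) -> float:
    """P_c = delta_{s_c} * (M_c * P_PA(p_a) + P_BB(K_c, M_c) + P_o).

    k_c is read as the number of served users, m_c as the number of active
    antennas and s_c as the sleep-mode index (assumed interpretation).
    """
    return SLEEP_FACTOR[s_c] * (m_c * pa_power(p_a) + bb_power(k_c, m_c) + P_O)


def user_rate(bandwidth_hz: float, sinr_linear: float) -> float:
    """r_k = B * log2(1 + SINR_k) in bit/s; SINR is a linear ratio, not dB."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)


def energy_efficiency(rates: list[float], powers: list[float]) -> float:
    """Network energy efficiency as total rate over total BS power (bit/J)."""
    return sum(rates) / sum(powers)


if __name__ == "__main__":
    # Toy snapshot: cell 0 active with 64 antennas, cell 1 in a deep sleep mode.
    powers = [cell_power(k_c=8, m_c=64, s_c=0, p_a=0.5),
              cell_power(k_c=0, m_c=32, s_c=3, p_a=0.0)]
    # Two users at 15 dB and 5 dB SINR over a 20 MHz band.
    rates = [user_rate(20e6, 10 ** (sinr_db / 10)) for sinr_db in (15.0, 5.0)]
    print("cell powers (W):", powers)
    print("energy efficiency (bit/J):", energy_efficiency(rates, powers))
```

The multiplicative factor $\delta_{s_c}$ is what lets a sleep-mode decision scale down a whole cell's consumption, while $M_c$ enters both the amplifier and baseband terms, which is why antenna switching is a second lever alongside sleeping.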

Abstract

We develop a multi-agent reinforcement learning (MARL) algorithm that minimizes the total energy consumption of multiple massive MIMO (multiple-input multiple-output) base stations (BSs) in a multi-cell network by deciding the multi-level advanced sleep modes (ASMs) and antenna switching of these BSs, while preserving the overall quality of service (QoS). The problem is modeled as a decentralized partially observable Markov decision process (DEC-POMDP) to enable collaboration between individual BSs, which is necessary to tackle inter-cell interference. A multi-agent proximal policy optimization (MAPPO) algorithm is designed to learn a collaborative BS control policy. To enhance its scalability, a modified version called the MAPPO-neighbor policy is further proposed. Simulation results demonstrate that the trained MAPPO agent outperforms the baseline policies. Specifically, compared to the auto sleep mode 1 (symbol-level sleeping) algorithm, the MAPPO-neighbor policy reduces power consumption by approximately 8.7% during low-traffic hours and improves energy efficiency by approximately 19% during high-traffic hours.
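MAPPO builds on the standard PPO clipped surrogate objective: each BS agent's actor is updated using the probability ratio of its own actions, while the advantages come from a centralized critic and, with a shared team reward, are common to all agents. The plain-Python sketch below shows that generic update signal, using generalized advantage estimation (GAE) over a single truncated trajectory. It is not the authors' implementation; the reward values, discount `gamma`, GAE parameter `lam`, clip range and agent names are hypothetical, and the MAPPO-neighbor modification is not modeled.

```python
import math


def gae_advantages(rewards: list[float], values: list[float], last_value: float,
                   gamma: float = 0.99, lam: float = 0.95) -> list[float]:
    """Generalized advantage estimation over one (non-terminating) trajectory.

    rewards[t] is the shared team reward, identical for every BS agent;
    values[t] is the centralized critic's estimate V(s_t) of the global state;
    last_value bootstraps the value after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def clipped_surrogate(new_logp: list[float], old_logp: list[float],
                      advantages: list[float], clip_eps: float = 0.2) -> float:
    """PPO clipped objective for one agent's actor (to be maximized)."""
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|o) / pi_old(a|o)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical shared rewards (e.g. negative power minus a QoS penalty).
    shared_rewards = [-1.2, -0.9, -1.0]
    critic_values = [-3.0, -2.5, -2.0]
    adv = gae_advantages(shared_rewards, critic_values, last_value=-1.8)
    # Each BS agent reuses the same advantages with its own action log-probs.
    agent_logps = {"bs0": ([-0.5, -1.0, -0.7], [-0.6, -0.9, -0.7]),
                   "bs1": ([-1.1, -0.3, -0.4], [-1.0, -0.4, -0.5])}
    for name, (new_lp, old_lp) in agent_logps.items():
        print(name, clipped_surrogate(new_lp, old_lp, adv))
```

Because the critic sees the global state only during training, each agent's actor still acts on its local observation at execution time, which is the sense in which training is centralized and execution decentralized.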

Paper Structure

This paper contains 13 sections, 13 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Network structure of MAPPO agent.
  • Figure 2: Training drop ratio.
  • Figure 3: Training power consumption.
  • Figure 4: Comparison of sum data rate.
  • Figure 5: Comparison of power consumption.