Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO Systems

Tianzhang Cai; Qichen Wang; Shuai Zhang; Özlem Tuğfe Demir; Cicek Cavdar

TL;DR

This work tackles energy efficiency in dense 5G networks by jointly managing advanced sleep modes (ASMs) and antenna switching across multiple cells. It formulates the problem as a decentralized partially observable Markov decision process (DEC-POMDP) and solves it with a multi-agent reinforcement learning framework based on multi-agent proximal policy optimization (MAPPO), introducing a more scalable MAPPO-neighbor variant. A shared-reward, centralized-training, decentralized-execution (CTDE) scheme enables cooperative policies that reduce total base-station power while preserving QoS. Relative to the Auto-SM1 baseline, the MAPPO-neighbor policy reduces power consumption by approximately $8.7\%$ at low traffic and improves energy efficiency by approximately $19\%$ at high traffic. The approach demonstrates scalability to larger networks and offers a practical path to green, adaptive multi-cell massive MIMO deployments. The power model includes $P_c(K_c,M_c,s_c) = \delta_{s_c}\left(M_c P_{\mathrm{PA}}(p_a) + P_{\mathrm{BB}}(K_c,M_c) + P_o\right)$, and the user data rate follows $r_k = B \log_2\left(1+\text{SINR}_k\right)$.
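The two formulas quoted in the summary can be evaluated with a few lines of Python. The following is a minimal sketch, not the authors' implementation. It assumes that $K_c$ is the number of served users, $M_c$ the number of active antennas, and $\delta_{s_c}$ a power scaling factor for sleep mode $s_c$; none of these readings is stated in this section. The function `toy_baseband_power`, the helper `energy_efficiency` (sum rate divided by power), and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def bs_power(k_users, m_antennas, sleep_scale, p_pa, p_bb, p_other):
    """Evaluate P_c = delta_s * (M_c * P_PA + P_BB(K_c, M_c) + P_o) in watts.

    p_pa is the per-antenna power-amplifier power (already evaluated at p_a),
    p_bb is a callable for the baseband term, p_other is the fixed term P_o.
    """
    return sleep_scale * (m_antennas * p_pa + p_bb(k_users, m_antennas) + p_other)


def user_rate(bandwidth_hz, sinr_linear):
    """Evaluate r_k = B * log2(1 + SINR_k) in bit/s (SINR on a linear scale)."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)


def toy_baseband_power(k_users, m_antennas):
    # Hypothetical placeholder: the actual form of P_BB is not given here.
    return 0.01 * k_users * m_antennas


def energy_efficiency(rates_bps, power_w):
    # Assumed definition: sum data rate per watt, i.e. bit/J.
    return sum(rates_bps) / power_w


if __name__ == "__main__":
    # Illustrative values only; they do not come from the paper.
    active = bs_power(8, 64, 1.0, 1.0, toy_baseband_power, 20.0)
    asleep = bs_power(8, 64, 0.5, 1.0, toy_baseband_power, 20.0)
    rate = user_rate(20e6, 3.0)
    print(active, asleep, rate, energy_efficiency([rate] * 8, active))
```

Passing the baseband term as a callable keeps the sketch independent of any particular baseband model, so a more detailed $P_{\mathrm{BB}}$ can be swapped in without touching `bs_power`.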

Abstract

We develop a multi-agent reinforcement learning (MARL) algorithm to minimize the total energy consumption of multiple massive MIMO (multiple-input multiple-output) base stations (BSs) in a multi-cell network while preserving the overall quality-of-service (QoS) by making decisions on the multi-level advanced sleep modes (ASMs) and antenna switching of these BSs. The problem is modeled as a decentralized partially observable Markov decision process (DEC-POMDP) to enable collaboration between individual BSs, which is necessary to tackle inter-cell interference. A multi-agent proximal policy optimization (MAPPO) algorithm is designed to learn a collaborative BS control policy. To enhance its scalability, a modified version, called the MAPPO-neighbor policy, is further proposed. Simulation results demonstrate that the trained MAPPO agent outperforms the baseline policies. Specifically, compared to the auto sleep mode 1 (symbol-level sleeping) algorithm, the MAPPO-neighbor policy reduces power consumption by approximately 8.7% during low-traffic hours and improves energy efficiency by approximately 19% during high-traffic hours.
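For readers unfamiliar with proximal policy optimization, the clipped surrogate objective at the core of PPO (and therefore of MAPPO) can be sketched as follows. This is the standard PPO objective, not a reproduction of the paper's loss: the clipping value `clip_eps=0.2`, the function name `clipped_surrogate`, and the use of plain Python lists are assumptions made for illustration. In a shared-reward cooperative setting, each agent's policy would typically be updated with advantages estimated by a centralized critic.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # The pessimistic minimum removes the incentive for large policy steps.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical numbers: two samples with positive and negative advantage.
    print(clipped_surrogate([-0.9, -1.4], [-1.0, -1.2], [0.5, -0.3]))
```

The clipping limits how far a single update can move each BS's policy away from the one that collected the data, which helps keep training stable when several agents learn at once.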

Paper Structure (13 sections, 13 equations, 7 figures, 3 tables)

This paper contains 13 sections, 13 equations, 7 figures, 3 tables.

Introduction
System Model
Traffic Model
Advanced Sleep Modes
PC Modeling of Massive MIMO BS with ASM
Intra-Cell Power Allocation
MAPPO-based Multi-cell ASM and Antenna Switching Algorithm
Action Space and State Space
DEC-POMDP
Reward Design
Learning in DEC-POMDP
Results and Analysis
Conclusions

Figures (7)

Figure 1: Network structure of MAPPO agent.
Figure 2: Training drop ratio.
Figure 3: Training power consumption.
Figure 4: Comparison of sum data rate.
Figure 5: Comparison of power consumption.
...and 2 more figures
