Energy-Efficient Sleep Mode Optimization in 5G mmWave Networks via Multi-Agent Deep Reinforcement Learning
Saad Masrur, Ismail Guvenc, David Lopez Perez
TL;DR
The paper addresses energy efficiency in dense mmWave 5G networks by formulating sleep-mode optimization (SMO) as a multi-agent deep reinforcement learning (MARL) problem. It introduces MARL-DDQN, in which each base station (BS) operates as an independent Double Deep Q-Network (DDQN) agent within a centralized-training, decentralized-execution (CTDE) framework, using user equipment (UE)-clustered state representations and a load- and beamforming-aware power model in a 3D urban environment. In simulation, the approach achieves higher energy efficiency, throughput fairness, and QoS satisfaction than the baselines it is compared against (All On, IT-QoS-LB, MARL-DDPG, and MARL-PPO), and it scales across BS and UE densities, mobility patterns, and QoS targets. The results suggest that distributed MARL has practical potential for energy-aware network management in next-generation mmWave deployments. Future work targets joint SMO and beamforming optimization and real-world validation via O-RAN xApp frameworks.
Abstract
Dynamic sleep mode optimization (SMO) in millimeter-wave (mmWave) networks is essential for maximizing energy efficiency (EE) under stringent quality-of-service (QoS) constraints. However, existing optimization and reinforcement learning (RL) approaches rely on aggregated, static base station (BS) traffic models that fail to capture non-stationary traffic dynamics and suffer from large state-action spaces, limiting real-world deployment. To address these challenges, this paper proposes a multi-agent deep reinforcement learning (MARL) framework using a Double Deep Q-Network (DDQN), referred to as MARL-DDQN, for adaptive SMO in a 3D urban environment with a time-varying and community-based user equipment (UE) mobility model. Unlike conventional single-agent RL, MARL-DDQN enables scalable, distributed decision-making with minimal signaling overhead. A realistic BS power consumption model and beamforming are integrated to accurately quantify EE, while QoS is defined in terms of throughput. The method adapts SMO policies to maximize EE while mitigating inter-cell interference and ensuring throughput fairness. Simulations show that MARL-DDQN outperforms state-of-the-art strategies, including All On, iterative QoS-aware load-based (IT-QoS-LB), MARL-DDPG, and MARL-PPO, achieving up to 0.60 Mbit/Joule EE, 8.5 Mbps 10th-percentile throughput, and meeting QoS constraints 95% of the time under dynamic scenarios.
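As a minimal illustration of the Double DQN idea that MARL-DDQN builds on, the sketch below computes the bootstrap target for a single transition: the online network selects the next action and the target network evaluates it. This is the generic Double DQN update, not the paper's implementation. The function name, the two-action sleep/active encoding, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return the Double DQN bootstrap target for one transition."""
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    # Action selection uses the online network's Q-value estimates ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while the selected action is evaluated with the target network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical BS agent with two actions: index 0 = sleep, index 1 = active.
target = double_dqn_target(
    reward=1.0,
    next_q_online=[0.2, 0.9],   # online network prefers action 1
    next_q_target=[0.5, 0.4],   # target network scores action 1 at 0.4
    gamma=0.5,
)
print(target)
```

With these illustrative numbers the target is 1.0 + 0.5 × 0.4 = 1.2. A standard DQN target would instead take the maximum of the target network's estimates and give 1.0 + 0.5 × 0.5 = 1.25. Decoupling selection from evaluation in this way is what reduces overestimation bias in Double DQN.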
