Energy Efficient Sleep Mode Optimization in 5G mmWave Networks via Multi Agent Deep Reinforcement Learning

Saad Masrur, Ismail Guvenc, David Lopez Perez

TL;DR

The paper addresses energy efficiency in dense mmWave 5G networks by formulating sleep-mode optimization (SMO) as a multi-agent deep reinforcement learning (MARL) problem. It introduces MARL-DDQN, in which each base station (BS) operates as an independent Double Deep Q-Network (DDQN) agent within a centralized-training, decentralized-execution (CTDE) framework, using UE-clustered state representations and a load- and beamforming-aware power model in a 3D urban environment. The approach achieves higher energy efficiency, throughput fairness, and QoS satisfaction than state-of-the-art baselines, and it scales across BS and UE densities, mobility patterns, and QoS targets. The results highlight the practical potential of distributed MARL for energy-aware network management in next-generation mmWave deployments. Future work targets joint SMO and beamforming optimization and real-world validation via O-RAN xApp frameworks.

Abstract

Dynamic sleep mode optimization (SMO) in millimeter-wave (mmWave) networks is essential for maximizing energy efficiency (EE) under stringent quality-of-service (QoS) constraints. However, existing optimization and reinforcement learning (RL) approaches rely on aggregated, static base station (BS) traffic models that fail to capture non-stationary traffic dynamics and suffer from large state-action spaces, limiting real-world deployment. To address these challenges, this paper proposes a multi-agent deep reinforcement learning (MARL) framework using a Double Deep Q-Network (DDQN), referred to as MARL-DDQN, for adaptive SMO in a 3D urban environment with a time-varying and community-based user equipment (UE) mobility model. Unlike conventional single-agent RL, MARL-DDQN enables scalable, distributed decision-making with minimal signaling overhead. A realistic BS power consumption model and beamforming are integrated to accurately quantify EE, while QoS is defined in terms of throughput. The method adapts SMO policies to maximize EE while mitigating inter-cell interference and ensuring throughput fairness. Simulations show that MARL-DDQN outperforms state-of-the-art strategies, including All On, iterative QoS-aware load-based (IT-QoS-LB), MARL-DDPG, and MARL-PPO, achieving up to 0.60 Mbit/Joule EE, 8.5 Mbps 10th-percentile throughput, and meeting QoS constraints 95% of the time under dynamic scenarios.
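To make the DDQN component concrete, the following is a minimal sketch of the standard Double DQN bootstrap target, in which the online network selects the next action and the target network evaluates it. It is not the authors' implementation: the function name `ddqn_target`, the two-action set (sleep or active), the Q-values, and the discount factor are all assumptions chosen for illustration.

```python
from typing import Sequence


def ddqn_target(reward: float,
                q_online_next: Sequence[float],
                q_target_next: Sequence[float],
                gamma: float = 0.99,
                done: bool = False) -> float:
    """Double DQN bootstrap target for a single transition.

    q_online_next / q_target_next hold the Q-values of every action in the
    next state, as estimated by the online and target networks respectively.
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # The online network selects the greedy next action ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target network evaluates that action.
    return reward + gamma * q_target_next[best_action]


# Hypothetical per-BS action set: index 0 = sleep, index 1 = active.
y = ddqn_target(reward=1.0,
                q_online_next=[0.2, 0.7],
                q_target_next=[0.5, 0.4],
                gamma=0.9)
```

In this toy transition the online network prefers action 1, so the target bootstraps from the target network's value for that action (0.4) rather than from the target network's own maximum (0.5). This decoupling of action selection from action evaluation is what reduces the overestimation bias of a plain DQN target.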

Paper Structure

This paper contains 38 sections, 32 equations, 9 figures, 1 table, 2 algorithms.

Figures (9)

  • Figure 1: Urban Macro (UMa) outdoor-to-outdoor communication scenario, where the reduced BS positions are represented by red triangles.
  • Figure 2: Illustration of the NMP and CMP areas. The SA is outlined in red, while buildings are represented in blue. A total of $C = 4$ communities are depicted. The CMP area is highlighted in yellow, whereas the NMP area encompasses both the yellow and magenta regions.
  • Figure 3: Two-state Markov model governing UE transitions between the roaming and local epochs.
  • Figure 4: Illustration of the proposed MARL framework for SMO. Each BS is controlled by an independent agent that interacts with the environment and learns a policy for BS SM.
  • Figure 5: Comparison of MARL-DDQN, IT-QoS-LB, and All On strategies over training episodes for $N=9$ and $U=70$ in terms of NA EE, $\Delta_{10}$, total throughput, and the QoS satisfaction percentage.
  • ...and 4 more figures