Decentralized Handover Parameter Optimization with MARL for Load Balancing in 5G Networks

Yang Shen, Shuqi Chai, Bing Li, Xiaodong Luo, Qingjiang Shi, Rongqing Zhang

TL;DR

This work addresses load balancing in dense 5G networks by classifying handovers into three types and solving a joint, NP-hard optimization via a decentralized MARL approach (MADEHO). It replaces global reward signals with a dynamic average consensus approximation to enable local communication among cells, and uses PPO with attention-enhanced networks to learn decentralized policies. The method demonstrates substantial gains in load balancing, throughput, and handover efficiency, with analytical bounds on the consensus approximation error. The results indicate practical benefits for scalable 5G deployments, where distributed coordination and engineering-aware handover modeling are essential.

Abstract

In cellular networks, cell handover refers to the process where a device switches from one base station to another, and this mechanism is crucial for balancing the load among different cells. Traditionally, engineers manually adjusted handover parameters based on experience. However, the explosive growth in the number of cells has rendered manual tuning impractical. Existing research tends to overlook critical engineering details in order to simplify handover problems. In this paper, we classify cell handover into three types and jointly model their mutual influence. To achieve load balancing, we propose a multi-agent reinforcement learning (MARL)-based scheme to automatically optimize the parameters. To reduce agent interaction costs, distributed training is implemented based on a consensus approximation of the global average load, and the approximation error is shown to be bounded. Experimental results show that our proposed scheme outperforms existing benchmarks in balancing load and improving network performance.

Paper Structure

This paper contains 17 sections, 1 theorem, 36 equations, 11 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Under the paper's stated assumption, the dynamic average consensus algorithm ensures that the error between the estimate $\rho_{m,t}$ and the average load $\bar{L}_t$ is bounded by a constant that depends on $\lambda$, the spectral radius of the consensus matrix $\omega$ associated with the graph $G$. In addition, as $t \to \infty$, the error bound becomes tighter.
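To make the role of the consensus estimate concrete, the following is a minimal sketch of a textbook dynamic average consensus update, in which each cell mixes its neighbors' estimates and then adds the change in its own load. It is an illustration under assumptions, not the paper's exact algorithm: the update form, the Metropolis weighting, and the names `metropolis_weights` and `consensus_step` are all hypothetical choices, and `w` merely stands in for the consensus matrix $\omega$.

```python
from typing import List


def metropolis_weights(adj: List[List[int]]) -> List[List[float]]:
    """Build a symmetric, doubly stochastic weight matrix from an
    undirected 0/1 adjacency matrix (assumed weighting, not from the paper)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                w[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        # Remaining mass goes on the diagonal so each row sums to 1.
        w[i][i] = 1.0 - sum(w[i])
    return w


def consensus_step(
    w: List[List[float]],
    rho: List[float],
    load_prev: List[float],
    load_next: List[float],
) -> List[float]:
    """One round of dynamic average consensus.

    Cell m averages the estimates of itself and its neighbors, then
    corrects by the change in its own local load. Only local quantities
    are needed, since w[m][j] is zero for non-neighbors.
    """
    n = len(rho)
    return [
        sum(w[m][j] * rho[j] for j in range(n)) + (load_next[m] - load_prev[m])
        for m in range(n)
    ]


# Hypothetical example: four cells connected in a ring.
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
weights = metropolis_weights(ring)
loads_t0 = [0.9, 0.2, 0.5, 0.4]
loads_t1 = [0.7, 0.3, 0.6, 0.4]
estimates = consensus_step(weights, list(loads_t0), loads_t0, loads_t1)
```

With estimates initialized to the local loads and a doubly stochastic weight matrix, the mean of the estimates equals the true average load at every step, so each cell's tracking error comes only from disagreement between cells. That disagreement shrinks at a rate governed by the spectrum of the weight matrix, which is consistent with the theorem expressing its bound in terms of $\lambda$.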

Figures (11)

  • Figure 1: An illustrated 5G cell handover scenario
  • Figure 2: Cells observe UE distribution and connection status to adapt handover parameters. They achieve decentralized training via local communication, using PPO to update policy and value networks for handling large action spaces.
  • Figure 3: The complete process of agent-environment interaction and policy optimization in the form of a flowchart.
  • Figure 4: The representative case shows the simulation setup and part of the optimized handover parameters with our proposed scheme.
  • Figure 5: Cell load std vs. average UE speed and UE distribution std.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • proof