Multi-Agent Q-Learning for Real-Time Load Balancing User Association and Handover in Mobile Networks

Alireza Alizadeh, Byungju Lim, Mai Vu

TL;DR

This work proposes two multi-agent action selection policies, one centralized and one distributed, for real-time load balancing user association and handover in dense cellular networks, and integrates them into an online Q-learning (QL) algorithm that adapts in real time to network dynamics, including channel variations and user mobility.

Abstract

As next-generation cellular networks become denser, associating users with the optimal base stations at all times while ensuring that no base station is overloaded becomes critical for achieving stable and high network performance. We propose multi-agent online Q-learning (QL) algorithms for performing real-time load balancing user association and handover in dense cellular networks. The load balancing constraints at all base stations (BSs) couple the actions of the user equipment (UE) agents, so we propose two multi-agent action selection policies, one centralized and one distributed, that satisfy load balancing at every learning step. In the centralized policy, the actions of the UEs are determined by a central load balancer (CLB) running an algorithm that swaps the worst connection to maximize the total learning reward. In the distributed policy, each UE takes an action based on its local information by participating in a distributed matching game with the BSs to maximize its local reward. We then integrate these action selection policies into an online QL algorithm that adapts in real time to network dynamics, including channel variations and user mobility, using a reward function that includes a handover cost to reduce handover frequency. The proposed multi-agent QL algorithm features low complexity and fast convergence, and it outperforms 3GPP max-SINR association. Both policies adapt well to network dynamics across UE speed profiles ranging from walking and running to biking and suburban driving, illustrating their robustness and real-time adaptability.
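The learning loop summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the reward form (rate minus a fixed penalty on a BS change), the tabular state/action encoding (state = current BS, action = next BS), the function names, and the parameters `alpha`, `gamma`, and `handover_cost` are all hypothetical. The load-balanced action selection is shown as a generic deferred-acceptance matching with per-BS quotas, which stands in for the distributed matching game and is not the centralized swapping algorithm run by the CLB.

```python
import numpy as np

def reward(rate, prev_bs, new_bs, handover_cost):
    """Assumed reward: achieved rate minus a penalty when the UE changes BS."""
    return rate - (handover_cost if new_bs != prev_bs else 0.0)

def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place to one UE's table."""
    td_target = r + gamma * np.max(q[next_state])
    q[state, action] += alpha * (td_target - q[state, action])

def quota_matching(q_rows, quotas):
    """Generic deferred acceptance with quotas (an illustrative stand-in).

    q_rows[k, j] is UE k's current value for BS j. UEs propose to BSs in
    decreasing value order; each BS keeps its highest-valued proposers up
    to quotas[j], so no BS is ever overloaded. Returns one BS index per UE,
    or -1 if the UE could not be placed.
    """
    num_ues, num_bs = q_rows.shape
    prefs = np.argsort(-q_rows, axis=1)      # per-UE ranking of BSs
    next_choice = [0] * num_ues              # next BS each UE will try
    accepted = {j: [] for j in range(num_bs)}
    free = list(range(num_ues))
    while free:
        k = free.pop()
        if next_choice[k] >= num_bs:
            continue                         # UE k has tried every BS
        j = int(prefs[k, next_choice[k]])
        next_choice[k] += 1
        accepted[j].append(k)
        if len(accepted[j]) > quotas[j]:
            # BS j is over quota: reject its lowest-valued proposer.
            worst = min(accepted[j], key=lambda u: q_rows[u, j])
            accepted[j].remove(worst)
            free.append(worst)
    assoc = np.full(num_ues, -1)
    for j, ues in accepted.items():
        for k in ues:
            assoc[k] = j
    return assoc
```

In one learning step under these assumptions, each UE would take its row of Q-values for the current state, `quota_matching` would turn those rows into a load-feasible association, and each UE would then call `reward` and `q_update` with the rate it observes. Ranking proposers at a BS by the UE's own Q-value is a simplification chosen for brevity; a BS-side utility could be substituted without changing the structure.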

Paper Structure

This paper contains 36 sections, 14 equations, 11 figures, 2 tables, and 3 algorithms.

Figures (11)

  • Figure 1: Illustration of a two-tier cellular HetNet. Each user is associated with one of $J_M$ MBSs and $J_s$ SBSs while requesting $N_k^\text{mmW}$ data streams from an MBS or $N_k^{\mu \text{W}}$ data streams from an SBS.
  • Figure 2: Structure of moving step $n$, during which UE $k$ travels from source waypoint $\mathbf{X}_{k,n-1}$ to target waypoint $\mathbf{X}_{k,n}$ with velocity $V_{k,n}$. The number of measurement blocks (MBs) in each moving step is obtained according to (\ref{N_MB}) and depends on the UE velocity $V_{k,n}$, the distance between source and target waypoints $L_{k,n}$, and the time duration of each MB $t^\text{MB}$.
  • Figure 3: Procedure for Q-value update at the CLB in the proposed centralized action selection policy.
  • Figure 4: Procedure for U-value update in the proposed distributed action selection policy without CLB.
  • Figure 5: Timing diagram for one measurement block (MB) $b$, showing the behavioral (learning) update and the target (association) update. Each learning step $t$ within MB $b$ runs one inner iteration of the load balancing policy and the updating and reporting process, as described in Alg. \ref{Main_Alg}. After $T$ learning steps, each UE performs handover if $\beta^{(b)}_k\neq \beta^{(b-1)}_k$, followed by data communication between each UE and its associated BS.
  • ...and 6 more figures