Dual-Graph Multi-Agent Reinforcement Learning for Handover Optimization

Matteo Salvatori, Filippo Vannella, Sebastian Macaluso, Stylianos E. Trevlakis, Carlos Segura Perales, José Suarez-Varela, Alexandros-Apostolos A. Boulogeorgos, Ioannis Arapakis

Abstract

HandOver (HO) control in cellular networks is governed by a set of HO control parameters that are traditionally configured through rule-based heuristics. A key parameter for HO optimization is the Cell Individual Offset (CIO), defined for each pair of neighboring cells and used to bias HO triggering decisions. At network scale, tuning CIOs becomes a tightly coupled problem: small changes can redirect mobility flows across multiple neighbors, and static rules often degrade under non-stationary traffic and mobility. We exploit the pairwise structure of CIOs by formulating HO optimization as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) on the network's dual graph. In this representation, each agent controls a neighbor-pair CIO and observes Key Performance Indicators (KPIs) aggregated over its local dual-graph neighborhood, enabling scalable decentralized decisions while preserving graph locality. Building on this formulation, we propose TD3-D-MA, a discrete Multi-Agent Reinforcement Learning (MARL) variant of the TD3 algorithm with a shared-parameter Graph Neural Network (GNN) actor operating on the dual graph and region-wise double critics for training, improving credit assignment in dense deployments. We evaluate TD3-D-MA in an ns-3 system-level simulator configured with real-world network operator parameters across heterogeneous traffic regimes and network topologies. Results show that TD3-D-MA improves network throughput over standard HO heuristics and centralized RL baselines, and generalizes robustly under topology and traffic shifts.
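The dual-graph representation described in the abstract places one agent on each inter-cell edge $e=\{i,j\}$, with two agents adjacent whenever their cell pairs share a cell. This is exactly the line-graph construction, which can be sketched with networkx (the small 4-cell topology below is illustrative only, not from the paper):

```python
import networkx as nx

# Primal graph: cells as nodes, neighbor relations (tunable CIOs) as edges.
primal = nx.Graph()
primal.add_edges_from([(0, 1), (1, 2), (0, 2), (2, 3)])

# Dual (line) graph: each inter-cell edge {i, j} becomes a node; two dual
# nodes are adjacent iff the corresponding neighbor pairs share a cell.
dual = nx.line_graph(primal)

# One CIO agent per dual-graph node, i.e. per neighbor pair; a GNN actor
# would message-pass over `dual` to produce each agent's edge action.
n_agents = dual.number_of_nodes()
```

Under this construction the number of agents equals the number of neighbor relations in the primal network, and an agent's local dual-graph neighborhood (over which KPIs are aggregated) corresponds to all neighbor pairs that share a cell with it.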

Paper Structure

This paper contains 21 sections, 13 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: Dual-graph MARL framework for CIO-based HO. CIO agents are placed on dual-graph nodes corresponding to inter-cell edges $e=\{i,j\}$ and tune the HO bias $\mathrm{CIO}_{ij}$ for each neighbor pair. A distributed GNN actor performs local message passing to produce edge actions, while CTDE training uses region-wise critics defined on overlapping primal subnetworks (shaded colored areas).
  • Figure 2: Example of network graph representations.
  • Figure 3: Simulator architecture.
  • Figure 4: Training (left) and testing (right) mobility scenarios on the 8-cell benchmark. Blue circles denote cells and blue links denote neighbor relations with tunable CIOs. Colored boxes indicate UE random-walk regions: red boxes span multi-cell center areas influenced by several neighboring cells, while green boxes concentrate mobility near a specific inter-cell border (handover hotspot).
  • Figure 5: Impact of the number of hops ($M$) on performance.
  • ...and 7 more figures