Hierarchical Multi Agent DRL for Soft Handovers Between Edge Clouds in Open RAN
F. Giarrè, I. A. Meer, M. Masoudi, M. Ozger, C. Cavdar
TL;DR
The paper tackles the challenge of maintaining service continuity for transitional users during soft handovers between edge clouds in an Open RAN (O-RAN) setting with multi-connectivity. It introduces a hierarchical multi-agent reinforcement learning (HMARL) framework in which a high-level agent enforces a common functional split for transitional users across edge clouds, while low-level agents optimize non-transitional users locally; turn-based communication between agents mimics distributed operation. The problem is formulated as a mixed-integer linear program (MILP) that maximizes a weighted continuity metric $\omega_{nt}R(\mathcal{U}\setminus\mathcal{T})+\omega_t R(\mathcal{T})$ subject to the giga-operations-per-second (GOPS) and midhaul constraints $G^{e,t}_{tot}\le G_{th}$ and $M^{t}_{tot}\le M_{th}$, and is solved approximately with PPO-based policies. Empirical results show that HMARL reduces constraint violations (GOPS ~ $13\%$, midhaul ~ $1\%$), increases non-transitional service continuity by about $5$–$10\%$ compared to static splits, and closely approaches the optimal policy. It also generalizes well to unseen resource scenarios, indicating practical potential for scalable O-RAN deployments.
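To make the objective and constraints above concrete, here is a minimal Python sketch that evaluates the weighted continuity metric and checks the two resource limits for a single time step. It assumes that $R(\cdot)$ is the fraction of users in a group that keep service continuity, that the GOPS limit applies to each edge cloud separately, and that the midhaul limit applies to one aggregate load; these readings, along with the `User` class and all function names, are illustrative assumptions and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical user record; the field names are illustrative only.
    transitional: bool  # True if the user is moving between edge clouds
    served: bool        # True if the user keeps service continuity this step


def continuity_ratio(users):
    """Fraction of users in the group that keep service continuity.

    Returns 0.0 for an empty group (an assumption made for this sketch).
    """
    users = list(users)
    if not users:
        return 0.0
    return sum(u.served for u in users) / len(users)


def weighted_objective(users, w_nt, w_t):
    """Compute w_nt * R(U \\ T) + w_t * R(T) for one time step."""
    non_transitional = [u for u in users if not u.transitional]
    transitional = [u for u in users if u.transitional]
    return (w_nt * continuity_ratio(non_transitional)
            + w_t * continuity_ratio(transitional))


def is_feasible(gops_per_ec, midhaul_total, g_th, m_th):
    """Check G_tot <= G_th for every edge cloud and M_tot <= M_th."""
    return all(g <= g_th for g in gops_per_ec) and midhaul_total <= m_th
```

For instance, with two non-transitional users (one served) and one served transitional user, equal weights of 0.5 give an objective of 0.75. A reinforcement learning reward could combine such an objective with a penalty whenever `is_feasible` returns `False`, although the exact reward shaping used by the authors is not stated here.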
Abstract
Multi-connectivity (MC) for aerial users via a set of ground access points offers the potential for highly reliable communication. Within an open radio access network (O-RAN) architecture, edge clouds (ECs) enable MC with low latency for users within their coverage area. However, ensuring seamless service continuity for transitional users, those moving between the coverage areas of neighboring ECs, poses challenges due to centralized processing demands. To address this, we formulate a problem facilitating soft handovers between ECs, ensuring seamless transitions while maintaining service continuity for all users. We propose a hierarchical multi-agent reinforcement learning (HMARL) algorithm to dynamically determine the optimal functional split configuration for transitional and non-transitional users. Simulation results show that the proposed approach outperforms the conventional functional split in terms of the percentage of users maintaining service continuity, with at most 4% optimality gap. Additionally, HMARL achieves better scalability compared to the static baselines.
