SliceFed: Federated Constrained Multi-Agent DRL for Dynamic Spectrum Slicing in 6G

Hossein Mohammadi, Seyed Bagher Hashemi Natanzi, Ramak Nassiri, Jamshid Hassanpour, Bo Tang, Vuk Marojevic

Abstract

Dynamic spectrum slicing is a critical enabler for 6G Radio Access Networks (RANs), allowing the coexistence of heterogeneous services. However, optimizing resource allocation in dense, interference-limited deployments remains challenging due to non-stationary channel dynamics, strict Quality-of-Service (QoS) requirements, and the need for data privacy. In this paper, we propose SliceFed, a novel Federated Constrained Multi-Agent Deep Reinforcement Learning (F-MADRL) framework. SliceFed formulates the slicing problem as a Constrained Markov Decision Process (CMDP) where autonomous gNB agents maximize spectral efficiency while explicitly satisfying inter-cell interference budgets and hard ultra-reliable low-latency communication (URLLC) latency deadlines. We employ a Lagrangian primal-dual approach integrated with Proximal Policy Optimization (PPO) to enforce constraints, while Federated Averaging enables collaborative learning without exchanging raw local data. Extensive simulations in a dense multi-cell environment demonstrate that SliceFed converges to a stable, safety-aware policy. Unlike heuristic and unconstrained baselines, SliceFed achieves nearly 100% satisfaction of 1 ms URLLC latency deadlines and exhibits superior robustness to traffic load variations, verifying its potential for reliable and scalable 6G spectrum management.
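The two mechanisms named in the abstract, a Lagrangian primal-dual update and Federated Averaging, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the step size, the sign convention (a constraint value $g > 0$ is treated as a violation), and the sample-count weighting in the averaging step are all assumptions made for illustration, and the PPO policy update itself is omitted.

```python
from typing import List


def lagrangian_reward(reward: float, lambdas: List[float], g: List[float]) -> float:
    """Penalized reward r - sum_k lambda_k * g_k that a policy update could maximize.

    Assumption: g_k > 0 means constraint k (e.g. interference g_1 or
    URLLC latency g_2) is violated.
    """
    return reward - sum(lam * gk for lam, gk in zip(lambdas, g))


def dual_ascent_step(lambdas: List[float], g: List[float], step_size: float) -> List[float]:
    """Projected dual ascent: raise a multiplier while its constraint is
    violated, lower it otherwise, and clip at zero."""
    return [max(0.0, lam + step_size * gk) for lam, gk in zip(lambdas, g)]


def fed_avg(local_weights: List[List[float]], num_samples: List[int]) -> List[float]:
    """Average flattened model parameters across agents, weighted by each
    agent's local sample count (hypothetical weighting). Only parameters
    are combined; no raw local data is exchanged."""
    total = sum(num_samples)
    dim = len(local_weights[0])
    return [
        sum(w[i] * n for w, n in zip(local_weights, num_samples)) / total
        for i in range(dim)
    ]


# Hypothetical usage with two constraints and two agents.
new_lambdas = dual_ascent_step([0.0, 0.5], [0.2, -1.0], step_size=1.0)
penalized = lagrangian_reward(1.0, [0.5, 0.0], [0.2, -1.0])
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this sketch, each agent would alternate between a policy step on the penalized reward and a dual step on its multipliers, with the parameter averaging applied once per communication round.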

Paper Structure (23 sections, 27 equations, 4 figures, 3 algorithms)

Figures (4)

  • Figure 1: Training dynamics of SliceFed over federated communication rounds: (a) normalized average reward compared to baselines and (b) evolution of constraint violations for interference ($g_1$) and URLLC latency ($g_2$).
  • Figure 2: Empirical CDF of URLLC packet delays.
  • Figure 3: Temporal evolution of queues (top) and resource allocation (bottom) for SliceFed (left) and QueueProp (right).
  • Figure 4: Robustness analysis under increasing URLLC traffic load $\lambda$. Left: average normalized reward (mean $\pm$ std). Right: mean URLLC latency constraint value $g_2$ (mean $\pm$ std). SliceFed maintains $g_2 \approx 0$ across all traffic loads, indicating strict satisfaction of the URLLC latency constraint. The apparent negative values visible for some baselines arise from statistical averaging and standard deviation visualization around zero on a symmetric log scale and do not correspond to negative constraint violations.
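The remark in the Figure 4 caption about apparent negative values can be reproduced with a small numerical sketch. The sample values below are hypothetical and are not taken from the paper; they only show that a mean $\pm$ std band can extend below zero even when every individual constraint value is non-negative.

```python
import statistics

# Hypothetical per-run constraint values g_2 for one baseline: all non-negative.
g2_samples = [0.0, 0.0, 0.0, 0.4]

mean_g2 = statistics.mean(g2_samples)
std_g2 = statistics.pstdev(g2_samples)  # population standard deviation

# Lower edge of the mean +/- std band, as it would be drawn in a plot.
lower_edge = mean_g2 - std_g2
upper_edge = mean_g2 + std_g2

print(f"mean={mean_g2:.3f}, std={std_g2:.3f}, band=[{lower_edge:.3f}, {upper_edge:.3f}]")
```

Here the lower edge of the band is negative although no sample is, which is the kind of plotting artifact the caption describes.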