Generative Resource Allocation for 6G O-RAN with Diffusion Policies
Salar Nouri, Mojdeh Karbalaeimotaleb, Vahid Shah-Mansouri, Tarik Taleb
TL;DR
This work tackles NP-hard dynamic resource allocation in 6G O-RAN across multi-service slices (eMBB, URLLC, mMTC) by introducing Diffusion-QL, a reinforcement learning framework whose policy is a conditional diffusion model. The agent generates near-optimal resource allocations by reversing a noising process, with each step guided by the gradient of a learned Q-function so as to maximize long-term return. In system-level simulations, Diffusion-QL outperforms state-of-the-art DRL baselines (DQN, SS-VAE) and approaches the performance of the optimal ESA, while remaining robust to inter-cell interference and scaling to larger networks; the trade-off is a higher inference cost from the iterative diffusion steps. The approach offers a scalable, data-efficient alternative to traditional RL for latency-constrained near-real-time RIC environments, enabling more reliable QoS delivery across diverse 6G network slices.
Abstract
Dynamic resource allocation in O-RAN is critical for managing the conflicting QoS requirements of 6G network slices. Conventional reinforcement learning agents often fail in this domain because their unimodal policy structures cannot model the multi-modal nature of optimal allocation strategies. This paper introduces Diffusion Q-Learning (Diffusion-QL), a novel framework that represents the policy as a conditional diffusion model. Our approach generates resource allocation actions by iteratively reversing a noising process, with each step guided by the gradient of a learned Q-function. This enables the policy to learn, and sample from, the complex distribution of near-optimal actions. Simulations demonstrate that Diffusion-QL consistently outperforms state-of-the-art DRL baselines (DQN and SS-VAE), offering a robust solution for the intricate resource management challenges in next-generation wireless networks.
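To illustrate the sampling procedure described in the abstract, the following is a minimal, dependency-free Python sketch of Q-guided reverse diffusion. It is not the authors' implementation, and every name and value in it is hypothetical. The noise predictor is an analytic stand-in for a trained network (it is exact only if clean actions were Gaussian around the state), and the critic is a toy quadratic with a closed-form gradient. Guidance is assumed to take the classifier-guidance form, in which each reverse-step mean is shifted by a scaled Q-gradient; the abstract does not specify the exact form. The final softmax, which maps the sample to resource shares over three slices, is likewise an added assumption.

```python
import math
import random


def make_schedule(num_steps=20, beta_min=1e-4, beta_max=0.2):
    """Linear noise schedule: per-step beta_t and cumulative alpha-bar_t."""
    betas = [beta_min + (beta_max - beta_min) * t / (num_steps - 1)
             for t in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_noise_predictor(a_t, alpha_bar, state, prior_std=0.5):
    """ASSUMPTION: stand-in for a trained eps_theta(a_t, t, s).

    This is the exact noise prediction when clean actions are distributed
    as N(state, prior_std^2 I); a real policy would use a neural network.
    """
    var_t = alpha_bar * prior_std ** 2 + (1.0 - alpha_bar)
    scale = math.sqrt(1.0 - alpha_bar) / var_t
    return [scale * (a - math.sqrt(alpha_bar) * s) for a, s in zip(a_t, state)]


def toy_q_gradient(action, preferred):
    """ASSUMPTION: gradient of a toy critic Q(s, a) = -||a - preferred||^2."""
    return [-2.0 * (a - p) for a, p in zip(action, preferred)]


def sample_allocation(state, preferred, guidance_scale=1.0, num_steps=20, seed=0):
    """Draw one allocation by Q-guided reverse diffusion (illustrative only)."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    action = [rng.gauss(0.0, 1.0) for _ in state]  # start from pure noise
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps = toy_noise_predictor(action, alpha_bar, state)
        # For simplicity the Q-gradient is evaluated at the current noisy action.
        grad_q = toy_q_gradient(action, preferred)
        next_action = []
        for a, e, g in zip(action, eps, grad_q):
            # Standard DDPM posterior mean for this reverse step.
            mean = (a - beta / math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(1.0 - beta)
            # Guidance: shift the mean along the Q-gradient, scaled by the step variance.
            mean += guidance_scale * beta * g
            noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the last step
            next_action.append(mean + math.sqrt(beta) * noise)
        action = next_action
    # ASSUMPTION: softmax turns the sample into non-negative shares summing to 1.
    peak = max(action)
    weights = [math.exp(a - peak) for a in action]
    total = sum(weights)
    return [w / total for w in weights]
```

Calling `sample_allocation([0.0, 0.0, 0.0], [0.0, 3.0, 0.0])` returns three shares that sum to one. Passing `guidance_scale=0.0` with the same seed gives the unguided sample for comparison; the guided run should assign a larger share to the slice the toy critic prefers. The loop over `num_steps` also makes the inference cost of the iterative denoising visible, since each sampled action requires one pass through the predictor and the critic gradient per step.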
