Generative Resource Allocation for 6G O-RAN with Diffusion Policies

Salar Nouri, Mojdeh Karbalaeimotaleb, Vahid Shah-Mansouri, Tarik Taleb

TL;DR

This work tackles the NP-hard problem of dynamic resource allocation in 6G O-RAN for multi-service slices (eMBB, URLLC, mMTC) by introducing Diffusion-QL, a reinforcement learning framework that uses a conditional diffusion model as its policy. The agent generates near-optimal resource allocations by reversing a noising process, guided by the gradient of a learned Q-function so as to maximize long-term return. System-level simulations show that Diffusion-QL outperforms state-of-the-art DRL baselines (DQN, SS-VAE) and approaches the performance of the optimal ESA benchmark, while remaining robust to inter-cell interference and scaling to larger networks. The trade-off is a higher inference cost, since each action requires several diffusion steps. The approach offers a scalable, data-efficient alternative to conventional RL for latency-constrained near-real-time RIC environments, enabling more reliable QoS delivery across diverse 6G network slices.

Abstract

Dynamic resource allocation in O-RAN is critical for managing the conflicting QoS requirements of 6G network slices. Conventional reinforcement learning agents often fail in this domain, as their unimodal policy structures cannot model the multi-modal nature of optimal allocation strategies. This paper introduces Diffusion Q-Learning (Diffusion-QL), a novel framework that represents the policy as a conditional diffusion model. Our approach generates resource allocation actions by iteratively reversing a noising process, with each step guided by the gradient of a learned Q-function. This method enables the policy to learn and sample from the complex distribution of near-optimal actions. Simulations demonstrate that the Diffusion-QL approach consistently outperforms state-of-the-art DRL baselines, offering a robust solution for the intricate resource management challenges in next-generation wireless networks.
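The core mechanism described above, iteratively denoising an action while nudging each step along the gradient of a Q-function, can be illustrated with a small NumPy sketch. Everything in it is a hypothetical stand-in chosen for illustration, not the authors' implementation: the two-mode distribution over per-slice allocation logits (`MODES`), the linear noise schedule, the quadratic critic in `q_grad`, and the guidance weight. In place of a trained network, `predict_noise` returns the exact noise prediction for a Gaussian mixture, and it ignores the state for brevity; a trained conditional denoiser would take the state as an input. The guidance term uses the classifier-guidance form (reverse-step mean shifted by β_t·∇_a Q), which may differ from the paper's exact update.

```python
import numpy as np

SLICES = ("eMBB", "URLLC", "mMTC")

# Hypothetical multimodal "near-optimal" action distribution: two modes of
# per-slice allocation logits, one eMBB-heavy and one URLLC-heavy.
MODES = np.array([[1.5, 0.0, -1.5],
                  [-1.5, 1.5, 0.0]])
MODE_STD = 0.3

# Assumed DDPM noise schedule: linear betas over T = 50 steps.
T = 50
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def predict_noise(x, t):
    """Exact noise prediction for the noised Gaussian mixture (stands in for a trained eps_theta)."""
    ab = alpha_bars[t]
    means = np.sqrt(ab) * MODES                           # (K, D) noised mode centres
    var = ab * MODE_STD**2 + (1.0 - ab)                   # shared isotropic variance at step t
    diff = x[:, None, :] - means[None, :, :]              # (N, K, D)
    logw = -0.5 * np.sum(diff**2, axis=2) / var           # (N, K) unnormalised log responsibilities
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    score = -np.sum(w[:, :, None] * diff, axis=1) / var   # grad_x log p_t(x)
    return -np.sqrt(1.0 - ab) * score


def q_grad(state, a):
    """Gradient of a hypothetical critic Q(s, a) = -0.5 * ||a - 3 * (s - mean(s))||^2."""
    target = 3.0 * (state - state.mean())
    return target - a


def sample_actions(state, n, guidance=0.0, seed=0):
    """Reverse diffusion from pure noise, shifting each step's mean along grad_a Q(s, a)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, MODES.shape[1]))
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        mean = mean + guidance * betas[t] * q_grad(state, x)  # classifier-guidance-style shift
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0  # no noise on the final step
        x = mean + np.sqrt(betas[t]) * noise
    return x


def to_shares(logits):
    """Softmax over slices: fraction of PRBs assigned to each slice."""
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    state = np.array([0.1, 0.8, 0.1])  # hypothetical URLLC-heavy demand shares
    for g in (0.0, 3.0):
        a = sample_actions(state, 2000, guidance=g, seed=1)
        frac = np.mean(a.argmax(axis=1) == SLICES.index("URLLC"))
        print(f"guidance={g}: URLLC-heavy allocations {frac:.2f}")
```

In this toy setup the unguided sampler splits roughly evenly between the two modes, while guidance toward a URLLC-heavy target sends most samples into the URLLC-heavy mode. A single Gaussian fit to the same two modes would place its mean between them, at an allocation that neither mode favors, which is the failure of unimodal policies that the abstract points to.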

Paper Structure

This paper contains 14 sections, 4 equations, 3 figures, 3 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Architectural overview of the O-RAN system [nouri2024semi].
  • Figure 2: Overview of the Diffusion-QL training loop.
  • Figure 3: Performance evaluation of Diffusion-QL against benchmarks. (a) Reward convergence during training. (b) Aggregate throughput scalability with increasing UEs. (c) Throughput sensitivity to RU power. (d, e) Per-slice throughput analysis under varying power constraints. (f) Robustness to inter-cell interference.
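The training loop in Figure 2 is not reproduced in this summary. For orientation only: in the original Diffusion-QL method for offline RL (Wang et al., ICLR 2023), the noise predictor ε_θ is trained on a denoising loss over logged state-action pairs plus a term that pushes the fully denoised action a^0 toward high values of the learned critic Q_φ, weighted by a coefficient α. Here N is the number of diffusion steps, ᾱ_i the cumulative product of the noise schedule, and 𝒟 the experience buffer. Whether this paper keeps this exact objective and weighting, or relies on the per-step gradient guidance described in the abstract, is not stated here, so treat the expression as an assumption rather than the authors' equation.

```latex
\mathcal{L}(\theta)
= \underbrace{\mathbb{E}_{i \sim \mathcal{U}\{1,\dots,N\},\ \epsilon \sim \mathcal{N}(0, I),\ (s, a) \sim \mathcal{D}}
  \Big[ \big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_i}\, a + \sqrt{1 - \bar\alpha_i}\, \epsilon,\ s,\ i\big) \big\rVert^2 \Big]}_{\text{denoising loss}}
\;-\; \alpha\, \mathbb{E}_{s \sim \mathcal{D},\ a^0 \sim \pi_\theta(\cdot \mid s)} \big[ Q_\phi(s, a^0) \big]
```

The first term keeps generated allocations close to those observed in the buffer; the second, backpropagated through the reverse chain that produces a^0, steers the policy toward allocations the critic rates highly.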