Offline and Distributional Reinforcement Learning for Radio Resource Management
Eslam Eldeeb, Hirley Alves
TL;DR
This work tackles the practical challenges of applying reinforcement learning (RL) to radio resource management (RRM) by adopting offline and distributional RL. The authors formulate RRM as a Markov decision process (MDP) with a PF-based objective and develop Conservative Quantile Regression (CQR), which combines Conservative Q-Learning (CQL) with QR-DQN to learn from a static dataset while modeling the distribution of returns. In the reported simulations, CQR outperforms the baseline schemes and even online RL, improving both sum-rate and 5th-percentile fairness, and it remains data-efficient when trained on smaller offline datasets. The approach offers a safer, more robust path to intelligent wireless control for 6G-era networks and can be extended to other optimization tasks such as beamforming and IRS-assisted communications.
Abstract
Reinforcement learning (RL) has shown a promising role in future intelligent wireless networks. Online RL has been adopted for radio resource management (RRM), taking over from traditional schemes. However, because it relies on online interaction with the environment, its role is limited in practical, real-world problems where such interaction is not feasible. In addition, traditional RL falls short when faced with the uncertainties and risks of real-world stochastic environments. To address both limitations, we propose an offline and distributional RL scheme for the RRM problem, which enables offline training on a static dataset without any interaction with the environment and accounts for the sources of uncertainty through the distribution of the return. Simulation results demonstrate that the proposed scheme outperforms conventional resource management models. In addition, it is the only scheme that surpasses online RL, with a 10% gain.
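To make the idea of combining Conservative Q-Learning with QR-DQN concrete, the following is a minimal NumPy sketch of what a CQR-style loss could look like on one batch from a static dataset. It is an illustration under stated assumptions, not the authors' implementation: the function names, array shapes, the Huber threshold `kappa`, and the weight `alpha` are all hypothetical, and the target quantiles are assumed to have been computed beforehand (for example from a target network).

```python
import numpy as np

def quantile_huber_loss(pred, target, kappa=1.0):
    """QR-DQN-style quantile Huber loss (assumed form).

    pred:   (B, N) predicted quantiles for the action stored in the dataset
    target: (B, M) target quantile samples (assumed precomputed)
    """
    n = pred.shape[1]
    tau = (np.arange(n) + 0.5) / n                 # quantile midpoints
    # Pairwise errors u[b, i, j] = target[b, j] - pred[b, i]
    u = target[:, None, :] - pred[:, :, None]
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u < 0}| makes each output track its quantile
    weight = np.abs(tau[None, :, None] - (u < 0).astype(float))
    # Mean over target samples, sum over predicted quantiles, mean over batch
    return (weight * huber / kappa).mean(axis=2).sum(axis=1).mean()

def cql_penalty(quantiles, actions):
    """CQL-style regularizer: logsumexp_a Q(s, a) - Q(s, a_data).

    quantiles: (B, A, N) quantile estimates for every action
    actions:   (B,) integer actions recorded in the offline dataset
    """
    q = quantiles.mean(axis=2)                     # Q-value = mean of quantiles
    m = q.max(axis=1, keepdims=True)               # stabilise the logsumexp
    lse = m[:, 0] + np.log(np.exp(q - m).sum(axis=1))
    q_data = q[np.arange(len(actions)), actions]
    return (lse - q_data).mean()

def cqr_loss(quantiles, actions, target, alpha=1.0):
    """Hypothetical combined loss: alpha * CQL penalty + quantile regression loss."""
    pred = quantiles[np.arange(len(actions)), actions]   # (B, N)
    return alpha * cql_penalty(quantiles, actions) + quantile_huber_loss(pred, target)
```

The penalty term pushes down the values of actions that are absent from the dataset relative to the recorded ones, which is what lets training proceed without environment interaction, while the quantile term keeps a full return distribution rather than a single expected value.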
