Online Frequency Scheduling by Learning Parallel Actions
Anastasios Giovanidis, Mathieu Leconte, Sabrine Aroua, Tor Kvernvik, David Sandberg
TL;DR
This work tackles real-time frequency scheduling in multi-user MIMO (MU-MIMO) systems for 6G by formulating it as a Markov decision process (MDP) and solving it with a Deep Q-Network (DQN) augmented with action branching, which enables parallel per-sub-band decisions. To scale to many sub-bands, it introduces memory-efficient variants (Unibranch and Graph Neural Network-based) and applies value decomposition to coordinate the actions of the branches, achieving near-best performance with substantially fewer parameters and shorter inference time. The approach supports online adaptation through replay and fine-tuning, which lets the policy bridge the sim-to-real gap in evolving environments. Empirical results show proportional-fair (PF) scheduling performance competitive with the baselines (including AlphaZero) at much faster inference, together with strong online adaptation, highlighting the approach's practical relevance for real-time 6G radio resource management (RRM).
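To make the branching and value-decomposition ideas concrete, a minimal NumPy sketch follows. It assumes a shared linear torso with a ReLU, one linear Q-head per sub-band (branch), greedy per-branch action selection, and a VDN-style additive decomposition in which the joint Q-value is the sum of the selected branch Q-values and a single TD target is built from the global reward. The function names, the linear layers, the additive form and the omission of a separate target network are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def init_params(state_dim, hidden_dim, num_branches, actions_per_branch, seed=0):
    """Hypothetical parameters: one shared torso plus one Q-head per sub-band."""
    rng = np.random.default_rng(seed)
    return {
        "torso": rng.normal(scale=0.1, size=(state_dim, hidden_dim)),
        # heads[d] maps the shared features to the Q-values of branch d.
        "heads": rng.normal(scale=0.1, size=(num_branches, hidden_dim, actions_per_branch)),
    }


def branch_q_values(params, state):
    """Return Q-values of shape (num_branches, actions_per_branch)."""
    features = np.maximum(state @ params["torso"], 0.0)  # shared ReLU torso
    # einsum applies every branch head to the same features in one call.
    return np.einsum("h,dha->da", features, params["heads"])


def greedy_actions(q):
    """Each sub-band picks its own best local action independently (parallel argmax)."""
    return q.argmax(axis=1)


def joint_q(q, actions):
    """Assumed additive decomposition: Q(s, a) = sum_d Q_d(s, a_d)."""
    return q[np.arange(q.shape[0]), actions].sum()


def td_target(reward, q_next, gamma=0.9):
    """One global target shared by all branches: r + gamma * sum_d max_a Q_d(s', a)."""
    return reward + gamma * q_next.max(axis=1).sum()


def td_loss(q, actions, reward, q_next, gamma=0.9):
    """Squared TD error on the decomposed joint value (no target network here)."""
    return (td_target(reward, q_next, gamma) - joint_q(q, actions)) ** 2


# Usage: 4 sub-bands with 10 local actions each, one transition with reward 1.0.
params = init_params(state_dim=6, hidden_dim=16, num_branches=4, actions_per_branch=10)
rng = np.random.default_rng(1)
s, s_next = rng.normal(size=6), rng.normal(size=6)
q = branch_q_values(params, s)
a = greedy_actions(q)  # one action index per sub-band, chosen in parallel
loss = td_loss(q, a, reward=1.0, q_next=branch_q_values(params, s_next))
```

Because each branch only takes an argmax over its own local actions, action selection costs one small argmax per sub-band rather than a search over joint assignments, which is what makes the parallel decisions cheap at inference time.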
Abstract
Radio Resource Management (RRM) is a challenging topic in future 6G networks, where novel applications create strong competition among users for the available resources. In this work, we consider the frequency scheduling problem in a multi-user MIMO system: frequency resources must be assigned to a set of users while allowing concurrent transmissions in the same sub-band. Traditional methods are insufficient to cope with all the constraints and uncertainties involved, whereas reinforcement learning can directly learn near-optimal solutions for such complex environments. However, the scheduling problem has an enormous action space, covering all combinations of users and sub-bands, so out-of-the-box algorithms cannot be applied directly. We propose a scheduler based on action branching over sub-bands, a deep Q-learning architecture with parallel decision capabilities. The sub-bands learn correlated but local decision policies that jointly optimize a global reward. To improve how the architecture scales with the number of sub-bands, we propose variants (Unibranch and Graph Neural Network-based) that reduce the number of parameters to learn. The parallel decision making of the proposed architecture makes it possible to meet the short inference-time requirements of real systems. Furthermore, the deep Q-learning approach permits online fine-tuning after deployment to bridge the sim-to-real gap. The proposed architectures are evaluated against relevant baselines from the literature, showing competitive performance and the ability to adapt online to evolving environments.
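The difference between a flat action space and a branched one can be illustrated with a small count. The sketch below assumes, hypothetically, that each sub-band's local action is a non-empty group of at most `max_layers` co-scheduled users drawn from `num_users`; the paper's exact per-sub-band action set may differ. A flat Q-network needs one output per joint assignment across all sub-bands, whereas action branching needs one output per local action per sub-band.

```python
from math import comb


def actions_per_subband(num_users, max_layers):
    """Assumed local action set: non-empty user groups of size <= max_layers."""
    return sum(comb(num_users, k) for k in range(1, max_layers + 1))


def flat_outputs(num_users, max_layers, num_subbands):
    """A flat Q-network needs one output per joint assignment over all sub-bands."""
    return actions_per_subband(num_users, max_layers) ** num_subbands


def branched_outputs(num_users, max_layers, num_subbands):
    """Action branching needs one head of local actions per sub-band."""
    return actions_per_subband(num_users, max_layers) * num_subbands


# Example: 4 users, up to 2 co-scheduled per sub-band, 8 sub-bands.
print(actions_per_subband(4, 2))  # 10
print(flat_outputs(4, 2, 8))      # 100000000
print(branched_outputs(4, 2, 8))  # 80
```

Under these assumptions the flat output count grows exponentially with the number of sub-bands while the branched count grows only linearly, which is the scaling argument for deciding per sub-band.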
