Online Frequency Scheduling by Learning Parallel Actions

Anastasios Giovanidis; Mathieu Leconte; Sabrine Aroua; Tor Kvernvik; David Sandberg

TL;DR

This work tackles real-time frequency scheduling in multi-user MIMO (MU-MIMO) systems for 6G by formulating it as a Markov decision process (MDP) and solving it with a Deep Q-Network augmented with Action Branching, which enables parallel per-sub-band decisions. To scale to many sub-bands, it introduces memory-efficient variants (Unibranch and Graph Neural Network-based) and applies Value Decomposition to coordinate branch actions, achieving near-top performance with substantially fewer parameters and shorter inference time. The approach supports online adaptation via replay and fine-tuning, which lets the policy bridge the sim-to-real gap in evolving environments. Empirical results show PF-based scheduling performance that is competitive with baselines (including AlphaZero) at much faster inference, along with strong online adaptation, which makes the approach practical for real-time 6G radio resource management (RRM).

Abstract

Radio Resource Management is a challenging topic in future 6G networks, where novel applications create strong competition among users for the available resources. In this work, we consider the frequency scheduling problem in a multi-user MIMO system. Frequency resources need to be assigned to a set of users while allowing for concurrent transmissions in the same sub-band. Traditional methods are insufficient to cope with all the involved constraints and uncertainties, whereas reinforcement learning can directly learn near-optimal solutions for such complex environments. However, the scheduling problem has an enormous action space that accounts for all combinations of users and sub-bands, so out-of-the-box algorithms cannot be used directly. We propose a scheduler based on action branching over sub-bands, a deep Q-learning architecture with parallel decision capabilities. The sub-bands learn correlated but local decision policies, and together they optimize a global reward. To improve how the architecture scales with the number of sub-bands, we propose variations (Unibranch, Graph Neural Network-based) that reduce the number of parameters to learn. The parallel decision making of the proposed architecture makes it possible to meet the short inference time requirements of real systems. Furthermore, the deep Q-learning approach permits online fine-tuning after deployment to bridge the sim-to-real gap. The proposed architectures are evaluated against relevant baselines from the literature, showing competitive performance and the possibility of online adaptation to evolving environments.
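To make the idea of action branching over sub-bands more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a shared trunk followed by one linear Q-head per sub-band, random weights standing in for trained parameters, and a simple additive value decomposition (the joint Q-value is the sum of the per-branch Q-values). The names BranchingQNetwork, joint_q, state_dim, hidden_dim, n_subbands and n_actions are hypothetical, and each per-sub-band "action" is assumed to be an index into some candidate set of user allocations.

```python
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in weights); a stand-in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Matrix-vector product without bias."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


class BranchingQNetwork:
    """Hypothetical sketch: shared trunk plus one Q-head (branch) per sub-band."""

    def __init__(self, state_dim, hidden_dim, n_subbands, n_actions, seed=0):
        rng = random.Random(seed)
        self.trunk = make_linear(state_dim, hidden_dim, rng)
        self.heads = [make_linear(hidden_dim, n_actions, rng)
                      for _ in range(n_subbands)]

    def q_values(self, state):
        """Return one list of Q-values per sub-band, all computed from a shared embedding."""
        h = relu(linear(self.trunk, state))
        return [linear(head, h) for head in self.heads]

    def act(self, state):
        """Greedy decision: each branch picks its own argmax, independently of the others."""
        return [max(range(len(q)), key=q.__getitem__)
                for q in self.q_values(state)]


def joint_q(q_per_branch, actions):
    """Assumed additive value decomposition: joint Q is the sum of per-branch Q-values."""
    return sum(q[a] for q, a in zip(q_per_branch, actions))


# Usage with arbitrary illustrative sizes.
net = BranchingQNetwork(state_dim=4, hidden_dim=8, n_subbands=3, n_actions=5)
state = [0.1, 0.2, 0.3, 0.4]
actions = net.act(state)                      # one action index per sub-band
q_joint = joint_q(net.q_values(state), actions)

# Output sizes: the branched network emits n_subbands * n_actions values,
# whereas a flat Q-network would need n_actions ** n_subbands outputs.
branched_outputs = 3 * 5
flat_outputs = 5 ** 3
```

Under the additive assumption, the per-branch argmax also maximizes the joint Q-value, which is why the sub-band decisions can be taken in parallel. The number of network outputs grows linearly with the number of sub-bands rather than exponentially.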
