To RL or not to RL? An Algorithmic Cheat-Sheet for AI-Based Radio Resource Management
Lorenzo Maggi, Matthew Andrews, Ryo Koblitz
TL;DR
The paper frames RRM problems as sequential decision processes and argues that RL is not always the right tool because it is sample inefficient. It provides a pragmatic decision tree for choosing among static optimization, supervised learning, bandits, offline RL, MPC, and RL itself, based on the planning horizon and on how much of the model is known. By mapping common RRM use cases (e.g., SC, BF, ES, PC, LA, HO, AC) to appropriate techniques and working through a detailed beamforming example, it highlights where each method excels and where it may falter. The work proposes wider use of MPC and offline RL for long-horizon problems like SC and AC, and encourages bandits and learning-based approaches for short-term tasks such as LA and BF, offering a practical, theory-informed roadmap for RRM algorithm selection.
Abstract
Several Radio Resource Management (RRM) use cases can be framed as sequential decision planning problems, where an agent (the base station, typically) makes decisions that influence the network utility and state. While Reinforcement Learning (RL) in its general form can address this scenario, it is known to be sample inefficient. Following the principle of Occam's razor, we argue that the choice of the solution technique for RRM should be guided by questions such as, "Is it a short or long-term planning problem?", "Is the underlying model known or does it need to be learned?", "Can we solve the problem analytically?" or "Is an expert-designed policy available?". A wide range of techniques exists to address these questions, including static and stochastic optimization, bandits, model predictive control (MPC) and, indeed, RL. We review some of these techniques that have already been successfully applied to RRM, and we believe that others, such as MPC, may present exciting research opportunities for the future.
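The guiding questions in the abstract can be read as a small screening procedure. The Python sketch below is only an illustration of that idea: the class `ProblemTraits`, the function `suggest_technique`, the field names, and the exact mapping from answers to technique families are all assumptions made here for clarity, not the decision tree given in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemTraits:
    """Hypothetical answers to the screening questions (field names are illustrative)."""
    long_horizon: bool                    # do decisions influence the future network state?
    model_known: bool                     # is the underlying model available, or must it be learned?
    analytically_solvable: bool = False   # can the problem be solved analytically?
    logged_data_available: bool = False   # e.g. traces collected from an expert-designed policy


def suggest_technique(traits: ProblemTraits) -> str:
    """Return a technique family. The mapping is an assumption, not the paper's tree."""
    if not traits.long_horizon:
        # Short-term problems: no need to plan over future states.
        if traits.model_known:
            return "static optimization"
        return "supervised learning" if traits.logged_data_available else "bandits"
    # Long-term problems: decisions affect future states.
    if traits.model_known:
        return "stochastic optimization" if traits.analytically_solvable else "MPC"
    return "offline RL" if traits.logged_data_available else "RL"


if __name__ == "__main__":
    # A short-horizon task with an unknown model and no logged data.
    print(suggest_technique(ProblemTraits(long_horizon=False, model_known=False)))
```

Under this assumed mapping, full RL is the last resort, reached only when the problem is long-horizon, the model is unknown, and no logged data is available. This mirrors the Occam's razor argument in the abstract.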
