To RL or not to RL? An Algorithmic Cheat-Sheet for AI-Based Radio Resource Management

Lorenzo Maggi; Matthew Andrews; Ryo Koblitz

TL;DR

The paper reframes Radio Resource Management (RRM) problems as sequential decision processes and argues that RL is not always the right tool, because it is sample inefficient. It provides a pragmatic decision tree that guides the choice among static optimization, supervised learning, bandits, offline RL, MPC, and RL itself, based on the planning horizon and on how much of the model is known. It maps common RRM use cases to appropriate techniques: downlink user scheduling (SC), beamforming (BF), energy savings (ES), power control (PC), link adaptation (LA), handover (HO), and admission control (AC). Together with a detailed beamforming example, this mapping highlights where each method excels and where it may falter. The work proposes wider use of MPC and offline RL for long-horizon problems such as SC and AC, and encourages bandits and learning-based approaches for short-term tasks such as LA and BF, offering a practical, theory-informed roadmap for RRM algorithm selection.

Abstract

Several Radio Resource Management (RRM) use cases can be framed as sequential decision planning problems, where an agent (the base station, typically) makes decisions that influence the network utility and state. While Reinforcement Learning (RL) in its general form can address this scenario, it is known to be sample inefficient. Following the principle of Occam's razor, we argue that the choice of the solution technique for RRM should be guided by questions such as, "Is it a short or long-term planning problem?", "Is the underlying model known or does it need to be learned?", "Can we solve the problem analytically?" or "Is an expert-designed policy available?". A wide range of techniques exists to address these questions, including static and stochastic optimization, bandits, model predictive control (MPC) and, indeed, RL. We review some of these techniques that have already been successfully applied to RRM, and we believe that others, such as MPC, may present exciting research opportunities for the future.
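
The guiding questions above can be read as a small branching rule. The following is a minimal sketch of that idea, not the paper's actual cheat-sheet (Figure 2): the function name, the `live_exploration_ok` flag, and the exact branch assignments are assumptions made for illustration, based only on the summary above and on common practice. Supervised learning from an expert-designed policy is left out for brevity.

```python
from enum import Enum


class Horizon(Enum):
    SHORT = "short"  # exogenous state evolution: actions do not steer future states
    LONG = "long"    # endogenous state evolution: actions shape future states


def suggest_technique(horizon: Horizon,
                      model_known: bool,
                      live_exploration_ok: bool = True) -> str:
    """Hypothetical selection rule; the branches are assumptions, not the paper's figure."""
    if horizon is Horizon.SHORT:
        # Short-term planning: optimize the immediate utility only.
        return "static optimization" if model_known else "bandits"
    # Long-term planning: current actions affect future states.
    if model_known:
        return "MPC"
    # Unknown model: learn from interaction, or from logged data if exploring
    # on the live network is not acceptable.
    return "RL" if live_exploration_ok else "offline RL"


# Example: link adaptation treated as a short-term problem with an unknown model.
print(suggest_technique(Horizon.SHORT, model_known=False))
```

Under these assumptions, a short-term problem with an unknown model lands on bandits and a long-term problem with a known model lands on MPC, which matches the direction of the summary but should not be taken as the authors' exact mapping.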

Paper Structure (31 sections, 1 equation, 6 figures)

Introduction
The model: Markov Decision Process
MDP formulation for some RRM problems
Downlink user scheduling (SC)
Beamforming (BF)
Energy savings (ES)
Power control (PC)
Link adaptation (LA)
Handover (HO)
Admission control (AC)
Short versus long-term planning
Endogenous state evolution $\Rightarrow$ Long-term planning
Exogenous state evolution $\Rightarrow$ Short-term planning
Short versus long-term planning in RRM use cases
SC, AC
...and 16 more sections

Figures (6)

Figure 1: Pictorial illustration of the MDP formulation for various RRM use cases.
Figure 2: Proposed algorithm selection cheatsheet in sequential decision problems for RRM.
Figure 3: Comparison of various techniques regarding their interaction with the live system.
Figure 4: Mapping between optimization techniques for sequential planning and RRM use cases.
Figure 5: User RSRP across beams and over time.
...and 1 more figures
