Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation
Samuele Peri, Alessio Russo, Gabor Fodor, Pablo Soldati
TL;DR
The paper addresses the challenge of downlink link adaptation (LA) in 5G/6G radio access networks (RANs) by proposing offline reinforcement learning (RL) as a non-invasive alternative to training in the live network. It introduces three LA designs, based on batch-constrained deep Q-learning (BCQ), conservative Q-learning (CQL), and a decision transformer (DT), all trained on static transition datasets gathered with a deep Q-network (DQN) behavioral policy. It shows that offline RL can match online RL performance when the data has sufficient quality and coverage. Across simulations, BCQ and CQL typically outperform outer-loop link adaptation (OLLA) and approach the performance of online DQN, while the DT offers long-horizon sequence modeling, given careful return-to-go (RTG) conditioning and temporal embeddings. The work demonstrates the practical viability of offline RL for RAN control and discusses DT design considerations, data collection strategies, and avenues for improving generalization to large-scale deployments and non-invasive data collection.
Abstract
Link adaptation (LA) is an essential function in modern wireless communication systems that dynamically adjusts the transmission rate of a communication link to match time- and frequency-varying radio link conditions. However, factors such as user mobility, fast fading, imperfect channel quality information, and aging of measurements make the modeling of LA challenging. To bypass the need for explicit modeling, recent research has introduced online reinforcement learning (RL) approaches as an alternative to the more commonly used rule-based algorithms. Yet, RL-based approaches face deployment challenges, as training in live networks can potentially degrade real-time performance. To address this challenge, this paper considers offline RL as a candidate to learn LA policies with minimal effects on the network operation. We propose three LA designs based on batch-constrained deep Q-learning, conservative Q-learning, and decision transformer. Our results show that offline RL algorithms can match the performance of state-of-the-art online RL methods when data is collected with a proper behavioral policy.
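As a minimal sketch of the return-to-go (RTG) signal that a decision transformer is typically conditioned on, the following computes, for each step of a logged trajectory, the sum of rewards from that step to the end of the episode. The function name `returns_to_go`, the optional discount `gamma`, and the example reward values are assumptions made for illustration; the paper's actual reward definition and RTG design are not specified in this summary.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return the (optionally discounted) sum of future rewards at every step.

    rewards: per-step rewards of one logged trajectory (hypothetical values).
    gamma:   discount factor; 1.0 gives the plain undiscounted sum.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step reuses the sum of the next one.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


if __name__ == "__main__":
    # Three transmissions with illustrative rewards.
    print(returns_to_go([1.0, 0.0, 2.0]))
```

At inference time, the first RTG value is usually set to a target return and then reduced by each reward actually observed, which is one reason the choice of target matters for this kind of model.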
