Optimizing Wireless Discontinuous Reception via MAC Signaling Learning
Adriano Pastore, Adrián Agustín de Dios, Álvaro Valcarce
TL;DR
This work introduces a reinforcement learning (RL) framework that optimizes Discontinuous Reception (DRX) in 5G NR networks by timing the transmission of MAC Control Element (CE) signaling rather than tuning DRX timers. By formulating DRX signaling as a protocol-learning problem and employing a DQN-based agent, the approach achieves substantial energy savings while preserving latency targets for XR-like traffic, using both Rel-17-compliant MAC CEs and signaling options that go beyond the current standard. Key contributions include a rich state space with per-UE and cell-wide features, a reward that balances idle time against latency satisfaction, and comparative results showing that active time is nearly halved for a single UE and reduced by close to 20% for 9 simultaneously served UEs, relative to a naïve MAC CE transmission policy. The findings suggest that automated, fine-grained DRX control via low-layer signaling is practically viable, with potential implications for energy efficiency in future wireless networks and for cross-UE optimization in more complex scheduling environments.
Abstract
We present a Reinforcement Learning (RL) approach to the problem of controlling the Discontinuous Reception (DRX) policy from a Base Transceiver Station (BTS) in a cellular network. We do so by optimally timing the transmission of fast Layer-2 signaling messages, known as Medium Access Control (MAC) Control Elements (CEs) in 3GPP 5G New Radio (5G NR). Unlike more conventional approaches to DRX optimization, which rely on fine-tuning the values of DRX timers, we assess the gains that can be obtained solely by means of this MAC CE signaling. For the simulation part, we concentrate on traffic types typically encountered in Extended Reality (XR) applications, where the needs for battery drain minimization and overheating mitigation are particularly pressing. Both 5G NR compliant and non-compliant ("beyond 5G") MAC CEs are considered. Our simulation results show that the proposed technique strikes a better trade-off between latency and energy savings than the conventional timer-based approaches that are characteristic of most current implementations. Specifically, our RL-based policy can nearly halve the active time for a single User Equipment (UE) with respect to a naïve MAC CE transmission policy, and still achieve a nearly 20% active time reduction for 9 simultaneously served UEs.
