Learning-Based Channel Access in Wi-Fi: A Multi-Armed Bandit Approach
Miguel Casasnovas, Francesc Wilhelmi, Richard Combes, Maksymilian Wojnar, Katarzyna Kosek-Szott, Szymon Szott, Anders Jonsson, Luis Esteve, Boris Bellalta
TL;DR
This work addresses the rigidity of IEEE 802.11 channel access in dense overlapping basic service set (OBSS) scenarios by casting Wi-Fi MAC optimization as online (contextual) multi-armed bandit (MAB) problems. It introduces single-agent and cooperative multi-agent architectures that learn the primary channel, channel width, and contention window (CW), with contextual variants (LinUCB) showing the strongest performance and fastest convergence. The results show that contextual MABs adapt better to dynamic traffic and OBSS conditions, while multi-agent setups can coordinate implicitly but may raise fairness concerns due to competitive dynamics. Overall, the approach offers a practical, lightweight alternative to static channel configurations, enabling more efficient spectrum use and agile adaptation in real-world WLAN deployments.
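The TL;DR names LinUCB as the best-performing contextual variant. As a reference point, here is a minimal sketch of the standard disjoint LinUCB rule: one ridge-regression model per arm, with the arm chosen by an upper confidence bound. It is written in plain Python, and each arm's inverse design matrix is updated with the Sherman-Morrison formula, so no linear-algebra library is needed. Everything in the toy setting is hypothetical and not taken from the paper: the three arms (read as three channel configurations), the one-hot context (read as low vs. high OBSS load), the deterministic rewards, and `alpha=1.0`.

```python
import math


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # width of the confidence bonus
        self.dim = dim
        # A_inv[a] is the inverse of A_a = I + sum(x x^T); it starts as the identity.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]  # sum(reward * x) per arm

    @staticmethod
    def _matvec(M, x):
        return [sum(m * xj for m, xj in zip(row, x)) for row in M]

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        best_arm, best_score = 0, -math.inf
        for a, (A_inv, b) in enumerate(zip(self.A_inv, self.b)):
            theta = self._matvec(A_inv, b)  # ridge estimate of the arm's weights
            A_inv_x = self._matvec(A_inv, x)
            mean = sum(t * xi for t, xi in zip(theta, x))
            var = max(sum(xi * v for xi, v in zip(x, A_inv_x)), 0.0)
            score = mean + self.alpha * math.sqrt(var)
            if score > best_score:  # ties go to the lowest arm index
                best_arm, best_score = a, score
        return best_arm

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm via Sherman-Morrison."""
        A_inv = self.A_inv[arm]
        u = self._matvec(A_inv, x)  # A^{-1} x (A^{-1} is symmetric)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


def run_demo(rounds=400):
    # Hypothetical toy setting: 3 arms and a one-hot context [low load, high load].
    # Rewards are deterministic and invented purely for illustration.
    true_theta = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    agent = LinUCB(n_arms=3, dim=2, alpha=1.0)
    for t in range(rounds):
        x = contexts[t % 2]
        arm = agent.select(x)
        reward = sum(w * xi for w, xi in zip(true_theta[arm], x))
        agent.update(arm, x, reward)
    return agent
```

In this toy run the agent ends up preferring arm 0 in the low-load context and arm 2 in the high-load context. A non-contextual bandit, by contrast, would have to settle on a single arm for both conditions.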
Abstract
Due to its static protocol design, IEEE 802.11 (also known as Wi-Fi) channel access lacks the adaptability needed to handle dynamic network conditions, resulting in inefficient spectrum utilization, unnecessary contention, and packet collisions. This paper investigates reinforcement learning (RL) solutions to optimize Wi-Fi's medium access control (MAC). In particular, a multi-armed bandit (MAB) framework is proposed for dynamic channel access (including both the primary channel and channel width) and contention window (CW) adjustment. In this setting, we study relevant learning design principles, such as adopting joint or factorial action spaces (handled by a single agent (SA) or by multiple agents (MA), respectively), and the importance of incorporating contextual information. Our simulation results show that cooperative MA architectures converge faster than their SA counterparts, as agents operate over smaller action spaces. Another key insight is that contextual MAB algorithms consistently outperform non-contextual ones, highlighting the value of leveraging side information in action selection. Moreover, in multi-player settings, results demonstrate that decentralized learners can achieve implicit coordination, although their greediness may degrade coexisting networks' performance and induce policy-chasing dynamics. Overall, these findings indicate that (contextual) MAB-based learning offers a practical and adaptive alternative to static IEEE 802.11 protocols, enabling more efficient and intelligent spectrum utilization.
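The abstract attributes the faster convergence of cooperative MA architectures to their smaller action spaces. The sketch below illustrates why, under several assumptions. The option lists are hypothetical: 4 primary channels, 4 widths, and 7 CW values, and channel/width validity constraints are ignored. Plain UCB1 stands in as the per-agent learner; it is not necessarily the algorithm used in the paper. "Cooperative" is modeled as every agent updating on the same shared reward, which is also an assumption. With these choices the joint space has 4 × 4 × 7 = 112 arms, while the factorial decomposition gives three agents with 4, 4, and 7 arms.

```python
import itertools
import math

# Hypothetical per-factor options (illustrative only, not the paper's configuration).
PRIMARY_CHANNELS = [36, 40, 44, 48]
CHANNEL_WIDTHS_MHZ = [20, 40, 80, 160]
CW_VALUES = [15, 31, 63, 127, 255, 511, 1023]


class UCB1:
    """Plain UCB1; each untried arm is played once before the bound is used."""

    def __init__(self, n_arms, c=2.0):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0
        self.c = c

    def select(self):
        self.t += 1
        for a, n in enumerate(self.counts):
            if n == 0:
                return a
        return max(range(len(self.counts)),
                   key=lambda a: self.means[a]
                   + math.sqrt(self.c * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class SingleAgent:
    """Joint action space: one bandit over every (channel, width, CW) tuple."""

    def __init__(self):
        self.actions = list(itertools.product(
            PRIMARY_CHANNELS, CHANNEL_WIDTHS_MHZ, CW_VALUES))
        self.bandit = UCB1(len(self.actions))
        self._last = None

    def act(self):
        self._last = self.bandit.select()
        return self.actions[self._last]

    def learn(self, reward):
        self.bandit.update(self._last, reward)


class CooperativeMultiAgent:
    """Factorial action space: one bandit per factor, all sharing one reward."""

    def __init__(self):
        self.factors = [PRIMARY_CHANNELS, CHANNEL_WIDTHS_MHZ, CW_VALUES]
        self.bandits = [UCB1(len(f)) for f in self.factors]
        self._last = None

    def act(self):
        self._last = [b.select() for b in self.bandits]
        return tuple(f[i] for f, i in zip(self.factors, self._last))

    def learn(self, reward):
        # Shared (cooperative) reward, e.g. a network-level throughput measure.
        for b, i in zip(self.bandits, self._last):
            b.update(i, reward)
```

Because UCB1 plays each arm once before trusting its confidence bounds, the joint learner spends 112 rounds on forced exploration, while the three factorial agents, exploring in parallel, need only 7. The trade-off is that independent per-factor agents cannot directly represent interactions between factors, such as a CW value that only works well at a particular width.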
