Performance Evaluation of Multi-Armed Bandit Algorithms for Wi-Fi Channel Access
Miguel Casasnovas, Francesc Wilhelmi, Richard Combes, Maksymilian Wojnar, Katarzyna Kosek-Szott, Szymon Szott, Anders Jonsson, Luis Esteve, Boris Bellalta
TL;DR
This work frames adaptive Wi-Fi channel access as a (contextual) multi-armed bandit, or (C)MAB, problem in which decentralized agents tune MAC parameters: primary channel, channel width, and contention window (CW). It compares three action-selection paradigms (optimism-driven, unimodal, randomized) and two action-space formulations (joint vs. factorial) in single-player (SP) and multi-player (MP) settings, under static and dynamic channel bonding (SCB and DCB). Established algorithms (UCB, LinUCB, OSUB) and a newly proposed method, E-RLB, are evaluated in simulations with realistic MAC settings. Key findings: contextual and optimism-driven strategies deliver the fastest adaptation and highest goodput; unimodal methods depend on a graph for which the unimodality assumption actually holds; naive randomized exploration can destabilize learning in multi-agent settings; and E-RLB is a viable low-complexity option, although its epsilon-greedy exploration makes it less efficient.
Abstract
The adoption of dynamic, self-learning solutions for real-time wireless network optimization has recently gained significant attention due to the limited adaptability of existing protocols. This paper investigates multi-armed bandit (MAB) strategies as a data-driven approach for decentralized, online channel access optimization in Wi-Fi, targeting dynamic channel access settings: primary channel, channel width, and contention window (CW) adjustment. Key design aspects are examined, including the adoption of joint versus factorial action spaces, the inclusion of contextual information, and the nature of the action-selection strategy (optimism-driven, unimodal, or randomized). State-of-the-art algorithms and a proposed lightweight contextual approach, E-RLB, are evaluated through simulations. Results show that contextual and optimism-driven strategies consistently achieve the highest performance and fastest adaptation under recurrent conditions. Unimodal structures require careful graph construction to ensure that the unimodality assumption holds. Randomized exploration, adopted in the proposed E-RLB, can induce disruptive parameter reallocations, especially in multi-player settings. Decomposing the action space across several specialized agents accelerates convergence but increases sensitivity to randomized exploration and demands coordination under shared rewards to avoid correlated learning. Finally, despite its inherent inefficiencies from epsilon-greedy exploration, E-RLB demonstrates effective adaptation and learning, highlighting its potential as a viable low-complexity solution for realistic dynamic deployments.
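To make the joint versus factorial distinction concrete, the following is a minimal sketch under stated assumptions, not the paper's E-RLB or any of the evaluated algorithms. It uses a plain, non-contextual epsilon-greedy learner, and the parameter grids, class name, and function names are all hypothetical choices made for illustration.

```python
import itertools
import random

# Hypothetical parameter grids (illustrative values, not taken from the paper).
PRIMARY_CHANNELS = [0, 1, 2, 3]
CHANNEL_WIDTHS_MHZ = [20, 40, 80]
CONTENTION_WINDOWS = [15, 31, 63]


class EpsilonGreedyAgent:
    """Plain epsilon-greedy bandit with sample-mean reward estimates."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)

    def select(self):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.means[i])

    def update(self, index, reward):
        # Incremental update of the sample mean for the played arm.
        self.counts[index] += 1
        self.means[index] += (reward - self.means[index]) / self.counts[index]


def make_joint_agent(epsilon=0.1, seed=None):
    # Joint formulation: one agent, one arm per (channel, width, CW) tuple.
    arms = itertools.product(PRIMARY_CHANNELS, CHANNEL_WIDTHS_MHZ, CONTENTION_WINDOWS)
    return EpsilonGreedyAgent(arms, epsilon, seed)


def make_factorial_agents(epsilon=0.1, seed=None):
    # Factorial formulation: one specialized agent per parameter.
    grids = [PRIMARY_CHANNELS, CHANNEL_WIDTHS_MHZ, CONTENTION_WINDOWS]
    return [EpsilonGreedyAgent(g, epsilon, None if seed is None else seed + k)
            for k, g in enumerate(grids)]


def factorial_step(agents, reward_fn):
    # Every agent picks its own parameter; all receive the same shared reward.
    picks = [agent.select() for agent in agents]
    config = tuple(agent.arms[i] for agent, i in zip(agents, picks))
    reward = reward_fn(config)
    for agent, i in zip(agents, picks):
        agent.update(i, reward)
    return config, reward
```

With these illustrative grids the joint agent must learn 36 arms, while the three factorial agents learn 4 + 3 + 3 = 10 arms in total, which is one way to see why decomposition can speed up convergence. The shared reward in `factorial_step` also shows where correlated learning can arise: an exploratory pick by one agent changes the reward credited to the other two.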
