Coordinated Multi-Armed Bandits for Improved Spatial Reuse in Wi-Fi
Francesc Wilhelmi, Boris Bellalta, Szymon Szott, Katarzyna Kosek-Szott, Sergio Barrachina-Muñoz
TL;DR
This work addresses Spatial Reuse (SR) in Wi-Fi networks that support Multi-Access Point Coordination (MAPC) by proposing a coordinated Multi-Agent Multi-Armed Bandit (MA-MAB) framework that jointly configures the Overlapping Basic Service Set Packet Detect (OBSS/PD) threshold and the transmit power across neighboring Basic Service Sets (BSSs). Agents at multiple Access Points (APs) use action sets derived from discrete PD and power values and learn via $\varepsilon$-greedy or Thompson sampling strategies, with rewards computed through SELF and shared across agents using AVG, MAX-MIN, or PF under a MAPC communication model. The study demonstrates that coordination yields meaningful gains over OBSS/PD SR and uncoordinated approaches, notably improving the minimum throughput and reducing the maximum access delay in multi-BSS deployments; the results also reveal trade-offs between exploration strategies and reward-sharing rules. Overall, AI-native SR with coordinated MA-MABs offers a scalable, performance-enhancing alternative to centralized Coordinated SR (C-SR), enabling fairer and more efficient spectrum reuse in future IEEE 802.11bn networks.
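To make the agent side of this setup concrete, the following is a minimal sketch of a single $\varepsilon$-greedy agent whose arms are joint (PD, transmit power) configurations. It is an illustration under stated assumptions, not the authors' implementation: the PD and power levels, the class name `EpsilonGreedyAgent`, and the default `epsilon` are hypothetical, since the summary above does not give the actual values.

```python
import itertools
import random

# Hypothetical discrete levels in dBm; the source does not list the real ones.
PD_LEVELS_DBM = [-82, -72, -62]
TX_POWER_LEVELS_DBM = [10, 15, 20]

# Each arm is one joint (PD threshold, transmit power) configuration.
ACTIONS = list(itertools.product(PD_LEVELS_DBM, TX_POWER_LEVELS_DBM))


class EpsilonGreedyAgent:
    """One bandit agent (e.g. one AP) choosing among joint PD/power arms."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of times each arm was played
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update: Q <- Q + (r - Q) / n.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In a coordinated deployment, each AP would run one such agent, apply `ACTIONS[agent.select_arm()]` for an interval, and then call `update` with the reward it receives, which may be a shared reward rather than its own.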
Abstract
Multi-Access Point Coordination (MAPC) and Artificial Intelligence and Machine Learning (AI/ML) are expected to be key features in future Wi-Fi, such as the forthcoming IEEE 802.11bn (Wi-Fi 8) and beyond. In this paper, we explore a coordinated solution based on online learning to drive the optimization of Spatial Reuse (SR), a method that allows multiple devices to perform simultaneous transmissions by controlling interference through Packet Detect (PD) adjustment and transmit power control. In particular, we focus on a Multi-Agent Multi-Armed Bandit (MA-MAB) setting, where multiple decision-making agents concurrently configure the SR parameters of coexisting networks by leveraging the MAPC framework, and we study various algorithms and reward-sharing mechanisms. We evaluate different MA-MAB implementations using Komondor, a widely adopted Wi-Fi simulator, and demonstrate that AI-native SR enabled by coordinated MABs can improve network performance over current Wi-Fi operation: mean throughput increases by 15%, fairness improves as the minimum throughput across the network increases by 210%, and the maximum access delay is kept below 3 ms.
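The reward-sharing mechanisms mentioned above can be illustrated with a short sketch. The rule names AVG, MAX-MIN, and PF come from the TL;DR, but their exact definitions are not given here, so the formulas below are assumptions based on the usual meaning of those terms: arithmetic mean, worst-case (minimum) reward, and sum of logarithms for proportional fairness. The function name `shared_reward` and the `floor` parameter are hypothetical.

```python
import math


def shared_reward(per_bss_rewards, rule, floor=1e-9):
    """Aggregate per-BSS rewards into one value shared by all agents.

    Assumed definitions (not confirmed by the source):
      AVG     -> arithmetic mean of the per-BSS rewards
      MAX-MIN -> minimum per-BSS reward (agents then maximize the worst case)
      PF      -> sum of log-rewards (proportional fairness)
    """
    if not per_bss_rewards:
        raise ValueError("at least one per-BSS reward is required")
    if rule == "AVG":
        return sum(per_bss_rewards) / len(per_bss_rewards)
    if rule == "MAX-MIN":
        return min(per_bss_rewards)
    if rule == "PF":
        # Clamp to a small positive floor so log() is defined for zero rewards.
        return sum(math.log(max(r, floor)) for r in per_bss_rewards)
    raise ValueError(f"unknown reward-sharing rule: {rule}")
```

Under these assumptions, AVG favors aggregate performance, MAX-MIN rewards only improvements to the worst-performing BSS, and PF sits between the two by penalizing configurations that starve any single BSS.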
