Coordinated Multi-Armed Bandits for Improved Spatial Reuse in Wi-Fi

Francesc Wilhelmi, Boris Bellalta, Szymon Szott, Katarzyna Kosek-Szott, Sergio Barrachina-Muñoz

TL;DR

This work addresses SR in MAPC-enabled Wi-Fi by proposing a coordinated Multi-Agent MAB framework that jointly configures OBSS/PD and transmit power across neighboring BSSs. Agents from multiple APs use action sets derived from discrete PD and power values, and learn via $\varepsilon$-greedy or Thompson sampling strategies, with rewards computed through SELF and shared across agents using AVG, MAX-MIN, or PF under a MAPC communication model. The study demonstrates that coordination yields meaningful gains over OBSS/PD SR and uncoordinated approaches, notably improving minimum throughput and reducing maximum access delay in multi-BSS deployments; results also reveal trade-offs between exploration strategies and reward-sharing rules. Overall, AI-native SR with coordinated MA-MABs offers a scalable, performance-enhancing alternative to centralized C-SR, enabling fairer and more efficient spectrum reuse in future IEEE 802.11bn networks.
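The TL;DR names three reward-sharing rules (AVG, MAX-MIN, PF) without spelling them out. The minimal Python sketch below illustrates them under assumed, commonly used definitions: each agent reports a normalized individual reward in (0, 1], AVG returns the mean, MAX-MIN the minimum, and PF the geometric mean (a [0, 1]-bounded monotone transform of the proportional-fairness objective, the sum of log-rewards). The function names are hypothetical, and the paper's exact formulas may differ.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical helpers: each takes the individual (normalized) rewards of all
# coordinated agents and returns the single value shared by every agent.

def share_avg(rewards: Sequence[float]) -> float:
    """AVG (assumed form): arithmetic mean of the individual rewards."""
    return sum(rewards) / len(rewards)

def share_max_min(rewards: Sequence[float]) -> float:
    """MAX-MIN (assumed form): worst individual reward, so agents jointly maximize the minimum."""
    return min(rewards)

def share_pf(rewards: Sequence[float], floor: float = 1e-9) -> float:
    """PF (assumed form): geometric mean exp(mean(log r)); clipping at `floor` avoids log(0)."""
    return math.exp(sum(math.log(max(r, floor)) for r in rewards) / len(rewards))

SHARING_RULES: Dict[str, Callable[[Sequence[float]], float]] = {
    "AVG": share_avg,
    "MAX-MIN": share_max_min,
    "PF": share_pf,
}

if __name__ == "__main__":
    individual = [0.8, 0.2]  # illustrative values only, not results from the paper
    for name, rule in SHARING_RULES.items():
        print(f"{name}: {rule(individual):.2f}")
```

With the illustrative rewards [0.8, 0.2], AVG gives 0.50, MAX-MIN gives 0.20, and PF gives 0.40: MAX-MIN and PF both penalize the imbalance that AVG ignores, which hints at why the choice of rule can trade mean performance for fairness.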

Abstract

Multi-Access Point Coordination (MAPC) and Artificial Intelligence and Machine Learning (AI/ML) are expected to be key features in future Wi-Fi, such as the forthcoming IEEE 802.11bn (Wi-Fi 8) and beyond. In this paper, we explore a coordinated solution based on online learning to drive the optimization of Spatial Reuse (SR), a method that allows multiple devices to perform simultaneous transmissions by controlling interference through Packet Detect (PD) adjustment and transmit power control. In particular, we focus on a Multi-Agent Multi-Armed Bandit (MA-MAB) setting, where multiple decision-making agents concurrently configure SR parameters from coexisting networks by leveraging the MAPC framework, and we study various algorithms and reward-sharing mechanisms. We evaluate different MA-MAB implementations using Komondor, a widely used Wi-Fi simulator, and demonstrate that AI-native SR enabled by coordinated MABs can improve network performance over current Wi-Fi operation: mean throughput increases by 15%, fairness improves as the minimum throughput across the network increases by 210%, and the maximum access delay is kept below 3 ms.
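The abstract describes agents that concurrently configure SR parameters and learn online. A minimal sketch of that loop follows, assuming each AP runs an independent $\varepsilon$-greedy or Beta-Bernoulli Thompson sampling agent over a discrete action set, and that all agents are updated with one shared reward (here MAX-MIN, via Python's built-in `min`). The class names, `run_coordinated`, the `toy_reward` environment, and its numbers are hypothetical stand-ins for a simulator such as Komondor, not the paper's implementation.

```python
import random
from typing import List, Sequence


class EpsilonGreedyAgent:
    """Bandit agent (e.g. one AP) over a discrete set of SR actions."""

    def __init__(self, n_actions: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.counts = [0] * n_actions      # times each action was played
        self.values = [0.0] * n_actions    # running mean reward per action
        self.rng = random.Random(seed)

    def select(self) -> int:
        if self.rng.random() < self.epsilon:       # explore uniformly at random
            return self.rng.randrange(len(self.values))
        best = max(self.values)                    # exploit, breaking ties at random
        return self.rng.choice([a for a, v in enumerate(self.values) if v == best])

    def update(self, action: int, reward: float) -> None:
        self.counts[action] += 1
        # Incremental update of the running mean.
        self.values[action] += (reward - self.values[action]) / self.counts[action]


class ThompsonAgent:
    """Beta-Bernoulli Thompson sampling; assumes rewards normalized to [0, 1]."""

    def __init__(self, n_actions: int, seed: int = 0):
        self.alpha = [1.0] * n_actions     # uniform Beta(1, 1) priors
        self.beta = [1.0] * n_actions
        self.rng = random.Random(seed)

    def select(self) -> int:
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action: int, reward: float) -> None:
        # Fractional posterior update, one common variant for rewards in [0, 1].
        self.alpha[action] += reward
        self.beta[action] += 1.0 - reward


def run_coordinated(agents, reward_fn, share_fn, steps: int) -> List[int]:
    """Each step: all agents act, individual rewards are combined, every agent learns from the shared value."""
    joint: List[int] = []
    for _ in range(steps):
        joint = [agent.select() for agent in agents]
        shared = share_fn(reward_fn(joint))
        for agent, action in zip(agents, joint):
            agent.update(action, shared)
    return joint


def toy_reward(joint: Sequence[int]) -> List[float]:
    """Hypothetical 2-agent environment with made-up numbers (not from the paper)."""
    return [0.9, 0.8] if list(joint) == [1, 1] else [0.3, 0.2]


if __name__ == "__main__":
    agents = [EpsilonGreedyAgent(2, seed=s) for s in (1, 2)]
    last = run_coordinated(agents, toy_reward, min, steps=2000)
    print("last joint action:", last, "counts:", [a.counts for a in agents])
```

Swapping `EpsilonGreedyAgent` for `ThompsonAgent`, or `min` for another sharing function, changes the exploration strategy or the reward-sharing rule without touching the loop, which loosely mirrors the (exploration strategy, sharing rule) combinations the paper compares.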

Paper Structure

This paper contains 9 sections, 3 equations, 8 figures, 2 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Representation of an agent's operation in an OBSS, where the rewards associated with the different available actions are drawn from an unknown distribution.
  • Figure 2: Considered 2-BSS toy deployment and mean performance of $A_1=\{10,-72\}$ dBm, $A_2=\{10,-82\}$ dBm, $A_3=\{20,-72\}$ dBm, $A_4=\{20,-82\}$ dBm.
  • Figure 3: Average throughput observed in the toy scenario for OBSS/PD SR, uncoordinated bandit with $\varepsilon$-greedy, and coordinated bandit with $\varepsilon$-greedy (AVG).
  • Figure 4: Mean average throughput observed in the toy scenario for $\mathcal{E} = \{\varepsilon\text{-greedy}, \text{Thompson sampling}\}$ and $\mathcal{R} = \{\texttt{AVG}, \texttt{MAX-MIN}, \texttt{PF}\}$.
  • Figure 5: Average throughput observed in the toy scenario when applying $\varepsilon$-greedy (left) and Thompson sampling (right), for $\mathcal{R} = \{\texttt{AVG}, \texttt{MAX-MIN}, \texttt{PF}\}$.
  • ...and 3 more figures
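Figure 2's caption lists four actions built from two values of each SR parameter. Assuming (our reading, not stated in the caption) that each $A_k$ pairs a transmit power with an OBSS/PD threshold, both in dBm, the per-agent action set is the Cartesian product of the two value lists, and the 2-BSS toy deployment has $4^2 = 16$ joint configurations. The variable names below are hypothetical.

```python
from itertools import product

# Assumed reading of Figure 2: each action is (transmit power, OBSS/PD threshold) in dBm.
TX_POWER_DBM = [10, 20]
OBSS_PD_DBM = [-72, -82]

# Cartesian product in the caption's order: A1..A4.
actions = {f"A{k}": pair for k, pair in enumerate(product(TX_POWER_DBM, OBSS_PD_DBM), start=1)}
# {'A1': (10, -72), 'A2': (10, -82), 'A3': (20, -72), 'A4': (20, -82)}

# Joint configurations for the 2-BSS toy deployment (one action per BSS).
N_BSS = 2
joint_actions = list(product(actions, repeat=N_BSS))
print(len(joint_actions))  # 16
```

Each extra parameter value enlarges every agent's action set, and each extra BSS multiplies the joint space, which is a simple way to see why the TL;DR frames coordinated per-AP agents as a scalable alternative to centralized C-SR.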