
IEEE 802.11bn Multi-AP Coordinated Spatial Reuse with Hierarchical Multi-Armed Bandits

Maksymilian Wojnar, Wojciech Ciezobka, Katarzyna Kosek-Szott, Krzysztof Rusek, Szymon Szott, David Nunez, Boris Bellalta

TL;DR

The paper addresses the scheduling of AP-station pairs for Coordinated Spatial Reuse (C-SR) in dense IEEE 802.11bn networks to boost throughput. It introduces a hierarchical multi-armed bandit (MAB) framework with two levels (level I selects which APs transmit, and level II assigns the recipient stations) that learns effective C-SR groupings online. A deployment with a central controller is analyzed, and the reward is defined as the total effective data rate across the concurrent transmissions. The approach is evaluated with multiple MAB algorithms (including $\epsilon$-greedy, Thompson sampling, Softmax, and UCB) and compared against baselines, showing rapid convergence and adaptability to topology changes, with UCB emerging as the most robust choice. The results demonstrate the feasibility and benefits of ML-driven multi-AP coordination (MAPC) for 802.11bn, supported by an open-source simulator that facilitates further research in dense wireless networks.

Abstract

Coordination among multiple access points (APs) is integral to IEEE 802.11bn (Wi-Fi 8) for managing contention in dense networks. This letter explores the benefits of Coordinated Spatial Reuse (C-SR) and proposes the use of reinforcement learning to optimize C-SR group selection. We develop a hierarchical multi-armed bandit (MAB) framework that efficiently selects APs for simultaneous transmissions across various network topologies, demonstrating reinforcement learning's promise in Wi-Fi settings. Among several MAB algorithms studied, we identify the upper confidence bound (UCB) as particularly effective, offering rapid convergence, adaptability to changes, and sustained performance.
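
The two-level selection described above can be illustrated with a minimal sketch. It is not the authors' implementation or their simulator: the class names (`UCBAgent`, `HierarchicalController`), the UCB1-style index with exploration constant `c`, the choice to key second-level agents by the sharing pair and the selected AP group, and the toy reward table are all assumptions made for illustration. The reward stands in for the total effective data rate and is assumed to be normalized to [0, 1].

```python
import itertools
import math


class UCBAgent:
    """UCB1-style bandit over a fixed, non-empty list of arms (assumed variant)."""

    def __init__(self, arms, c=1.0):
        self.arms = list(arms)
        self.c = c                      # exploration constant (assumption)
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)
        self.t = 0

    def select(self):
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:                  # play every arm once first
                return i
        scores = [m + self.c * math.sqrt(math.log(self.t) / n)
                  for m, n in zip(self.means, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, i, reward):
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]


class HierarchicalController:
    """Level I picks the set of additional APs; level II picks one station per AP."""

    def __init__(self, stations_by_ap, c=1.0):
        self.stations_by_ap = stations_by_ap
        self.c = c
        self.level1 = {}  # sharing pair -> agent over subsets of the other APs
        self.level2 = {}  # (sharing pair, AP group, AP) -> agent over that AP's stations

    def choose(self, sharing_pair):
        sharing_ap, _ = sharing_pair
        if sharing_pair not in self.level1:
            others = [ap for ap in self.stations_by_ap if ap != sharing_ap]
            subsets = [s for r in range(len(others) + 1)
                       for s in itertools.combinations(others, r)]
            self.level1[sharing_pair] = UCBAgent(subsets, self.c)
        l1 = self.level1[sharing_pair]
        i1 = l1.select()
        group = l1.arms[i1]
        pairs, picks = [sharing_pair], []
        for ap in group:
            key = (sharing_pair, group, ap)
            if key not in self.level2:
                self.level2[key] = UCBAgent(self.stations_by_ap[ap], self.c)
            l2 = self.level2[key]
            i2 = l2.select()
            picks.append((l2, i2))
            pairs.append((ap, l2.arms[i2]))
        return pairs, (l1, i1, picks)

    def feedback(self, token, reward):
        # Every agent that took part in the decision receives the same reward.
        l1, i1, picks = token
        l1.update(i1, reward)
        for l2, i2 in picks:
            l2.update(i2, reward)


def most_played(agent):
    return agent.arms[max(range(len(agent.arms)), key=agent.counts.__getitem__)]


# Hypothetical topology and reward table; the numbers are made up, not from the paper.
STATIONS = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}


def toy_reward(pairs):
    aps = {ap for ap, _ in pairs}
    if aps == {"A"}:
        return 0.5                      # sharing AP transmits alone
    if aps == {"A", "B"}:
        return 0.8 if ("B", "b1") in pairs else 0.45
    if aps == {"A", "C"}:
        return 0.25                     # C interferes strongly with A
    return 0.35                         # all three APs at once


ctrl = HierarchicalController(STATIONS)
sharing = ("A", "a1")                   # AP A won channel access, sends to a1
for _ in range(2000):
    pairs, token = ctrl.choose(sharing)
    ctrl.feedback(token, toy_reward(pairs))

best_group = most_played(ctrl.level1[sharing])
best_station = most_played(ctrl.level2[(sharing, best_group, "B")])
print(best_group, best_station)
```

In this sketch the level I arm set grows exponentially with the number of APs, since it enumerates every subset of the non-sharing APs. That is acceptable for a handful of APs, but a larger deployment would need a different arm construction.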

Paper Structure

This paper contains 5 sections, 3 equations, 6 figures, 1 table, and 1 algorithm.

Figures (6)

  • Figure 1: Example of C-SR with a central controller: after AP A wins channel access (becoming the sharing AP), the controller selects AP-station pairs (e.g., AP A to station 1 and AP B to station 4) for simultaneous transmission.
  • Figure 2: Example operation of the proposed hierarchical MAB scheme. Agents not taking part are omitted. The first-level agent selects which other APs transmit. The two second-level agents each select one of two recipient stations.
  • Figure 3: The agent's learning process when $\mathcal{P}_0$ is always $\{(A_1, S_1^1)\}$. Initially, the emphasis is on exploration to discover high-reward configurations. Over time, the agent gains confidence in which settings lead to high effective data rates and exploits them.
  • Figure 4: Example operation of the proposed hierarchical MAB scheme with a central controller: (1) AP 1 notifies the controller that it is the sharing AP, (2) the controller provides the set $P$ of AP-station pairs to transmit in the next TXOP (which include APs 3 and 4), (3) after the TXOP, all transmitting APs inform the controller how many frames were transmitted successfully.
  • Figure 5: Test scenario. In (a), crosses denote APs, dots denote stations, and thick lines denote walls; APs are placed at the corners of a square with side $d$, while stations are placed 2 m from their APs in ordinal directions. In (b), the effective data rates of the coordinated transmissions (for different numbers of concurrently transmitting APs) depend on the square side $d$, assuming a fixed MCS of 11. Three operating points studied in further simulations are denoted by vertical lines.
  • ...and 1 more figures