Hierarchical Reinforcement Learning for Next Generation of Multi-AP Coordinated Spatial Reuse

Ziru Chen; Salvatore Talarico; Qing Xia; Xihan Peng; Xing Hao; Lin X. Cai

Abstract

In next-generation Wi-Fi networks, Multiple Access Point Coordination (MAPC) is poised to significantly enhance network performance by enabling a set of Access Points (APs) to coordinate with each other through advanced coordination schemes so as to reduce inter-AP contention and congestion. This paper defines a framework to facilitate coordination across multiple APs when they employ Coordinated Spatial Reuse (C-SR). In this case, the coordinating APs may need to mutually adjust their scheduling strategy, power control, and link adaptation to meet specific Quality of Service (QoS) requirements. With classical approaches, this leads to high overhead due to the negotiations needed across APs and requires complex solutions to properly optimize the network across all the parameters in play. To address this, a two-layer Multi-Armed Bandit (MAB) algorithm is proposed to optimize such a network while preserving the fair use of resources across all nodes. The validity of this holistic approach is confirmed by system-level simulations, which show that the proposed algorithm not only improves the network's sum throughput but also enhances fairness, making it a robust solution for next-generation Wi-Fi networks.

Paper Structure (14 sections, 10 equations, 5 figures, 2 tables)

Introduction
Related Work
System Model
Network Topology
MAPC Operation Framework
Problem Formulation
Weighted Sum
Proportional Fairness
Proposed Algorithm
Simulation Results
Convergence of the Two-Layer MAB Algorithm
Aggregated Sum Data Rate
Network Fairness
Conclusion

Figures (5)

Figure 1: Illustration of an indoor office layout where the APs are denoted by red dots and the non-AP STAs are denoted by blue crosses.
Figure 2: Illustration of the MAPC framework.
Figure 3: Comparison between the weighted sum and proportional sum fairness rewards in terms of algorithm convergence as a function of time.
Figure 4: Aggregated sum data rate over all APs as a function of time.
Figure 5: Average throughput evaluated for each individual AP.
