Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach
Hyeonho Noh, Harim Lee, Hyun Jong Yang
TL;DR
The paper jointly optimizes user scheduling and resource allocation (USRA), MIMO mode selection (MS), and MU-MIMO user selection for IEEE 802.11ax uplink OFDMA under unsaturated traffic. It introduces a tailored deep hierarchical reinforcement learning (DHRL) framework in which a master agent selects the resource unit (RU) configuration and sub-agents perform MU-MIMO scheduling. The framework adds a two-branch network that processes channel state information (CSI) and buffer states, and a channel-subspace update that mitigates interference. Key contributions are a formal problem formulation with RU, MIMO MS, and buffer constraints; a DHQN architecture with reduced MU-MIMO action spaces; and complexity-aware training and evaluation showing substantial throughput gains across scenarios. The approach targets dense WLAN uplinks, where it navigates the large joint action space efficiently as bandwidth and antenna counts scale.
Abstract
This letter tackles the joint problem of user scheduling and frequency resource allocation (USRA), multiple-input multiple-output mode selection (MIMO MS) between single-user MIMO and multi-user (MU) MIMO, and MU-MIMO user selection for uplink orthogonal frequency division multiple access (OFDMA) in IEEE 802.11ax. Specifically, we focus on unsaturated traffic conditions, where users' data demands fluctuate. Under such conditions, accounting for per-user packet volumes makes the problem combinatorial, because MU-MIMO user selection and resource allocation must be optimized simultaneously along the time, frequency, and space axes. The resulting large number of unknown variables makes the problem nearly intractable for conventional optimization methods. In response, this letter proposes a deep hierarchical reinforcement learning (DHRL) approach to solve the joint problem. Rather than simply adopting off-the-shelf DHRL, we tailor the DHRL to the joint USRA and MIMO MS problem, thereby significantly improving the convergence speed and throughput. Extensive simulation results show that the proposed algorithm achieves significantly higher throughput than existing schemes under various unsaturated traffic conditions.
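To make the hierarchical decomposition concrete, the following is a minimal Python sketch of two-level action selection: a master policy picks an RU configuration, then a sub-policy picks one user group per RU. It is an illustration under stated assumptions, not the authors' implementation. The toy sizes (`NUM_USERS`, `MAX_MU_USERS`, `RU_CONFIGS`), the function names, and the use of epsilon-greedy selection over supplied Q-values are all hypothetical, and the sketch ignores buffer constraints and the constraint that a user occupies at most one RU.

```python
import itertools
import random

# Hypothetical toy setting (not from the paper).
NUM_USERS = 4
MAX_MU_USERS = 2            # assumed cap on users multiplexed on one RU
RU_CONFIGS = {0: 1, 1: 2}   # config id -> number of RUs in that configuration


def user_groups(num_users, max_size):
    """All non-empty user subsets of size <= max_size (candidate MU-MIMO groups)."""
    users = range(num_users)
    return [g for k in range(1, max_size + 1)
            for g in itertools.combinations(users, k)]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random index with probability epsilon, otherwise the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def hierarchical_action(master_q, sub_q, epsilon, rng):
    """Master picks an RU configuration; a sub-agent then picks one group per RU.

    master_q: list of Q-values, one per RU configuration.
    sub_q: callable (config_id, ru_index) -> list of Q-values over user groups.
    """
    groups = user_groups(NUM_USERS, MAX_MU_USERS)
    config = epsilon_greedy(master_q, epsilon, rng)
    schedule = []
    for ru in range(RU_CONFIGS[config]):
        choice = epsilon_greedy(sub_q(config, ru), epsilon, rng)
        schedule.append(groups[choice])
    return config, schedule


def flat_action_count():
    """Joint action count if every (config, group-per-RU) tuple were one action."""
    n = len(user_groups(NUM_USERS, MAX_MU_USERS))
    return sum(n ** num_rus for num_rus in RU_CONFIGS.values())
```

In this toy setting a flat agent would need one output per joint action, a count that grows exponentially with the number of RUs, while the hierarchical version needs only one output per RU configuration at the master level and one per user group at the sub-agent level. That gap is the motivation for splitting the decision across two levels.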
