Joint QoS-Aware Scheduling and Precoding for Massive MIMO Systems via Deep Reinforcement Learning
Chih-Wei Huang, Yen-Cheng Chou, Hong-Yunn Chen, Cheng-Fu Chou
TL;DR
This work tackles QoS-aware joint resource management in massive MIMO by formulating the problem as a Markov decision process (MDP) and solving it with deep reinforcement learning (DRL). It introduces a componentized action structure with action embedding that enables dynamic algorithm selection across user prioritization, antenna allocation, and hybrid precoding, guided by a QoS-integrated long-term utility. Under demanding scenarios, the proposed DDPG-based framework serves 7.2% and 12.5% more satisfied users than static algorithm selection and related works, respectively, which supports its effectiveness and adaptability. The approach offers a scalable, cross-layer solution for next-generation networks in which diverse QoS requirements must be met alongside highly flexible resource allocation.
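To make "componentized action structure with action embedding" concrete, the sketch below shows one common way such a mapping can work. It uses a Wolpertinger-style nearest-neighbor lookup, which is not necessarily the authors' design. A continuous actor output (the proto-action) is split into one slice per component, and each slice is mapped to the nearest embedded discrete algorithm. The component names, the candidate algorithm labels (`round_robin`, `zf`, and so on), and the one-hot embeddings are all hypothetical placeholders.

```python
import math

# Hypothetical per-component candidate algorithms. These names are
# illustrative placeholders, not the algorithm sets used in the paper.
COMPONENTS = {
    "user_priority": ["round_robin", "proportional_fair", "deadline_first"],
    "antenna_alloc": ["equal_split", "demand_weighted"],
    "precoding": ["mrt", "zf", "mmse"],
}


def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Assumed embedding: one one-hot vector per candidate algorithm.
# A learned or feature-based embedding could be substituted here.
EMBEDDINGS = {
    name: [one_hot(i, len(algos)) for i in range(len(algos))]
    for name, algos in COMPONENTS.items()
}


def nearest(proto, candidates):
    """Index of the candidate embedding closest (Euclidean) to `proto`."""
    return min(range(len(candidates)), key=lambda i: math.dist(proto, candidates[i]))


def map_proto_action(proto_action):
    """Split a flat continuous actor output into per-component slices and
    map each slice to its nearest discrete algorithm (k = 1 neighbor)."""
    expected = sum(len(algos) for algos in COMPONENTS.values())
    if len(proto_action) != expected:
        raise ValueError(f"expected {expected} values, got {len(proto_action)}")
    choice, offset = {}, 0
    for name, algos in COMPONENTS.items():
        dim = len(algos)
        piece = proto_action[offset:offset + dim]
        choice[name] = algos[nearest(piece, EMBEDDINGS[name])]
        offset += dim
    return choice


if __name__ == "__main__":
    proto = [0.1, 0.8, 0.2, 0.3, 0.6, 0.0, 0.9, 0.4]
    print(map_proto_action(proto))
```

With one-hot embeddings, the nearest-neighbor rule reduces to taking the argmax of each slice. Distance-based lookup only adds value when the embedding places similar algorithms close together. A full Wolpertinger-style policy would also re-score the k nearest candidates with the critic instead of committing to the single closest one.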
Abstract
The rapid development of mobile networks is driving demand for high-data-rate, low-latency, and high-reliability applications in fifth-generation (5G) and beyond-5G (B5G) mobile networks. Massive multiple-input multiple-output (MIMO) technology is essential to realizing this vision, but it must be coordinated with resource management functions to deliver a high-quality user experience. Although conventional cross-layer adaptation algorithms have been developed to schedule and allocate network resources, the resulting rules become highly complex under diverse quality of service (QoS) requirements and B5G features. In this work, we consider a joint user scheduling, antenna allocation, and precoding problem in a massive MIMO system. Instead of assigning resources directly, such as the number of antennas, we transform the allocation process into a deep reinforcement learning (DRL)-based dynamic algorithm selection problem, which enables efficient Markov decision process (MDP) modeling and policy training. Specifically, the proposed utility function integrates QoS requirements and constraints into a long-term, system-wide objective that matches the MDP return. A componentized action structure with action embedding further incorporates the resource management process into the model. Under demanding scenarios, simulations show 7.2% and 12.5% more satisfied users than static algorithm selection and related works, respectively.
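The idea of a utility that "matches the MDP return" can be illustrated with a simple per-slot reward whose discounted sum is the quantity the policy maximizes. The sketch below rests on several assumptions that are not the paper's definitions. A user counts as satisfied in a slot if it meets both a minimum-rate and a maximum-delay requirement. The per-slot reward is the satisfied-user count minus a hypothetical penalty per violated system constraint. The return is the standard discounted sum G_0 = Σ_t γ^t r_t.

```python
from dataclasses import dataclass


@dataclass
class UserSlot:
    """Hypothetical per-user measurements and QoS targets for one slot."""
    rate_mbps: float      # achieved throughput in this slot
    delay_ms: float       # experienced delay in this slot
    min_rate_mbps: float  # assumed QoS requirement: minimum rate
    max_delay_ms: float   # assumed QoS requirement: latency budget


def satisfied(u: UserSlot) -> bool:
    """A user is satisfied only if both its rate and delay targets are met."""
    return u.rate_mbps >= u.min_rate_mbps and u.delay_ms <= u.max_delay_ms


def slot_utility(users, violations=0, penalty=1.0):
    """Assumed per-slot reward: satisfied-user count minus a penalty for each
    violated system constraint (e.g. a transmit-power budget)."""
    return sum(satisfied(u) for u in users) - penalty * violations


def discounted_return(rewards, gamma=0.9):
    """MDP return G_0 = sum_t gamma^t * r_t, accumulated backward in time."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    slot0 = [UserSlot(12.0, 5.0, 10.0, 10.0), UserSlot(3.0, 20.0, 5.0, 10.0)]
    slot1 = [UserSlot(11.0, 8.0, 10.0, 10.0), UserSlot(6.0, 9.0, 5.0, 10.0)]
    rewards = [slot_utility(slot0), slot_utility(slot1, violations=1)]
    print(rewards, discounted_return(rewards, gamma=0.9))
```

In this example, one user is satisfied in the first slot and two are satisfied in the second, but one constraint violation is penalized there. Both slots therefore yield a reward of 1, and the return is 1 + 0.9 × 1 = 1.9. Because the system-wide objective is written as a sum of per-slot rewards, maximizing the expected return is the same as maximizing the long-term utility, which is the alignment the abstract describes.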
