Deep Reinforcement Learning-Based Precoding for Multi-RIS-Aided Multiuser Downlink Systems with Practical Phase Shift
Po-Heng Chou, Bo-Ren Zheng, Wan-Jen Huang, Walid Saad, Yu Tsao, Ronald Y. Chang
TL;DR
The paper addresses joint transmitter precoding and RIS phase-shift design in a multi-RIS, multiuser downlink under a practical amplitude–phase coupling model, with the goal of maximizing the sum rate. It proposes a deep deterministic policy gradient (DDPG) framework that outputs continuous actions for the active and passive precoders and uses a reduced-state representation to cope with limited CSI. Evaluated in mmWave channels, DDPG achieves higher sum rates than optimization-based methods and DDQN, gains substantially from deploying multiple RISs, and remains robust when the number of UEs is random. The per-user rate is $C_k = \log_2\left(1 + \frac{|\boldsymbol{h}_k \boldsymbol{w}_k|^2}{\sum_{i\neq k} |\boldsymbol{h}_k \boldsymbol{w}_i|^2 + \sigma^2}\right)$, where $\boldsymbol{w}_k$ is the precoding vector for user $k$ and $\sigma^2$ is the noise power. The design problem is $\max_{\boldsymbol{W}, \boldsymbol{\Theta}} \sum_{k} C_k$ under power and phase constraints, with the practical RIS model $\boldsymbol{\Theta} = \mathrm{diag}(\beta_n e^{j\theta_n})$ in which the amplitude $\beta_n$ depends on the phase shift $\theta_n$. The work highlights the feasibility of DRL for scalable RIS control under non-ideal hardware and mobility, and points to future extensions such as STAR-RIS MIMO with UE mobility.
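The sum-rate expression above can be evaluated numerically. The following is a minimal NumPy sketch, assuming the effective channels $\boldsymbol{h}_k$ are stacked as the rows of a matrix `H` and the precoding vectors $\boldsymbol{w}_k$ as the columns of `W`. The function name `sum_rate` and this array layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def sum_rate(H, W, noise_power):
    """Sum of per-user rates C_k in bits/s/Hz.

    H: (K, M) complex array, row k is the effective channel h_k (assumed layout).
    W: (M, K) complex array, column k is the precoding vector w_k (assumed layout).
    noise_power: scalar sigma^2.
    """
    # gains[k, i] = |h_k w_i|^2, the power user k receives from stream i.
    gains = np.abs(H @ W) ** 2
    signal = np.diag(gains)                    # desired term, i == k
    interference = gains.sum(axis=1) - signal  # sum over i != k
    sinr = signal / (interference + noise_power)
    return float(np.sum(np.log2(1.0 + sinr)))
```

With an interference-free toy setup such as `sum_rate(np.eye(2), np.eye(2), 1.0)`, each user sees an SINR of 1, so the function returns 2 bits/s/Hz.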
Abstract
This study considers multiuser downlink systems aided by multiple reconfigurable intelligent surfaces (RISs), with the goal of jointly optimizing the transmitter precoding and RIS phase shift matrix to maximize spectrum efficiency. Unlike prior work that assumed ideal RIS reflectivity, a practical coupling effect is considered between reflecting amplitude and phase shift for the RIS elements. This makes the optimization problem non-convex. To address this challenge, we propose a deep deterministic policy gradient (DDPG)-based deep reinforcement learning (DRL) framework. The proposed model is evaluated under both fixed and random numbers of users in practical mmWave channel settings. Simulation results demonstrate that, despite its complexity, the proposed DDPG approach significantly outperforms optimization-based algorithms and double deep Q-learning, particularly in scenarios with random user distributions.
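The abstract does not give the functional form of the amplitude-phase coupling, so the following is only a sketch under stated assumptions. It uses a sinusoidal amplitude model that is common in the RIS literature, with illustrative parameters `beta_min`, `phi`, and `alpha`, and shows one plausible way to turn bounded continuous actor outputs into a phase-shift matrix and a power-constrained precoder. The helper names `practical_amplitude`, `ris_matrix`, `actions_to_phases`, and `scale_to_power` are hypothetical and may differ from the authors' implementation.

```python
import numpy as np

def practical_amplitude(theta, beta_min=0.2, phi=0.43 * np.pi, alpha=1.6):
    """Assumed coupling: beta = (1 - beta_min) * ((sin(theta - phi) + 1) / 2)**alpha + beta_min.

    The parameter values are illustrative defaults, not values from the paper.
    """
    return (1.0 - beta_min) * ((np.sin(theta - phi) + 1.0) / 2.0) ** alpha + beta_min

def ris_matrix(theta, **coupling_params):
    """Build Theta = diag(beta_n * exp(j * theta_n)) with phase-dependent amplitudes."""
    beta = practical_amplitude(theta, **coupling_params)
    return np.diag(beta * np.exp(1j * theta))

def actions_to_phases(raw_actions):
    """Map actor outputs in [-1, 1] (e.g. after tanh) to phases in [-pi, pi]."""
    return np.pi * np.clip(raw_actions, -1.0, 1.0)

def scale_to_power(W, p_max):
    """Rescale W so that its squared Frobenius norm equals p_max (assumed power constraint)."""
    return W * np.sqrt(p_max) / np.linalg.norm(W)
```

Because the amplitude is computed from the phase rather than chosen independently, the agent in this sketch only needs to output the phases and the precoder entries. The non-ideal reflection loss then enters the reward through the resulting sum rate.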
