A DRL Approach for RIS-Assisted Full-Duplex UL and DL Transmission: Beamforming, Phase Shift and Power Optimization

Nancy Nayak; Sheetal Kalyani; Himal A. Suraweera

TL;DR

The paper tackles RIS-assisted full-duplex (FD) transmission without requiring instantaneous CSI or exact knowledge of the residual SI. It introduces a two-stage learning framework: the first stage partially cancels the residual SI using either least squares or the $\mathbf{H}_{AA}$-based method, and the second stage uses a DRL-based predictor (DDPG) to jointly optimize the RIS phase shifts, BS beamformers, and transmit powers to maximize the weighted UL/DL sum rate. The framework supports quantized RIS phase shifts, which sharply reduce BS-to-RIS signaling, plus a grouping scheme that reduces signaling further, and it achieves performance close to that of CSI-based methods with faster convergence. The work also shows robustness to moving users and CSI imperfections, pointing toward practical, low-overhead RIS-enabled FD systems with scalable complexity. Overall, the proposed MSF-DRL framework performs strongly without CSI, while quantization and grouping allow the signaling overhead to be tuned.
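To make the quantization and grouping trade-off concrete, here is a minimal sketch of uniform b-bit RIS phase quantization and of grouping, in which consecutive RIS elements share one phase index so the BS signals one index per group. The helper names, the nearest-level rounding rule, the 32-bit float encoding assumed for continuous phases, the 64-element surface, and the group size are all hypothetical choices for illustration; the paper's actual quantizer, grouping rule, and bit accounting may differ.

```python
import numpy as np

def quantize_phases(theta, bits):
    """Map continuous phases (radians) to the nearest of 2**bits uniform levels in [0, 2*pi)."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    # Wrap into [0, 2*pi), round to the nearest level, and map index `levels` back to 0
    idx = np.round(np.mod(theta, 2 * np.pi) / step).astype(int) % levels
    return idx, idx * step

def group_phases(idx, group_size):
    """Hypothetical grouping: each block of `group_size` elements reuses its first element's index."""
    n = len(idx)
    group_idx = idx[::group_size]                    # one index signaled per group
    expanded = np.repeat(group_idx, group_size)[:n]  # index actually applied to each element
    return group_idx, expanded

def signaling_bits(num_elements, bits_per_phase, group_size=1):
    """Bits sent from the BS to the RIS per configuration (a partial last group still costs one index)."""
    num_groups = -(-num_elements // group_size)      # ceiling division
    return num_groups * bits_per_phase

N = 64                                               # hypothetical number of RIS elements
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, N)
idx, theta_q = quantize_phases(theta, bits=1)
g_idx, applied = group_phases(idx, group_size=4)

print(signaling_bits(N, 32))      # continuous phases as 32-bit floats: 2048
print(signaling_bits(N, 1))       # 1-bit quantized phases: 64
print(signaling_bits(N, 1, 4))    # 1-bit phases, groups of 4: 16
```

Under this hypothetical accounting, moving from 32-bit continuous phases to 1-bit phases cuts the per-configuration payload by a factor of 32, and grouping divides it again by the group size, at the cost of coarser control over the reflected beam.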

Abstract

We propose a deep reinforcement learning (DRL) approach for full-duplex (FD) transmission that predicts the phase shifts of the reconfigurable intelligent surface (RIS), the base station (BS) active beamformers, and the transmit powers to maximize the weighted sum rate of uplink (UL) and downlink (DL) users. Existing methods require channel state information (CSI) and residual self-interference (SI) knowledge to calculate the exact active beamformers or the DRL rewards, and they typically fail without CSI or residual SI. Especially for time-varying channels, the CSI must be estimated and signaled to the DRL agent at each time step, which is costly. To address this, we propose a two-stage DRL framework with minimal signaling overhead. The first stage uses the least squares method to initiate learning by partially canceling the residual SI. The second stage uses DRL to achieve performance comparable to existing CSI-based methods without requiring the CSI or the exact residual SI. Further, the proposed DRL framework for quantized RIS phase shifts uses $32$ times fewer bits for signaling from the BS to the RISs than the continuous version. The quantized methods reduce the action space, resulting in faster convergence and $7.1\%$ and $22.28\%$ better UL and DL rates, respectively, than the continuous method.
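The least-squares idea behind the first stage can be illustrated with a generic SI estimate: the BS knows the DL symbols it transmitted, fits the SI channel from the received samples, and subtracts the reconstructed SI. This is a minimal sketch under an assumed narrowband model $\mathbf{Y} = \mathbf{H}_{SI}\mathbf{X} + \text{UL signal} + \text{noise}$; the antenna counts, block length, power scalings, and helper names (`ls_si_estimate`, `crandn`, `mean_power`) are hypothetical, and the paper's exact least-squares formulation may differ.

```python
import numpy as np

def ls_si_estimate(Y, X):
    """Least-squares estimate of the SI channel H from Y ~= H @ X, where X is known at the BS.

    Solves min_H ||Y - H X||_F^2 (closed form H_hat = Y X^H (X X^H)^{-1}) by running
    np.linalg.lstsq on the equivalent transposed system X^T H^T = Y^T.
    """
    H_T, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
    return H_T.T

def crandn(rng, *shape):
    """Unit-power circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mean_power(A):
    """Average per-entry power of a complex array."""
    return np.mean(np.abs(A) ** 2)

rng = np.random.default_rng(1)
n_rx, n_tx, T = 4, 4, 200              # hypothetical antenna counts and block length

H_si = 10.0 * crandn(rng, n_rx, n_tx)  # strong SI channel (hypothetical scaling)
X = crandn(rng, n_tx, T)               # DL symbols known at the BS
s_ul = 0.1 * crandn(rng, n_rx, T)      # weak UL signal, treated as noise while estimating H_si
noise = 0.01 * crandn(rng, n_rx, T)
Y = H_si @ X + s_ul + noise            # signal received at the BS

H_hat = ls_si_estimate(Y, X)
Y_clean = Y - H_hat @ X                # subtract the reconstructed SI

# Residual SI plus noise left after cancellation, relative to the original SI power
suppression_db = 10 * np.log10(mean_power(H_si @ X) / mean_power(Y_clean - s_ul))
print(f"SI suppression: {suppression_db:.1f} dB")
```

Because the UL signal acts as estimation noise here, the cancellation is only partial, which is consistent with the abstract's framing of this stage as a starting point for learning rather than a complete solution.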

Paper Structure (25 sections, 32 equations, 10 figures, 2 tables, 1 algorithm)

Introduction
System Model and Problem Formulation
Proposed two-stage learning method
First stage: residual SI cancellation method
Least squares-based SI cancellation
$\mathbf{H}_{AA}$-based SI cancellation
Second stage: learning-based method
Deep Reinforcement Learning based Predictor
Architecture of the Proposed Neural Action Predictor
RIS phases
Beamformers
Transmit powers
Proposed Quantized and Grouped Quantized MSF-DRL
Numerical Study and Discussion
RandPSBF
...and 10 more sections

Figures (10)

Figure 1: An FD communication scenario setup with one ULue, one DLue, and two RISs to facilitate communication when the users are not in LoS with the BS.
Figure 2: MDP formulation for RIS-based FD communication. The proposed algorithm can be deployed at the BS. It outputs the action $\mathbf{a}^{\{t\}}$ (solid black) based on the state $\mathbf{s}^{\{t\}}$ (dashed grey) generated at the previous time step. The environment reacts to these actions by sending the signal via the RISs (dotted black) and returns the SINRs as observations (solid grey), which indicate how good the actions are. Finally, the reward is calculated (dashed black) and fed to the learning agent. The SINR observations and actions at time step $t$ form the state for time step $(t+1)$.
Figure 3: The proposed action predictor network.
Figure 4: Rate evolution in the static UE scenario.
Figure 5: CDF of observed rates.
...and 5 more figures
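The interaction loop described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming the reward is the weighted UL/DL sum rate computed from the observed SINRs; the equal weights, the 8-dimensional action, the `toy_env_step` channel stand-in, and the random draw used in place of the DDPG actor are all hypothetical and not taken from the paper.

```python
import numpy as np

def weighted_sum_rate(sinr_ul, sinr_dl, w_ul=0.5, w_dl=0.5):
    """Weighted UL/DL sum rate in bit/s/Hz from linear SINRs (weights are hypothetical)."""
    return w_ul * np.sum(np.log2(1.0 + sinr_ul)) + w_dl * np.sum(np.log2(1.0 + sinr_dl))

def next_state(sinr_ul, sinr_dl, action):
    """State for step t+1: SINR observations at step t concatenated with the action at step t."""
    return np.concatenate([np.atleast_1d(sinr_ul), np.atleast_1d(sinr_dl), action])

def toy_env_step(action):
    """Hypothetical stand-in for the RIS-assisted FD channel, returning (UL SINR, DL SINR)."""
    gain = 1.0 + np.mean(np.cos(action))   # arbitrary toy response in [0, 2]
    return np.array([4.0 * gain]), np.array([2.0 * gain])

rng = np.random.default_rng(2)
action_dim = 8                             # hypothetical flattened phases, beamformers, powers
state = np.zeros(2 + action_dim)           # initial state before any observation

for t in range(3):
    action = rng.uniform(-np.pi, np.pi, action_dim)  # placeholder for the DDPG actor's output
    sinr_ul, sinr_dl = toy_env_step(action)          # environment transmits via the RISs
    reward = weighted_sum_rate(sinr_ul, sinr_dl)     # fed back to the learning agent
    state = next_state(sinr_ul, sinr_dl, action)     # becomes the input at step t+1
    print(t, round(float(reward), 3), state.shape)
```

In a full DDPG agent, the random draw would be replaced by the actor network's output for `state`, and the (state, action, reward, next state) tuples would be stored in a replay buffer used to update the critic and actor.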
