Deep Reinforcement Learning Enhanced Rate-Splitting Multiple Access for Interference Mitigation

Osman Nuri Irkicatal; Elif Tugce Ceran; Melda Yuksel

TL;DR

This work addresses interference mitigation in next-generation (6G) networks by combining Rate-Splitting Multiple Access (RSMA) with Deep Reinforcement Learning (DRL) for precoding and power allocation in a two-user, multi-antenna interference channel. It introduces a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework with centralized training and decentralized execution to optimize the common and private data streams, and it also covers decoding-order optimization and imperfect channel state information at the transmitter (CSIT). In simulations, MADDPG with RSMA achieves the information-theoretic upper bound in single-antenna settings and closely approaches the bounds in multi-antenna scenarios, outperforming benchmarks such as MRT, ZF, and leakage-based precoding. These results suggest that DRL-driven RSMA can reduce interference and improve the sum-rate in interference-limited networks, including under channel estimation errors and when the decoding order must be estimated.

Abstract

This study explores the application of the rate-splitting multiple access (RSMA) technique, which is vital for interference mitigation in modern communication systems. It investigates the use of precoding methods in RSMA, especially in complex multiple-antenna interference channels, by employing deep reinforcement learning. The aim is to optimize the precoders and the power allocation for the common and private data streams in a setting with multiple decision-makers. A multi-agent deep deterministic policy gradient (MADDPG) framework is employed to address this complexity, in which decentralized agents collectively learn to optimize their actions in a continuous policy space. We also explore the challenges posed by imperfect channel state information at the transmitter (CSIT). Additionally, decoding order estimation is addressed to determine the optimal order in which the common and private data streams are decoded. Simulation results demonstrate the effectiveness of the proposed MADDPG-based RSMA method, which achieves the upper bound in single-antenna scenarios and closely approaches the theoretical limits in multi-antenna scenarios. Comparative analysis shows superiority over other techniques such as MADDPG without rate-splitting, maximal ratio transmission (MRT), zero-forcing (ZF), and leakage-based precoding methods. These findings highlight the potential of deep reinforcement learning-driven RSMA in reducing interference and enhancing system performance in communication systems.
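To make two of the benchmark precoders named above concrete, the following is a minimal sketch of MRT and ZF for a two-user interference channel with single-antenna receivers. It is not the authors' implementation. The signal convention (receiver j observes the sum over k of h_jk^H w_k s_k plus noise), the i.i.d. Rayleigh channel draw, unit noise variance, and all function names are assumptions made for illustration only.

```python
import math
import random


def inner(a, b):
    """Hermitian inner product a^H b for two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(inner(v, v).real)
    return [x / n for x in v]


def mrt_precoder(h_direct):
    """MRT: point the beam along the direct channel, ignoring interference."""
    return normalize(h_direct)


def zf_precoder(h_direct, h_cross):
    """ZF: project the direct channel onto the null space of the cross channel.

    Needs at least two transmit antennas and, in this sketch, perfect CSI.
    """
    g = normalize(h_cross)
    coeff = inner(g, h_direct)
    return normalize([h - coeff * gi for h, gi in zip(h_direct, g)])


def sum_rate(H, W, power, noise_var=1.0):
    """Sum-rate in bit/s/Hz, treating interference as noise (no rate-splitting).

    H[j][k] is the (assumed) channel from transmitter k to receiver j,
    and W[k] is the unit-norm precoder of transmitter k.
    """
    total = 0.0
    for j in range(2):
        k = 1 - j
        signal = power * abs(inner(H[j][j], W[j])) ** 2
        interference = power * abs(inner(H[j][k], W[k])) ** 2
        total += math.log2(1.0 + signal / (noise_var + interference))
    return total


def random_channel(rng, m):
    """Draw an m-antenna channel with i.i.d. CN(0, 1) entries (assumed model)."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
            for _ in range(m)]


rng = random.Random(0)
M = 3  # transmit antennas per transmitter
H = [[random_channel(rng, M) for _ in range(2)] for _ in range(2)]

W_mrt = [mrt_precoder(H[k][k]) for k in range(2)]
# Transmitter k nulls its signal at the other receiver, whose channel is H[1 - k][k].
W_zf = [zf_precoder(H[k][k], H[1 - k][k]) for k in range(2)]

power = 10.0  # linear scale; 10 dB SNR with unit noise variance
for name, W in (("MRT", W_mrt), ("ZF", W_zf)):
    print(f"{name}: {sum_rate(H, W, power):.3f} bit/s/Hz")
```

MRT maximizes the desired-signal gain but leaves the cross-link interference untouched, while ZF removes that interference at the cost of some desired-signal gain. A learned RSMA precoder can, in principle, operate between these two extremes.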

Paper Structure (17 sections, 28 equations, 10 figures, 2 tables, 1 algorithm)

This paper contains 17 sections, 28 equations, 10 figures, 2 tables, 1 algorithm.

Introduction
Related Literature
Contributions and Novelties
System Model
MADDPG for Precoding and Power Allocation Coefficients Optimization
Decoding Order Optimization
Imperfect Channel State Information
MADDPG with no Rate-Splitting
Complexity Analysis of the Proposed Algorithm
Benchmark Precoding Schemes
Maximum Ratio Transmission
Zero-Forcing
Leakage-based Precoding
Interference Channel Upper Bounds
No Interference
...and 2 more sections

Figures (10)

Figure 1: System model and structure of the MADDPG with RSMA algorithm for MIMO interference channels. The MADDPG algorithm uses centralized training and distributed execution.
Figure 2: Average sum-rate achieved by MADDPG and the upper bound due to Sato (1981) and Etkin et al. (2008) for $M=1$ and $Q=1$. The MADDPG curves are obtained by averaging 25 runs, each having 200 time steps after the algorithm achieves convergence.
Figure 3: Average sum-rate achieved by MADDPG and the benchmark schemes for $M=3$ and $Q=1$. The MADDPG curves are obtained by averaging 50 runs, each having 1000 time steps after the algorithm achieves convergence.
Figure 4: Average sum-rate achieved by MADDPG and the benchmark schemes for $M=3$ and $Q=3$. The MADDPG curves are obtained by averaging 5 runs, each having 200 time steps after the algorithm achieves convergence.
Figure 5: Convergence curve of MADDPG for $M=3$ and $Q=1$ at $\mathrm{SNR}=10$ dB. For each number of training episodes, the curve is obtained by averaging 50 runs, each having 1000 time steps.
...and 5 more figures
