Robust Beam Codebooks for mmWave/THz Systems: Toward a Stochastic RL Approach

Anouar Nechi; Rainer Buchty; Mladen Berekovic; Saleh Mulhem

Abstract

Millimeter-wave (mmWave) and terahertz (THz) massive MIMO systems often rely on predefined beamforming codebooks, which are usually suboptimal in Non-Line-of-Sight (NLoS) conditions and for hardware-limited transceivers. Reinforcement Learning (RL) enables adaptive, data-driven codebook design without explicit Channel State Information (CSI), but the robustness of such algorithms in practical conditions is underexplored. This paper introduces a robust multi-agent RL framework that learns beam codebooks directly from environmental feedback, eliminating the need for prior channel knowledge. Our method is well-suited for real-world deployments facing unpredictable propagation and hardware constraints. We conduct a comprehensive analysis of three off-policy algorithms, Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC), evaluating their resilience to hardware impairments and feedback noise. Simulations show that SAC consistently outperforms deterministic methods, achieving superior beamforming gains and stability in NLoS scenarios, even under severe impairments. These results demonstrate the promise of RL-based codebook design for robust mmWave/THz massive MIMO systems.

Paper Structure (27 sections, 8 equations, 4 figures, 2 tables)

Introduction
The Role of Robustness Analysis in State-of-the-Art Beamforming
Paper Contribution
RL-based Beam Codebook Learning
RL-based Beamforming modeling
MDP Formulation
State & Action Spaces
Ternary Reward
Algorithm Selection
Deterministic Policies (DDPG & TD3)
Stochastic Policy (SAC)
Multi-Agent Codebook Learning
Channel Clustering
Cluster-Agent Assignment
Parallel Learning
...and 12 more sections

Figures (4)

Figure 1: Proposed Multi-Agent RL Framework. The architecture decomposes codebook design into clustering and assignment phases. Individual agents optimize beam patterns using DDPG, TD3, or SAC, where continuous actions are quantized via KNN to meet hardware constraints.
Figure 2: Average beamforming gain versus phase mismatch standard deviation $\sigma_p$ in the NLoS scenario across various codebook sizes.
Figure 3: Impact of feedback noise on beamforming gain in LoS (a) and NLoS (b) scenarios for 4-beam and 8-beam codebooks.
Figure 4: Evolution of exploration parameters in (a) LoS and (b) NLoS scenarios. The deterministic policies (DDPG/TD3) rely on fixed Ornstein-Uhlenbeck (OU) noise, whereas SAC adapts its entropy temperature ($\alpha$) to the environment.
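
The Figure 1 caption mentions that continuous agent actions are quantized via KNN to meet hardware constraints. As a rough illustration only, the sketch below shows what a 1-nearest-neighbor phase quantizer could look like. It assumes constant-modulus phase shifters with `bits` bits of resolution and uniformly spaced phase levels; the paper's actual action space, value of k, and resolution are not stated here. The names `circular_distance`, `quantize_phases`, and `beamforming_gain` are hypothetical.

```python
import cmath
import math


def circular_distance(a, b):
    """Shortest angular distance between two phases, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def quantize_phases(phases, bits=3):
    """Snap each continuous phase to the nearest of 2**bits uniform levels.

    Assumption: levels are uniformly spaced in [0, 2*pi) and k = 1.
    Returns the quantized phases and the chosen level indices.
    """
    n_levels = 2 ** bits
    levels = [2 * math.pi * k / n_levels for k in range(n_levels)]
    indices = [
        min(range(n_levels), key=lambda k: circular_distance(p, levels[k]))
        for p in phases
    ]
    return [levels[i] for i in indices], indices


def beamforming_gain(phases, channel):
    """Compute |w^H h|^2 for a unit-norm, phase-only beam w = exp(j*phase)/sqrt(N)."""
    n = len(phases)
    inner = sum(cmath.exp(-1j * p) * h for p, h in zip(phases, channel))
    return abs(inner / math.sqrt(n)) ** 2


# Example: three antenna elements, 2-bit phase shifters (levels 0, pi/2, pi, 3*pi/2).
quantized, idx = quantize_phases([0.1, 3.0, 6.2], bits=2)
matched_channel = [cmath.exp(1j * p) for p in quantized]
gain = beamforming_gain(quantized, matched_channel)
```

The distance is taken on the circle so that a phase just below 2π maps to level 0 rather than to the numerically closest larger level. For a channel perfectly matched to the quantized beam, the gain equals the number of antenna elements.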
