Decentralized Interference-Aware Codebook Learning in Millimeter Wave MIMO Systems
Yu Zhang, Ahmed Alkhateeb
TL;DR
This work addresses interference-aware codebook learning for mmWave MIMO in multi-cell networks where base stations operate asynchronously and cannot exchange information. It introduces a fully decentralized multi-agent reinforcement learning framework that averages power measurements to estimate how well a beam suppresses interference, and uses a decoupled reward to stabilize learning across nodes. The authors justify the approach theoretically by showing that the averaging-based estimator is asymptotically a sufficient statistic in the large-antenna regime. Simulations show that the learned codebooks are well shaped and place deep nulls toward interferers without any inter-BS communication. The method enables scalable, decentralized beam codebook design for dense mmWave networks, reducing coordination overhead while achieving substantial interference suppression and improved SIR distributions.
Abstract
Beam codebooks are integral components of future millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems, as they relax the reliance on instantaneous channel state information (CSI). The design of these codebooks is therefore one of the fundamental problems for these systems, and well-designed codebooks play a key role in enabling efficient and reliable communications. Prior work has primarily focused on the codebook learning problem within a single cell/network and under stationary interference. In this work, we generalize the interference-aware codebook learning problem to networks with multiple cells/base stations. One of the key differences compared to the single-cell codebook learning problem is that the underlying environment becomes non-stationary, as the behavior of one base station influences the learning of the others. Moreover, to encompass some of the challenging scenarios, information exchange between the different learning nodes is not allowed, which leads to a fully decentralized system with significantly increased learning difficulty. To tackle the non-stationarity, averaging of the measurements is used to estimate the interference nulling performance of a particular beam, based on which a decision rule is provided. Furthermore, we theoretically justify the adoption of such an estimator and prove that it is a sufficient statistic for the underlying quantity of interest in an asymptotic sense. Finally, a novel reward function based on averaging is proposed to fully decouple the learning of the multiple agents running at different nodes. Simulation results show that the developed solution is capable of learning well-shaped codebook patterns for different networks that significantly suppress the interference without information exchange, highlighting ...
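The averaging idea described in the abstract can be illustrated with a small sketch. This is not the authors' estimator, decision rule, or reward: the half-wavelength uniform linear array, the random scaling that stands in for a neighbouring base station switching beams, the noise model, and every function name below (`steered_beam`, `averaged_interference_power`, `averaging_reward`) are assumptions made for illustration only. The sketch shows how averaging many power measurements smooths out fluctuations that a node cannot observe, so that two candidate beams can be compared by a simple rule.

```python
import cmath
import math
import random


def steering_vector(num_antennas, angle_rad):
    """Array response of a half-wavelength-spaced uniform linear array (assumed model)."""
    return [cmath.exp(1j * math.pi * n * math.sin(angle_rad))
            for n in range(num_antennas)]


def steered_beam(num_antennas, angle_rad):
    """Unit-norm beam pointed toward angle_rad."""
    scale = math.sqrt(num_antennas)
    return [x / scale for x in steering_vector(num_antennas, angle_rad)]


def beam_gain(beam, angle_rad):
    """Power gain |w^H a(theta)|^2 of a beam toward angle_rad."""
    a = steering_vector(len(beam), angle_rad)
    return abs(sum(w.conjugate() * x for w, x in zip(beam, a))) ** 2


def averaged_interference_power(beam, interferer_angle, num_measurements, rng):
    """Average noisy power measurements of the interference seen through `beam`."""
    gain = beam_gain(beam, interferer_angle)
    total = 0.0
    for _ in range(num_measurements):
        # Assumption: the neighbour's current beam scales the interference by a
        # random factor that this node cannot observe (source of non-stationarity).
        fluctuating_strength = rng.uniform(0.1, 1.0)
        # Assumption: small non-negative measurement noise.
        noise = rng.uniform(0.0, 1e-3)
        total += fluctuating_strength * gain + noise
    return total / num_measurements


def averaging_reward(candidate_avg, incumbent_avg):
    """Hypothetical decision rule: +1 if the candidate beam lowers the averaged power, else -1."""
    return 1 if candidate_avg < incumbent_avg else -1
```

As a usage example, a node could compare a beam pointed straight at the interferer, `steered_beam(16, math.radians(30))`, with one whose pattern has a null in that direction, `steered_beam(16, math.asin(0.375))`, by passing each to `averaged_interference_power` with a shared `random.Random` instance and feeding the two averages to `averaging_reward`. Because the reward in this sketch depends only on the node's own averaged measurements, no information from other nodes is required to compute it.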
