Joint Power Allocation and Phase Shift Design for Stacked Intelligent Metasurfaces-aided Cell-Free Massive MIMO Systems with MARL

Yiyang Zhu, Jiayi Zhang, Enyu Shi, Ziheng Liu, Chau Yuen, Bo Ai

TL;DR

This work tackles the non-convex problem of maximizing downlink sum spectral efficiency in SIM-aided cell-free mMIMO by jointly optimizing AP power allocation and SIM phase shifts. It proposes a novel MARL framework, NVR-MAPPO, which combines noisy value exploration with a recurrent policy under centralized training and decentralized execution to enhance exploration and convergence. The approach leverages a two-layer MAPPO-based network with agent-specific and global observations, and introduces a centralized value function with noise to stabilize learning. Experiments show substantial gains over codebook-based baselines and faster convergence than MADDPG, demonstrating the practicality and robustness of SIM-aided CF mMIMO for future wireless networks.

Abstract

Cell-free (CF) massive multiple-input multiple-output (mMIMO) systems offer high spectral efficiency (SE) through multiple distributed access points (APs). However, the large number of antennas increases power consumption. We propose incorporating stacked intelligent metasurfaces (SIM) into CF mMIMO systems as a cost-effective, energy-efficient solution. This paper focuses on optimizing the joint power allocation of APs and the phase shift of SIMs to maximize the sum SE. To address this complex problem, we introduce a fully distributed multi-agent reinforcement learning (MARL) algorithm. Our novel algorithm, the noisy value method with a recurrent policy in multi-agent policy optimization (NVR-MAPPO), enhances performance by encouraging diverse exploration under centralized training and decentralized execution. Simulations demonstrate that NVR-MAPPO significantly improves sum SE and robustness across various scenarios.
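As a rough illustration of the objective named in the abstract, the sketch below computes a sum SE from per-user SINRs using the standard expression $\sum_k \log_2(1 + \mathrm{SINR}_k)$. It is not the paper's system model: the function name `sum_se`, the effective gain matrix `gain` (which would in practice depend on the SIM phase shifts and the channels), the per-stream `power` vector, and the unit noise power are all assumptions made for illustration.

```python
import math


def sum_se(gain, power, noise_power=1.0):
    """Sum spectral efficiency in bit/s/Hz for K users (illustrative only).

    gain[k][j]  : hypothetical effective power gain at user k of the
                  stream intended for user j (assumed to already include
                  the effect of the SIM phase shifts).
    power[j]    : hypothetical transmit power allocated to stream j.
    noise_power : assumed receiver noise power.
    """
    total = 0.0
    for k, row in enumerate(gain):
        # Power of the stream intended for user k.
        desired = row[k] * power[k]
        # Power of all other streams as seen by user k.
        interference = sum(
            g * p for j, (g, p) in enumerate(zip(row, power)) if j != k
        )
        sinr = desired / (interference + noise_power)
        total += math.log2(1.0 + sinr)
    return total


# Two users with no cross-interference: each SINR is 1, so each SE is 1.
orthogonal = sum_se([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])

# Two users with full cross-interference: each SINR drops to 0.5.
interfering = sum_se([[1.0, 1.0], [1.0, 1.0]], [1.0, 1.0])
```

In this toy setting, a joint design would search over `power` (subject to each AP's power budget) and over the phase shifts that shape `gain`. The two variables are coupled through every user's SINR, which is what makes the problem non-convex.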

Paper Structure

This paper contains 17 sections, 11 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: The SIM-aided CF mMIMO system.
  • Figure 2: The proposed MARL precoding network.
  • Figure 3: Average reward against the training episode with $episode$ = 250, $L$ = 8, $K$ = 4, $M_{AP}$ = 2, $M$ = 4, and $N$ = 64.
  • Figure 4: Sum SE against the number of SIM meta-surface layers with $L$ = 8, $K$ = 6, $M_{AP}$ = 2, and $N$ = 32.
  • Figure 5: Sum SE against the number of meta-atoms per SIM meta-surface layer with $L$ = 8, $K$ = 6, $M_{AP}$ = 2, and $M$ = 8.