Multi-Agent Reinforcement Learning for Multi-Cell Spectrum and Power Allocation

Yiming Zhang, Dongning Guo

TL;DR

The paper addresses scalable, low-latency radio resource allocation in dense multi-cell networks by casting the problem as a traffic-driven Dec-POMDP-IR and solving it with MAPPO using recurrent networks. It introduces two MARL-based solutions, fully distributed individual policies and a shared-policy variant, both of which operate on local observations and neighborhood information to minimize average packet delay via a queue-length-based reward. Empirical results show performance comparable to genie-aided centralized schemes (e.g., FP, WMMSE) with significantly lower execution times, and demonstrate robustness across network sizes and traffic conditions. The work offers a scalable framework for decentralized spectrum and power allocation that applies to both conflict graphs and cellular deployments and could be extended to other resource allocation problems.

Abstract

This paper introduces a novel approach to radio resource allocation in multi-cell wireless networks using a fully scalable multi-agent reinforcement learning (MARL) framework. A distributed method is developed where agents control individual cells and determine spectrum and power allocation based on limited local information, yet achieve quality of service (QoS) performance comparable to centralized methods using global information. The objective is to minimize packet delays across devices under stochastic arrivals, and the formulation applies to both conflict graph abstractions and cellular network configurations. This is formulated as a distributed learning problem, implementing a multi-agent proximal policy optimization (MAPPO) algorithm with recurrent neural networks and queueing dynamics. This traffic-driven MARL-based solution enables decentralized training and execution, ensuring scalability to large networks. Extensive simulations demonstrate that the proposed methods achieve comparable QoS performance to genie-aided centralized algorithms with significantly less execution time. The trained policies also exhibit scalability and robustness across various network sizes and traffic conditions.
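
To make the queueing idea concrete, the following is a minimal sketch of per-device queue dynamics and a queue-length-based reward of the kind the summary describes. It is not the authors' implementation: the function names, the Bernoulli arrival and service model, and all parameter values are assumptions chosen purely for illustration. By Little's law, for a fixed arrival rate the average queue length is proportional to the average delay, which is one reason a backlog penalty can serve as a proxy for delay.

```python
import random


def step_queues(queues, arrivals, served):
    """Advance each device queue by one time slot (hypothetical dynamics)."""
    # Serve first, then add new arrivals; a queue can never go negative.
    return [max(q - s, 0) + a for q, a, s in zip(queues, arrivals, served)]


def queue_length_reward(queues):
    """Negative total backlog: fewer queued packets means a higher reward."""
    return -float(sum(queues))


def simulate(num_devices=4, num_slots=1000, arrival_prob=0.3,
             service_prob=0.5, seed=0):
    """Average per-slot reward under assumed Bernoulli arrivals and service."""
    rng = random.Random(seed)
    queues = [0] * num_devices
    total_reward = 0.0
    for _ in range(num_slots):
        # Assumption: at most one packet arrives and one is served per slot.
        arrivals = [1 if rng.random() < arrival_prob else 0
                    for _ in range(num_devices)]
        served = [1 if rng.random() < service_prob else 0
                  for _ in range(num_devices)]
        queues = step_queues(queues, arrivals, served)
        total_reward += queue_length_reward(queues)
    return total_reward / num_slots
```

In a learning setup, the `served` vector would depend on the agents' spectrum and power decisions rather than on a fixed service probability; the random service here only stands in for that coupling.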


Paper Structure

This paper contains 24 sections, 27 equations, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: Examples of Dec-POMDP-IR model with three agents.
  • Figure 2: A conflict graph of 4 agents in a symmetric deployment.
  • Figure 3: Illustration of the timing of interactions between agents and environments.
  • Figure 4: A symmetric deployment with 4 APs and 8 devices.
  • Figure 5: Diagram of the training and execution workflow.
  • ...and 8 more figures