Multi-Agent Actor-Critic with Harmonic Annealing Pruning for Dynamic Spectrum Access Systems

George Stamatelis, Angelos-Nikolaos Kanatas, George C. Alexandropoulos

TL;DR

The paper addresses the deployment of multi-agent deep reinforcement learning for Dynamic Spectrum Access (DSA) on resource-limited edge devices. It introduces a sparse recurrent MARL framework under the Independent Actor with Global Critic (IAGC) paradigm, augmented by a novel harmonic annealing pruning scheduler that allows weight regrowth. The approach yields highly sparse networks (up to 95% sparsity) that often outperform dense baselines and other pruning methods across varied training conditions, including scenarios in which primary users may occupy channels. This work enables efficient, scalable MADRL-based spectrum management on resource-constrained devices and suggests that dynamic sparsity can improve policy discovery and performance in decentralized, partially observable domains.

Abstract

Multi-Agent Deep Reinforcement Learning (MADRL) has emerged as a powerful tool for optimizing decentralized decision-making systems in complex settings, such as Dynamic Spectrum Access (DSA). However, deploying deep learning models on resource-constrained edge devices remains challenging due to their high computational cost. To address this challenge, we present a novel sparse recurrent MARL framework that integrates gradual neural network pruning into the independent actor global critic paradigm. Additionally, we introduce a harmonic annealing sparsity scheduler, which achieves comparable, and in certain cases superior, performance to standard linear and polynomial pruning schedulers at large sparsities. Our experimental investigation demonstrates that the proposed DSA framework can discover superior policies under diverse training conditions, outperforming conventional DSA, MADRL baselines, and state-of-the-art pruning techniques.

Paper Structure

This paper contains 16 sections, 14 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: The average sparsity level of the three considered pruning schedulers for $I_{\rm T}=1000$ and $p_{\rm final}=0.95$.
  • Figure 2: Training curves for the proposed MADRL framework for all considered pruning schedulers, including the standard deviation across multiple random seeds (shaded region).
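
For orientation, the standard linear and polynomial pruning schedulers named in the abstract can be sketched as below, using the Figure 1 settings. This is a minimal illustration, not the paper's implementation: the function names, the cubic exponent (the common choice from Zhu and Gupta's gradual pruning schedule), the zero initial sparsity, and the reading of $I_{\rm T}$ as the number of pruning iterations are all assumptions. The harmonic annealing scheduler is omitted because its formula is not given in this summary.

```python
def linear_sparsity(step, total_steps, p_final, p_init=0.0):
    """Target sparsity that rises linearly from p_init to p_final."""
    # Clamp training progress to [0, 1] so steps past the horizon stay at p_final.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_init + (p_final - p_init) * frac


def polynomial_sparsity(step, total_steps, p_final, p_init=0.0, power=3):
    """Polynomial schedule: prunes quickly early on, then flattens near p_final."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_final + (p_init - p_final) * (1.0 - frac) ** power


if __name__ == "__main__":
    # Assumed to mirror Figure 1: I_T = 1000 iterations, p_final = 0.95.
    I_T, p_final = 1000, 0.95
    for step in (0, 250, 500, 750, 1000):
        lin = linear_sparsity(step, I_T, p_final)
        poly = polynomial_sparsity(step, I_T, p_final)
        print(f"step={step:4d}  linear={lin:.4f}  polynomial={poly:.4f}")
```

Both schedules start at the initial sparsity and end at $p_{\rm final}$; they differ only in how quickly the target is approached, which is the axis along which Figure 1 compares the schedulers.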