Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning

Yubo Zhang; Pedro Botelho; Trevor Gordon; Gil Zussman; Igor Kadota

TL;DR

The paper addresses fair, decentralized spectrum sharing among $M$ source-destination pairs over $N$ frequency bands by introducing Fair Share Reinforcement Learning (FSRL). FSRL is a fully decentralized architecture that combines augmented state representations, a DDQN+IQN distributional RL framework with risk control, and a fairness-driven reward that incentivizes balanced resource use without inter-agent communication. Empirical results across 54 network settings, plus scenarios with jamming and ad-hoc topologies, show high throughput and improvements in Jain's fairness index over a baseline RL algorithm (up to $89.0\%$ in extreme cases and $48.1\%$ on average), with robustness to dynamic conditions. The results suggest that fairness-aware learning for dynamic spectrum access is practical without coordination among sources, which matters for scalable spectrum sharing in future wireless networks.

Abstract

We consider a decentralized wireless network with several source-destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, their band selection strategy) over time, in a decentralized manner, without sharing information with each other. Sources can only observe the outcome of their own transmissions (i.e., success or collision), and have no prior knowledge of the network size or of the transmission strategies of other sources. The goal of each source is to maximize its own throughput while striving for network-wide fairness. We propose a novel fully decentralized Reinforcement Learning (RL)-based solution that achieves fairness without coordination. The proposed Fair Share RL (FSRL) solution combines: (i) state augmentation with a semi-adaptive time reference; (ii) an architecture that leverages risk control and time difference likelihood; and (iii) a fairness-driven reward structure. We evaluate FSRL in more than 50 network settings with different numbers of agents and different amounts of available spectrum, in the presence of jammers, and in an ad-hoc setting. Simulation results suggest that, compared with a common baseline RL algorithm from the literature, FSRL can be up to 89.0% fairer (as measured by Jain's fairness index) in stringent settings with several sources and a single frequency band, and 48.1% fairer on average.
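
To make the fairness metric concrete, the sketch below computes Jain's fairness index, $J(x) = (\sum_i x_i)^2 / (n \sum_i x_i^2)$, over per-source throughputs. It also includes a toy per-slot outcome function. This is an illustration, not the authors' simulator: the function names `slot_outcomes` and `jain_index`, the use of `-1` for an idle source, the rule that a transmission succeeds only when exactly one source uses a band in a slot, and the value returned for all-zero throughputs are all assumptions made here.

```python
from collections import Counter
from typing import List, Sequence


def slot_outcomes(bands: Sequence[int]) -> List[bool]:
    """Return per-source success flags for one time slot.

    bands[i] is the band chosen by source i; -1 means the source stays
    idle (an assumed convention). Assumed collision model: a transmission
    succeeds only if no other source picked the same band in this slot.
    """
    load = Counter(b for b in bands if b >= 0)
    return [b >= 0 and load[b] == 1 for b in bands]


def jain_index(throughputs: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all sources get the same throughput and 1/n when a
    single source gets everything. Returns 0.0 for empty or all-zero
    input, where the index is undefined (a convention chosen here).
    """
    n = len(throughputs)
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        return 0.0
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    # Three sources, two bands: sources 0 and 1 collide on band 0.
    print(slot_outcomes([0, 0, 1]))
    # Equal shares versus one source monopolizing a single band.
    print(jain_index([0.25, 0.25, 0.25, 0.25]))
    print(jain_index([1.0, 0.0, 0.0, 0.0]))
```

With four sources and a single band, equal time-sharing gives an index of 1, while one source capturing the band gives 1/4. This is the kind of gap a fairness-driven reward is meant to close.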
