Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning

Yubo Zhang, Pedro Botelho, Trevor Gordon, Gil Zussman, Igor Kadota

TL;DR

The paper addresses fair, decentralized spectrum sharing for $M$ source-destination pairs over $N$ bands by introducing Fair Share Reinforcement Learning (FSRL). FSRL is a fully decentralized architecture that combines augmented state representations, a DDQN+IQN distributional RL framework with risk control, and a fairness-driven reward that incentivizes balanced resource use without inter-agent communication. Simulation results across 54 network settings, plus scenarios with jamming and ad-hoc topologies, show high throughput and a substantially higher Jain's fairness index than a baseline RL algorithm (up to $89.0\%$ fairer in stringent single-band settings and $48.1\%$ fairer on average), with robustness to dynamic conditions. The results suggest that decentralized, fairness-aware learning is practical for dynamic spectrum access without explicit coordination, which matters for scalable spectrum sharing in future wireless networks.

Abstract

We consider a decentralized wireless network with several source-destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, their band selection strategy) over time, in a decentralized manner, without sharing information with each other. Sources can only observe the outcome of their own transmissions (i.e., success or collision), having no prior knowledge of the network size or of the transmission strategy of other sources. The goal of each source is to maximize its own throughput while striving for network-wide fairness. We propose a novel fully decentralized Reinforcement Learning (RL)-based solution that achieves fairness without coordination. The proposed Fair Share RL (FSRL) solution combines: (i) state augmentation with a semi-adaptive time reference; (ii) an architecture that leverages risk control and time difference likelihood; and (iii) a fairness-driven reward structure. We evaluate FSRL in more than 50 network settings with different numbers of agents, different amounts of available spectrum, in the presence of jammers, and in an ad-hoc setting. Simulation results suggest that, when we compare FSRL with a common baseline RL algorithm from the literature, FSRL can be up to 89.0% fairer (as measured by Jain's fairness index) in stringent settings with several sources and a single frequency band, and 48.1% fairer on average.
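
For reference, Jain's fairness index used in the comparison above has a standard closed form. Writing $x_m$ for the throughput of source $m$ (notation assumed here; the paper's own symbols may differ), it is

$$
J(x_1,\ldots,x_M) = \frac{\left(\sum_{m=1}^{M} x_m\right)^2}{M \sum_{m=1}^{M} x_m^2}.
$$

The index equals $1$ when all $M$ sources obtain the same throughput and drops to $1/M$ when a single source captures the entire spectrum.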

Paper Structure

This paper contains 12 sections, 18 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: DSA network with $3$ source-destination pairs denoted $\{1,2,3\}$ sharing $3$ bands denoted $\{1,2,3\}$. In each slot $t$, each source $m$ transmits in band $n$ or idles (i.e., "transmits" in band $0$). Successful transmissions are green. Collisions are red. Idle agents are white.
  • Figure 2: Architecture of each FSRL agent which integrates Dueling DQN with Distributional RL. Legend: $B$ is the batch size, $Q_d$ is the quantile dimension, $T$ is the temporal length, $D_h$ is the number of hidden units, and $D$ is the feature dimension.
  • Figure 3: Comparison of the transmissions from an FSRL agent in a network with $M=5$ agents and $N=5$ bands. (a) Shows an FSRL agent without the band-sharing term \ref{eq:bandsharing} in its reward \ref{eq:reward}. (b) Shows an FSRL agent with a reward as in \ref{eq:reward}.
  • Figure 4: Network throughput \ref{eq:throughput_net}, standard deviation of agent throughput \ref{eq:throughput_std}, and Jain's fairness index \ref{eq:jain} of FSRL associated with the last $W_t=500$ time slots in diverse network settings with $M\in\{2,\ldots,10\}$ source-destination pairs and $N\in\{1,\ldots, 10\}$ frequency bands, with $M \geq N$. Notably, FSRL uses the same ML architecture, reward structure, and hyper-parameters in all $54$ experiments.
  • Figure 5: Per-agent throughput (or success rate) over time $t$ for three (out of the $54$) experiments displayed in Fig. \ref{fig:overall_figure}.
  • ...and 3 more figures
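
The slot model in Figure 1 and the three metrics reported in Figure 4 can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `slot_outcomes` and `fairness_metrics` are hypothetical, and a uniformly random band choice stands in for the learned FSRL policy. Network throughput is assumed here to be the sum of per-agent success rates over the window; the paper's exact normalization may differ.

```python
import numpy as np

def slot_outcomes(actions, num_bands):
    """Return a boolean array that is True where a source succeeded.

    actions[m] is the band chosen by source m in this slot; 0 means idle.
    A transmission succeeds only if no other source chose the same band.
    """
    actions = np.asarray(actions)
    counts = np.bincount(actions, minlength=num_bands + 1)
    return (actions > 0) & (counts[actions] == 1)

def fairness_metrics(success, window):
    """Summarize the last `window` slots of a (T, M) boolean success matrix."""
    recent = np.asarray(success, dtype=float)[-window:]
    per_agent = recent.mean(axis=0)            # success rate of each agent
    total = per_agent.sum()                    # assumed network throughput
    sum_sq = np.square(per_agent).sum()
    # Jain's index: (sum x)^2 / (M * sum x^2); reported as 0.0 here when
    # nobody succeeded, which is a convention chosen for this sketch.
    jain = total**2 / (per_agent.size * sum_sq) if sum_sq > 0 else 0.0
    return {
        "network_throughput": total,
        "throughput_std": per_agent.std(),
        "jain": jain,
    }

# Stand-in experiment: M = 5 sources, N = 5 bands, random band selection.
rng = np.random.default_rng(0)
M, N, T = 5, 5, 2000
actions = rng.integers(0, N + 1, size=(T, M))  # each entry in {0, ..., N}
success = np.array([slot_outcomes(a, N) for a in actions])
metrics = fairness_metrics(success, window=500)
print(metrics)
```

A random policy is roughly fair but wastes slots on collisions and idling, so it is only a point of reference for reading the throughput and fairness panels, not a reproduction of any result in the paper.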