A Federated Online Restless Bandit Framework for Cooperative Resource Allocation

Jingwen Tong, Xinran Li, Liqun Fu, Jun Zhang, Khaled B. Letaief

TL;DR

The paper tackles learning unknown dynamics in cooperative resource allocation by formulating a multi-agent online restless multi-armed bandit (RMAB) problem and proposing a federated online RMAB framework. Central to the approach is FedTSWI, which combines federated Thompson sampling for estimating the dynamics with a Whittle index policy for arm selection, while preserving privacy and limiting communication. A regret bound of $\mathcal{R}(T)=\mathcal{O}(\sqrt{T \log T})$ is derived, and the framework is validated on an online multi-user multi-channel access case, showing fast convergence and improved sample efficiency as the number of agents grows. The results point to a practical, scalable solution for dynamic spectrum access and similar distributed resource allocation problems with unknown system dynamics.
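
As a brief worked step (not taken from the paper), a regret bound of this order implies that the time-averaged regret vanishes as the horizon grows. Dividing the bound by $T$ gives

$$\frac{\mathcal{R}(T)}{T} = \mathcal{O}\!\left(\frac{\sqrt{T \log T}}{T}\right) = \mathcal{O}\!\left(\sqrt{\frac{\log T}{T}}\right) \longrightarrow 0 \quad \text{as } T \to \infty,$$

so the per-slot reward of the learned policy approaches that of the benchmark policy against which regret is measured.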

Abstract

Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of MRPs are known a priori, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In this paper, we study the cooperative resource allocation problem with unknown system dynamics of MRPs. This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards. We devise a federated online RMAB framework that mitigates communication overhead and data privacy issues by adopting the federated learning paradigm. Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem. The FedTSWI algorithm enjoys high communication and computation efficiency, as well as a privacy guarantee. Moreover, we derive a regret upper bound for the FedTSWI algorithm. Finally, we demonstrate the effectiveness of the proposed algorithm on the case of online multi-user multi-channel access. Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance compared with baselines. More importantly, its sample complexity decreases with the number of agents.
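
The idea of agents pooling what they learn about the dynamics and then drawing a Thompson sample can be illustrated with a minimal sketch. It is not the FedTSWI algorithm itself: it assumes a two-state Markov chain per arm, a Beta prior on each row of the transition matrix, and a server that simply sums per-agent transition counts. The function names (`local_counts`, `aggregate`, `thompson_sample`) and the toy data are hypothetical, and the Whittle index step that would consume the sampled model is not shown.

```python
import random

def local_counts(transitions):
    """Count observed (s, s_next) transitions of a two-state chain at one agent."""
    counts = [[0, 0], [0, 0]]
    for s, s_next in transitions:
        counts[s][s_next] += 1
    return counts

def aggregate(all_counts):
    """Server step (assumed): sum the per-agent counts into one global table."""
    total = [[0, 0], [0, 0]]
    for counts in all_counts:
        for s in range(2):
            for s_next in range(2):
                total[s][s_next] += counts[s][s_next]
    return total

def thompson_sample(total, rng, prior=1.0):
    """Draw one transition matrix from an independent Beta posterior per row."""
    sample = []
    for s in range(2):
        # Posterior on P(next state = 1 | state = s) given the pooled counts.
        p_to_1 = rng.betavariate(prior + total[s][1], prior + total[s][0])
        sample.append([1.0 - p_to_1, p_to_1])
    return sample

rng = random.Random(0)
agent_data = [                      # toy trajectories, one list per agent
    [(0, 0), (0, 1), (1, 1)],
    [(1, 0), (0, 0), (0, 0)],
]
total = aggregate([local_counts(d) for d in agent_data])
theta_sample = thompson_sample(total, rng)
```

In this sketch only count tables leave each agent, and more agents mean more pooled counts per round, which is one intuitive reading of why sample complexity could fall as the number of agents grows.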

Paper Structure (27 sections, 6 theorems, 67 equations, 13 figures, 2 tables, 1 algorithm)

Key Result

Proposition 1

For the single-armed bandit process, which can also be viewed as a POMDP, the belief state at the next time slot is updated from the current belief $b$ and the estimated transition kernel $\hat{\theta}_l$, where $\eta = 1 / \sum_{s'\in \mathcal{S}} \sum_{s\in \mathcal{S}} \hat{\theta}_l(s'|s, a)\, b(s)$ is a normalizing constant and $h = s$ is the observed true state.
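
The update equation itself is not reproduced in this summary, so the following is only a sketch of a standard POMDP belief update that is consistent with the normalizing constant $\eta$ above. It assumes that when the true state $h$ is observed, the next belief is the row of the estimated kernel at $h$, and that otherwise the belief is propagated through the kernel and renormalized. The function name `belief_update` and the example kernel are hypothetical, and the exact form in the paper may differ.

```python
def belief_update(b, theta_hat, observed_state=None):
    """One-step belief update for a single arm under a fixed action a.

    b              : list of length S, current belief over states
    theta_hat      : S x S nested list, theta_hat[s][s_next] = estimated P(s_next | s, a)
    observed_state : index h of the observed true state, or None if unobserved
    """
    n = len(b)
    if observed_state is not None:
        # State h is known exactly: next belief is the kernel row at h (assumed form).
        return list(theta_hat[observed_state])
    # Unobserved: push the belief through the kernel, sum_s theta_hat(s'|s,a) b(s).
    unnormalized = [
        sum(theta_hat[s][s_next] * b[s] for s in range(n))
        for s_next in range(n)
    ]
    eta = 1.0 / sum(unnormalized)   # normalizing constant eta from Proposition 1
    return [eta * x for x in unnormalized]

theta_hat = [[0.9, 0.1],
             [0.3, 0.7]]
b_unobserved = belief_update([0.5, 0.5], theta_hat)
b_observed = belief_update([0.5, 0.5], theta_hat, observed_state=1)
```

With a properly normalized belief and a row-stochastic kernel, $\eta$ equals 1, so the normalization mainly guards against numerical drift or an unnormalized input.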

Figures (13)

  • Figure 1: Multi-agent RMAB for cooperative resource allocation.
  • Figure 2: The relationship between different problem formulations.
  • Figure 3: An illustration of the federated online RMAB framework.
  • Figure 4: The diagram of the FedTSWI algorithm.
  • Figure 5: The two-state Markov chain of arm $n$.
  • ...and 8 more figures

Theorems & Definitions (20)

  • Definition 1
  • Proposition 1
  • Proof
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Lemma 1
  • Proof
  • ...and 10 more