Multi-agent Multi-armed Bandit with Fully Heavy-tailed Dynamics

Xingyu Wang, Mengfan Xu

TL;DR

The paper tackles decentralized multi-agent multi-armed bandits under fully heavy-tailed dynamics in both graph connectivity and reward distributions, modeling graphs as rank-1 inhomogeneous random graphs with tail index $\alpha>1$ and rewards with potentially infinite variance. It introduces two algorithms, HT-HMUCB for homogeneous rewards and HT-HTUCB for heterogeneous rewards, leveraging hub-based information aggregation and a median-of-means estimator to cope with heavy tails, along with new information-delay bounds on sparse graphs. The authors prove regret bounds of $O\left(M^{1-\frac{1}{\alpha}}\log T\right)$ in the homogeneous setting for $\alpha\in(1,2)$ and $O\left(M\log T\right)$ in the heterogeneous setting, under realistic sparse, time-varying graph dynamics, thereby improving over prior work that assumes dense or light-tailed graphs. These results provide principled guidance for scalable, robust cooperative learning in networks with uneven communication resources and heavy-tailed stochasticity, with implications for large-scale distributed decision-making systems.
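To make the median-of-means idea concrete, here is a minimal sketch in Python. It is not the paper's HT-HMUCB or HT-HTUCB estimator: the function names `median_of_means` and `pareto_rewards`, the equal-size blocking, and the Pareto reward model are all assumptions chosen for illustration. The point it shows is that splitting samples into blocks, averaging each block, and taking the median of the block averages keeps a single extreme reward from dominating the estimate, which is why the technique suits rewards with infinite variance.

```python
import random
import statistics


def median_of_means(samples, num_blocks):
    """Median-of-means estimate of the mean of `samples`.

    Hypothetical helper for illustration; the blocking rule used in the
    paper's algorithms may differ.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    # Trailing samples that do not fill a whole block are dropped.
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


def pareto_rewards(n, tail_index, seed):
    """Draw n Pareto(tail_index) rewards (assumed reward model).

    For tail_index in (1, 2) the mean tail_index / (tail_index - 1) is
    finite, but the variance is infinite.
    """
    rng = random.Random(seed)
    return [rng.paretovariate(tail_index) for _ in range(n)]


if __name__ == "__main__":
    rewards = pareto_rewards(n=10_000, tail_index=1.5, seed=0)
    print("true mean       :", 1.5 / (1.5 - 1.0))
    print("empirical mean  :", statistics.fmean(rewards))
    print("median of means :", median_of_means(rewards, num_blocks=20))
```

The number of blocks trades robustness against bias: more blocks tolerate more outliers, but each block mean is noisier, and for skewed distributions the median of block means sits somewhat below the true mean.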

Abstract

We study decentralized multi-agent multi-armed bandits in fully heavy-tailed settings, where clients communicate over sparse random graphs with heavy-tailed degree distributions and observe heavy-tailed (homogeneous or heterogeneous) reward distributions with potentially infinite variance. The objective is to maximize system performance by pulling the globally optimal arm with the highest global reward mean across all clients. We are the first to address such fully heavy-tailed scenarios, which capture the dynamics and challenges in communication and inference among multiple clients in real-world systems. In homogeneous settings, our algorithmic framework exploits hub-like structures unique to heavy-tailed graphs, allowing clients to aggregate rewards and reduce noise via hub estimators when constructing UCB indices; under $M$ clients and degree distributions with power-law index $\alpha > 1$, our algorithm attains a regret bound (almost) of order $O(M^{1 - \frac{1}{\alpha}} \log T)$. Under heterogeneous rewards, clients synchronize by communicating with neighbors, aggregating exchanged estimators in UCB indices; with our newly established information delay bounds on sparse random graphs, we prove a regret bound of $O(M \log T)$. Our results improve upon existing work, which only addresses time-invariant connected graphs, or light-tailed dynamics in dense graphs and rewards.
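The hub-like structure mentioned above can be illustrated with a small sampler for a rank-1 inhomogeneous random graph. This is a sketch under stated assumptions, not the paper's exact graph model: the function names `sample_rank1_graph` and `degrees`, the Pareto client weights, and the Chung-Lu-style edge probability $\min(1, w_i w_j / \sum_k w_k)$ are all chosen here for illustration. With tail index $\alpha \in (1,2)$, a few clients receive very large weights and end up connected to far more neighbors than a typical client, while the graph as a whole stays sparse.

```python
import random


def sample_rank1_graph(num_clients, tail_index, seed=0):
    """Sample one undirected rank-1 inhomogeneous random graph.

    Assumptions (not taken from the paper): client weights are
    Pareto(tail_index), and clients i and j are linked independently
    with probability min(1, w_i * w_j / sum(w)).
    """
    rng = random.Random(seed)
    weights = [rng.paretovariate(tail_index) for _ in range(num_clients)]
    total = sum(weights)
    neighbors = [set() for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return weights, neighbors


def degrees(neighbors):
    """Degree of each client in the sampled graph."""
    return [len(adjacent) for adjacent in neighbors]


if __name__ == "__main__":
    _, graph = sample_rank1_graph(num_clients=500, tail_index=1.5, seed=1)
    deg = sorted(degrees(graph), reverse=True)
    print("largest degrees:", deg[:5])
    print("median degree  :", deg[len(deg) // 2])
```

A time-varying graph sequence can be imitated by calling `sample_rank1_graph` with a fresh seed each round; whether the weights are redrawn or held fixed across rounds is a modeling choice this sketch does not settle.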

Paper Structure

This paper contains 36 sections, 23 theorems, 148 equations, 4 algorithms.

Key Result

Lemma 1

Let the heavy-tailed graph assumption and the lower-bound-for-$h$ assumption hold with $\alpha \in (1,2)$. Given $\zeta \in (0, 2 - \alpha)$, there exists $\gamma > 0$ such that [...], where $S_0 = \bigcap_{t \geq 1} S^t_0$.

Theorems & Definitions (45)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1: $\alpha \in (1,2)$
  • Proof: Proof Sketch
  • Remark 1: Expected regret
  • Theorem 2: $\alpha > 1$
  • Proof: Proof Sketch
  • Remark 2: Expected regret
  • Remark 3: Comparison to Theorem 1
  • ...and 35 more