Heterogeneous Multi-agent Multi-armed Bandits on Stochastic Block Models

Mengfan Xu, Liren Shan, Fatemeh Ghaffari, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili

TL;DR

This work derives optimal instance-dependent regret upper bounds of order $\log T$ under sub-Gaussian rewards. The bounds capture the degree of heterogeneity in the system, exhibit smaller constants, scale better for large systems, and impose significantly relaxed assumptions on edge probabilities.

Abstract

We study a novel heterogeneous multi-agent multi-armed bandit problem with a cluster structure induced by stochastic block models, which influences not only the graph topology but also the reward heterogeneity. Specifically, agents are distributed on random graphs drawn from stochastic block models, a generalization of the Erdős-Rényi model with heterogeneous edge probabilities: agents are grouped into clusters (known or unknown), and edge probabilities for agents within the same cluster differ from those across clusters. The cluster structure of the stochastic block model also determines the heterogeneous rewards. Reward distributions of the same arm vary across agents in different clusters but remain consistent within a cluster, which unifies homogeneous and heterogeneous settings and allows varying degrees of heterogeneity; rewards are independent samples from these distributions. The objective is to minimize system-wide regret across all agents. To address this, we propose a novel algorithm applicable to both known and unknown cluster settings. The algorithm combines an averaging-based consensus approach with a newly introduced information aggregation and weighting technique, resulting in a UCB-type strategy. It accounts for graph randomness, leverages both intra-cluster (homogeneous) and inter-cluster (heterogeneous) information from rewards and graphs, and incorporates cluster detection for unknown cluster settings. We derive optimal instance-dependent regret upper bounds of order $\log T$ under sub-Gaussian rewards. Importantly, our regret bounds capture the degree of heterogeneity in the system (an additional layer of complexity), exhibit smaller constants, scale better for large systems, and impose significantly relaxed assumptions on edge probabilities. In contrast, prior works do not account for this refined problem complexity, rely on more stringent assumptions, and exhibit limited scalability.
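The setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm; it only samples the environment under simplifying assumptions: a two-parameter block model (one within-cluster probability `p_in` and one cross-cluster probability `p_out`), and Gaussian noise as one particular sub-Gaussian choice. The names `sample_sbm`, `sample_rewards`, `cluster_means`, and `sigma` are hypothetical and do not come from the paper.

```python
import numpy as np

def sample_sbm(labels, p_in, p_out, rng):
    """Sample one undirected graph from a two-parameter stochastic block model.

    Assumption: a single within-cluster probability p_in and a single
    cross-cluster probability p_out (a simplification for illustration).
    """
    labels = np.asarray(labels)
    n = labels.size
    same = labels[:, None] == labels[None, :]       # True where agents share a cluster
    probs = np.where(same, p_in, p_out)             # per-pair edge probability
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each pair once, no self-loops
    return upper | upper.T                          # symmetrize

def sample_rewards(labels, cluster_means, arm, rng, sigma=1.0):
    """Draw one reward per agent for a given arm.

    The mean depends only on the agent's cluster, so rewards are
    identically distributed within a cluster and differ across clusters.
    Gaussian noise is an assumed sub-Gaussian noise model.
    """
    labels = np.asarray(labels)
    means = cluster_means[labels, arm]
    return means + sigma * rng.standard_normal(labels.size)

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])               # two clusters of three agents
adj = sample_sbm(labels, p_in=0.8, p_out=0.1, rng=rng)
cluster_means = np.array([[0.2, 0.9],                # rows: clusters, columns: arms
                          [0.7, 0.4]])
rewards = sample_rewards(labels, cluster_means, arm=1, rng=rng)
```

In this sketch, making the rows of `cluster_means` identical gives a homogeneous system, while making them differ gives a heterogeneous one, which mirrors how the cluster structure is said to unify both regimes.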

Paper Structure

This paper contains 52 sections, 19 theorems, 9 equations, 1 figure, 4 tables.

Key Result

Theorem 1

Executing the above algorithm leads to $$\mathbb{E}[R_T] \leqslant O\left(\sum_{k\neq k^*} \frac{\log T}{M\Delta_k} + \frac{K}{p^{M^2}} \right). \tag{1}$$

Figures (1)

  • Figure 1: The regret of different methods across different settings

Theorems & Definitions (26)

  • Definition 1: Stochastic Block Models
  • Definition 2: Degree of Heterogeneity
  • Remark
  • Theorem 1
  • Theorem 2
  • Definition 3
  • Theorem 3
  • Lemma 4
  • Theorem 5
  • Definition 4: Composition of graphs
  • ...and 16 more