The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits

Ronshee Chawla; Abishek Sankararaman; Ayalvadi Ganesh; Sanjay Shakkottai

TL;DR

This work introduces GosInE, a pair of gossip-based insert-eliminate algorithms for decentralized multi-agent MABs in which each agent plays only from a small subset of arms and agents exchange only arm-IDs. The authors show that, with a connected gossip network and a communication budget that grows at least logarithmically with time, each agent achieves regret that scales like $O\left(\left(\sum_{j=2}^{\lceil K/N\rceil+2} \frac{1}{\Delta_j}\right) \alpha \ln T\right)$ up to a problem-dependent constant, reducing regret by a factor of about $N$ compared to playing in isolation. They provide a matching lower bound, analyze a second-order regret term driven by network conductance, and demonstrate that higher connectivity and a larger communication budget improve performance; asynchronous variants and initialization without agent IDs make the approach more practical. Empirical results on synthetic and real data confirm that even modest collaboration yields substantial regret reductions.
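To make the factor-of-$N$ claim concrete, the following is a small numerical sketch of the leading term in the bound above. It rests on several assumptions that are not stated on this page: the gaps $\Delta_2 \le \dots \le \Delta_K$ are sorted in ascending order, the isolated baseline is taken to be the same expression summed over all suboptimal arms $j = 2, \dots, K$, and the hidden constants are ignored. The function name and the default values of `alpha` and `horizon` are illustrative only.

```python
import math

def leading_term(gaps, n_agents, alpha=1.0, horizon=10_000):
    """Leading-order term alpha * ln(T) * sum_{j=2}^{ceil(K/N)+2} 1/Delta_j.

    gaps: [Delta_2, ..., Delta_K], assumed sorted ascending (best arm excluded).
    n_agents: N. With N = 1 the sum covers every suboptimal arm, which this
    sketch uses as the no-collaboration baseline (an assumption).
    """
    k = len(gaps) + 1                          # total number of arms K
    top = min(k, math.ceil(k / n_agents) + 2)  # last index j in the sum
    # Indices j = 2..top correspond to the first (top - 1) entries of gaps.
    return alpha * math.log(horizon) * sum(1.0 / d for d in gaps[: top - 1])

# Hypothetical instance: K = 20 arms with evenly spaced gaps.
example_gaps = [0.1 * i for i in range(1, 20)]
isolated = leading_term(example_gaps, n_agents=1)
collaborative = leading_term(example_gaps, n_agents=10)
print(f"isolated: {isolated:.1f}, collaborative: {collaborative:.1f}")
```

With $K = 20$ and $N = 10$, the collaborative sum runs over only three gaps ($j = 2, 3, 4$) rather than nineteen. How large the resulting saving is depends on how the gaps are distributed, since the smallest gaps dominate both sums.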

Abstract

We consider a decentralized multi-agent Multi-Armed Bandit (MAB) setup consisting of $N$ agents, solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip-style communications on an arbitrary connected graph. We develop two novel algorithms, where each agent only plays from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $\Omega(\log T)$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller compared to the case of no collaboration. Furthermore, we show that the communication constraints only have a second-order effect on the regret of our algorithm. We then analyze this second-order term of the regret to derive bounds on the regret-communication tradeoffs. Finally, we empirically evaluate our algorithm and conclude that the insights are fundamental and not artifacts of our bounds. We also show a lower bound which establishes that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results thus demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents.
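The mechanism described in the abstract (play from a subset of arms, recommend arm-IDs only, update the subset) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a complete communication graph, fixed-length phases, Bernoulli rewards, a round-robin initial partition of arms, at most two extra slots per agent, a UCB index within each agent's active set, and a "recommend the most-played arm, evict the least-played non-permanent arm" rule. All function and parameter names are hypothetical.

```python
import math
import random

def _ucb_choice(arms, pulls, sums, alpha):
    """Pick an arm from `arms` by a UCB index; unplayed arms go first."""
    t = sum(pulls) + 1
    best, best_idx = None, -1.0
    for a in sorted(arms):
        if pulls[a] == 0:
            return a
        idx = sums[a] / pulls[a] + math.sqrt(alpha * math.log(t) / pulls[a])
        if idx > best_idx:
            best, best_idx = a, idx
    return best

def run_gossip_sketch(means, n_agents, n_phases=8, phase_len=200,
                      alpha=2.0, seed=0):
    """Toy insert-eliminate loop; returns each agent's final active arm set."""
    k = len(means)
    if n_agents < 2 or k < n_agents:
        raise ValueError("sketch assumes 2 <= n_agents <= number of arms")
    rng = random.Random(seed)
    # Assumption: arm a is permanently held ("sticky") by agent a mod N.
    sticky = [{a for a in range(k) if a % n_agents == i}
              for i in range(n_agents)]
    active = [set(s) for s in sticky]
    pulls = [[0] * k for _ in range(n_agents)]
    sums = [[0.0] * k for _ in range(n_agents)]

    for _ in range(n_phases):
        phase_pulls = [[0] * k for _ in range(n_agents)]
        for _ in range(phase_len):
            for i in range(n_agents):
                arm = _ucb_choice(active[i], pulls[i], sums[i], alpha)
                reward = 1.0 if rng.random() < means[arm] else 0.0
                pulls[i][arm] += 1
                sums[i][arm] += reward
                phase_pulls[i][arm] += 1
        # Gossip: each agent receives one arm-ID (an int, no samples) from a
        # uniformly random other agent (complete graph is an assumption).
        recs = []
        for i in range(n_agents):
            peer = rng.choice([j for j in range(n_agents) if j != i])
            recs.append(max(active[peer], key=lambda a: phase_pulls[peer][a]))
        # Insert the recommendation; if both extra slots are already full,
        # first eliminate the least-played non-sticky arm.
        for i, rec in enumerate(recs):
            if rec in active[i]:
                continue
            extras = active[i] - sticky[i]
            if len(extras) >= 2:
                active[i].remove(min(extras, key=lambda a: phase_pulls[i][a]))
            active[i].add(rec)
    return active

# Hypothetical instance: 8 Bernoulli arms, 4 agents.
final_sets = run_gossip_sketch([0.9, 0.5, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2], 4)
print(final_sets)
```

Each agent in this sketch never holds more than its permanent arms plus two extras, so the number of arms it explores stays near $K/N$ rather than $K$, which is the intuition behind the order-$N$ regret reduction. Only integers (arm-IDs) cross between agents; reward samples stay local.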
