A survey on multi-player bandits

Etienne Boursier; Vianney Perchet

TL;DR

This survey comprehensively organizes the multiplayer bandits literature, grounding it in cognitive radio applications and detailing a progression from centralized baselines to decentralized, communication-assisted methods. It highlights how collision information can be repurposed for coordination, yielding near-centralized regret in several settings, and surveys realistic extensions such as non-stationary rewards, varied collision models, and asynchronous operation. The authors identify gaps between theory and practice, emphasizing the need for realistic models and implementable algorithms for dynamic, uncertain networks. By connecting multiplayer bandits to related problems in multi-agent learning, matching markets, and queuing, the survey outlines fertile directions for future research with practical impact on IoT and wireless networks.

Abstract

Due mostly to their application to cognitive radio networks, multiplayer bandits have gained a lot of interest in the last decade. Considerable progress has been made on the theoretical side. However, current algorithms are far from applicable, and many obstacles remain between these theoretical results and a possible implementation of multiplayer bandit algorithms in real cognitive radio networks. This survey contextualizes and organizes the rich multiplayer bandits literature. In light of the existing work, some clear directions for future research appear. We believe that further study of these directions might lead to theoretical algorithms adapted to real-world situations.

Paper Structure (33 sections, 9 theorems, 20 equations, 2 figures, 3 tables, 7 algorithms)

Introduction
Motivation for cognitive radio networks
Baseline Results
Model
Centralized Case
Lower Bound
Reaching Centralized Optimal Regret
Coordination Routines
Enhancing Communication
Communication via Markov Chains
Collision Information as Bits
No communication
Towards Realistic Considerations
Non-stochastic Rewards
Markovian Rewards
...and 18 more sections

Key Result

Theorem 2

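For any uniformly good algorithm and all instances of homogeneous multiplayer bandits with $M$ players and $\mu_{(1)} > \ldots > \mu_{(K)}$, the regret $R(T)$ satisfies $\liminf_{T \to \infty} \frac{R(T)}{\log(T)} \geq \sum_{k > M} \frac{\mu_{(M)} - \mu_{(k)}}{\mathrm{kl}\left(\mu_{(k)}, \mu_{(M)}\right)}$, where $\mathrm{kl}\left(p,q\right) = p\log\left(\frac{p}{q}\right) + (1-p)\log\left( \frac{1-p}{1-q}\right)$.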
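To make the divergence term concrete, here is a minimal Python sketch of $\mathrm{kl}(p,q)$ as defined in the theorem statement. The function name `kl_bernoulli`, the use of the natural logarithm, and the boundary conventions ($0 \log 0 = 0$, and an infinite value when $q$ puts zero mass where $p$ does not) are assumptions made for illustration, not details taken from the survey.

```python
import math


def kl_bernoulli(p: float, q: float) -> float:
    """kl(p, q) = p*log(p/q) + (1-p)*log((1-p)/(1-q)), natural log.

    Hypothetical helper: the name and the edge-case conventions are
    illustrative assumptions, not part of the survey.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("p and q must lie in [0, 1]")

    def term(a: float, b: float) -> float:
        if a == 0.0:
            return 0.0       # assumed convention: 0 * log(0 / b) = 0
        if b == 0.0:
            return math.inf  # a > 0 but b = 0: divergence is infinite
        return a * math.log(a / b)

    return term(p, q) + term(1.0 - p, 1.0 - q)


if __name__ == "__main__":
    # Identical means are indistinguishable: the divergence is zero.
    print(kl_bernoulli(0.5, 0.5))
    # Means further apart give a larger divergence.
    print(kl_bernoulli(0.5, 0.25), kl_bernoulli(0.5, 0.1))
```

The divergence is zero when the two means coincide and grows as they separate, so arms whose means are close are the expensive ones to tell apart.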

Figures (2)

Figure 1: Example of bad Pareto optimal matching.
Figure: Rand Orthogonalisation

Theorems & Definitions (12)

Definition 1
Theorem 2: anantharam1987a
Theorem 3: wang2020
Definition 4
Theorem 5: gaitonde2020b
Definition 6
Theorem 7: lai1985
Theorem 8: auer1995gambling
Theorem 9: slivkins2019introduction
Theorem 10: auer2002
...and 2 more
