A survey on multi-player bandits
Etienne Boursier, Vianney Perchet
TL;DR
This survey comprehensively organizes the multiplayer bandits literature, grounding it in cognitive radio applications and detailing a progression from centralized baselines to decentralized, communication-assisted methods. It highlights how collision information can be repurposed for coordination, yielding near-centralized regret in several settings, and surveys realistic extensions such as non-stationary rewards, varied collision models, and asynchronous operation. The authors identify gaps between theory and practice, emphasizing the need for realistic models and implementable algorithms for dynamic, uncertain networks. By connecting multiplayer bandits to related problems in multi-agent learning, matching markets, and queuing, the survey outlines fertile directions for future research with practical impact on IoT and wireless networks.
Abstract
Due mostly to its application to cognitive radio networks, the multiplayer bandits problem has gained a lot of interest in the last decade. Considerable progress has been made on its theoretical aspects. However, current algorithms are far from applicable, and many obstacles remain between these theoretical results and a possible implementation of multiplayer bandit algorithms in real cognitive radio networks. This survey contextualizes and organizes the rich multiplayer bandits literature. In light of existing work, some clear directions for future research appear. We believe that further study of these directions might lead to theoretical algorithms adapted to real-world situations.
