Adaptive Sample Sharing for Multi Agent Linear Bandits
Hamza Cherkaoui, Merwan Barlier, Igor Colin
TL;DR
The paper addresses regret minimization in multi-agent linear bandits by introducing Bandit Adaptive Sample Sharing (BASS), a mechanism that adaptively shares samples between agents according to a Mahalanobis-distance-based similarity, balancing the bias introduced by collaboration against estimation uncertainty. It formalizes a separation test and an associated separation time to determine when collaboration is beneficial, and provides a theoretical analysis that includes bounds on the separation time, the cumulative pseudo-regret during collaboration, the network-level regret, and clustering-aware performance. The approach is validated with experiments on synthetic and real datasets, showing significant improvement over state-of-the-art methods and the ability to recover cluster structure when present. Overall, BASS advances distributed bandit learning by enabling anisotropic, data-driven collaboration backed by theoretical guarantees and empirical evidence.
Abstract
The multi-agent linear bandit is a well-known setting in which designing efficient collaboration between agents remains challenging. This paper studies the impact of data sharing among agents on regret minimization. Unlike most existing approaches, our contribution does not rely on any assumption on the structure of the bandit parameters. Our main result formalizes the trade-off between the bias and the uncertainty of the bandit parameter estimation for efficient collaboration. This result is the cornerstone of the Bandit Adaptive Sample Sharing (BASS) algorithm, whose efficiency over the current state of the art is validated through theoretical analysis and empirical evaluations on both synthetic and real-world datasets. Furthermore, we demonstrate that, when the agents' parameters display a cluster structure, our algorithm accurately recovers it.
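To make the bias/uncertainty trade-off concrete, the following is a minimal sketch of what a Mahalanobis-distance-based separation test between two agents could look like. It is not the BASS test itself: the ridge estimator, the combined metric `(A_i^{-1} + A_j^{-1})^{-1}`, the confidence radii `beta_i` and `beta_j`, and all function names are assumptions chosen for illustration. The logic is that, if both agents shared the same parameter and each estimate lay within its own confidence ellipsoid, the gap computed below could not exceed `beta_i + beta_j`; a larger gap therefore suggests that sharing samples would add bias.

```python
import numpy as np


def ridge_estimate(X, r, lam=1.0):
    """Ridge estimate of a linear bandit parameter (hypothetical helper).

    X: (n, d) array of played arms, r: (n,) array of observed rewards.
    Returns the estimate and the regularized Gram matrix A = lam*I + X^T X.
    """
    d = X.shape[1]
    A = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(A, X.T @ r)
    return theta_hat, A


def mahalanobis_gap(theta_i, A_i, theta_j, A_j):
    """Distance between two estimates in the metric (A_i^{-1} + A_j^{-1})^{-1}.

    This metric is an illustrative assumption: it is dominated by both A_i
    and A_j, so each agent's confidence bound also holds in this norm.
    """
    diff = theta_i - theta_j
    M_inv = np.linalg.inv(A_i) + np.linalg.inv(A_j)
    return float(np.sqrt(diff @ np.linalg.solve(M_inv, diff)))


def are_separated(theta_i, A_i, theta_j, A_j, beta_i, beta_j):
    """Declare two agents separated when the gap exceeds the summed radii.

    beta_i and beta_j are assumed confidence radii such that, with high
    probability, the true parameter of each agent lies within that distance
    of its estimate in the norm induced by its own Gram matrix.
    """
    return mahalanobis_gap(theta_i, A_i, theta_j, A_j) > beta_i + beta_j


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 3, 200
    theta_a = np.array([1.0, 0.0, 0.0])   # true parameter of agent a
    theta_b = np.array([0.0, 1.0, 0.0])   # true parameter of agent b
    X_a = rng.normal(size=(n, d))
    X_b = rng.normal(size=(n, d))
    r_a = X_a @ theta_a + 0.1 * rng.normal(size=n)
    r_b = X_b @ theta_b + 0.1 * rng.normal(size=n)
    est_a, A_a = ridge_estimate(X_a, r_a)
    est_b, A_b = ridge_estimate(X_b, r_b)
    # The radii below are arbitrary placeholders, not values from the paper.
    print(are_separated(est_a, A_a, est_b, A_b, beta_i=2.0, beta_j=2.0))
```

Because the metric depends on both Gram matrices, the test is anisotropic: two estimates that differ only along a poorly explored direction yield a small gap, so the agents keep collaborating until enough samples accumulate in that direction.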
