
Multi-Agent Lipschitz Bandits

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

TL;DR

The paper proposes a modular protocol that first solves the multi-agent coordination problem -- identifying and seating players on distinct high-value regions via a novel maxima-directed search -- and then decouples the problem into independent single-player Lipschitz bandits. The protocol also extends to general distance-threshold collision models.

Abstract

We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, with coordination costs that are independent of the time horizon $T$. We propose a modular protocol that first solves the multi-agent coordination problem -- identifying and seating players on distinct high-value regions via a novel maxima-directed search -- and then decouples the problem into $N$ independent single-player Lipschitz bandits. We establish a near-optimal regret bound of $\tilde{O}(T^{(d+1)/(d+2)})$ plus a $T$-independent coordination cost, matching the single-player rate. To our knowledge, this is the first framework providing such guarantees, and it extends to general distance-threshold collision models.

Paper Structure

This paper contains 43 sections, 35 theorems, and 115 equations.

Key Result

Lemma 6.1

For any $\eta\in(0,1)$, with probability at least $1-\delta_{I}/2$, the success count for every player $j\in[N]$ and every cell $C\in \mathcal{P}$ lies within $(1\pm\eta) T_0 p_K$, provided $T_0 \ge \frac{3}{\eta^2 p_K} \log\left(\frac{4 N K}{\delta_I}\right)$.
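
To get a feel for the sample-size condition in Lemma 6.1, the following minimal sketch evaluates the lower bound on $T_0$ numerically. It is an illustration only, not the authors' code: the function name `min_phase_one_rounds` and its parameter names are hypothetical, and this excerpt does not define $p_K$ or $K$, so the sketch simply takes $p_K$ as a given probability in $(0,1]$ and $K$ as a positive integer (assumed here to be the number of cells in $\mathcal{P}$).

```python
import math


def min_phase_one_rounds(eta, p_k, n_players, n_cells, delta_i):
    """Smallest integer T_0 satisfying the condition stated in Lemma 6.1:

        T_0 >= 3 / (eta^2 * p_K) * log(4 * N * K / delta_I)

    Assumptions (not taken from the source): p_k is supplied by the caller
    as a probability in (0, 1], and n_cells plays the role of K.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    if not 0.0 < p_k <= 1.0:
        raise ValueError("p_k must lie in (0, 1]")
    if not 0.0 < delta_i < 1.0:
        raise ValueError("delta_i must lie in (0, 1)")
    if n_players < 1 or n_cells < 1:
        raise ValueError("n_players and n_cells must be positive")

    # Natural logarithm, as is standard for Chernoff-style bounds.
    log_term = math.log(4.0 * n_players * n_cells / delta_i)
    return math.ceil(3.0 / (eta ** 2 * p_k) * log_term)


if __name__ == "__main__":
    # Illustrative values only; none of these come from the paper.
    t0 = min_phase_one_rounds(eta=0.5, p_k=0.1, n_players=2, n_cells=10, delta_i=0.1)
    print("Minimum T_0 under the stated condition:", t0)
```

The bound scales as $1/\eta^2$ in the accuracy parameter and $1/p_K$ in the success probability, while the dependence on $N$, $K$, and $\delta_I$ is only logarithmic, so halving $\eta$ roughly quadruples the required $T_0$.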

Theorems & Definitions (62)

  • Lemma 6.1: Success Counts Under Collisions
  • Lemma 6.2: Anytime Concentration for Center Means
  • Proposition 6.3: Phase-I Maxima Brackets
  • Lemma 7.1: Phase-II Probe Coverage
  • Proposition 7.2: Refined Maxima Brackets
  • Theorem 7.3: Gap-Free $\varepsilon$-Optimality
  • Definition 7.4: $\varepsilon$-Uniqueness at the Top-$N$
  • Lemma 7.5: Consensus under $\varepsilon$-Uniqueness
  • Example 7.6: Center-vs-maximum pathology in 1D
  • Theorem 8.1: Expected Seating Time
  • ...and 52 more