Multi-Agent Lipschitz Bandits

Sourav Chakraborty; Amit Kiran Rege; Claire Monteleoni; Lijun Chen

TL;DR

A modular protocol is proposed that first solves the multi-agent coordination problem -- identifying and seating players on distinct high-value regions via a novel maxima-directed search -- and then decouples the problem into independent single-player Lipschitz bandits, and it extends to general distance-threshold collision models.

Abstract

We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, with coordination costs that are independent of the time horizon $T$. We propose a modular protocol that first solves the multi-agent coordination problem -- identifying and seating players on distinct high-value regions via a novel maxima-directed search -- and then decouples the problem into $N$ independent single-player Lipschitz bandits. We establish a near-optimal regret bound of $\tilde{O}(T^{(d+1)/(d+2)})$ plus a $T$-independent coordination cost, matching the single-player rate. To our knowledge, this is the first framework providing such guarantees, and it extends to general distance-threshold collision models.
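To make the stated rate concrete, the following minimal sketch evaluates the exponent $(d+1)/(d+2)$ from the regret bound for a few values of $d$. The abstract does not define $d$; here it is assumed to be the dimension of the action space, and the function name `regret_exponent` is purely illustrative.

```python
from fractions import Fraction


def regret_exponent(d: int) -> Fraction:
    """Exponent of T in the O~(T^{(d+1)/(d+2)}) rate quoted in the abstract.

    Assumption: d is a non-negative integer dimension of the action space.
    """
    if d < 0:
        raise ValueError("d must be non-negative")
    return Fraction(d + 1, d + 2)


if __name__ == "__main__":
    # The exponent stays below 1 (sublinear regret) but approaches 1 as d grows.
    for d in (1, 2, 5, 10):
        exponent = regret_exponent(d)
        print(f"d={d}: exponent = {exponent} ~ {float(exponent):.3f}")
```

The exponent is strictly less than 1 for every finite $d$, so the time-dependent part of the regret is sublinear in $T$, while the coordination cost does not grow with $T$ at all.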

Paper Structure (43 sections, 35 theorems, 115 equations)

Introduction
Related Work
Preliminaries
Stochastic Multi-Armed Bandits
Lipschitz Bandits in Continuous Domains
Cooperative Multi-Player Bandits and Collisions
Problem Setup and Benchmarks
Actions, Rewards, and Lipschitz Structure
A Tractable Collision Model for Continuous Spaces
Performance Benchmark and Objective
Our Approach: A Multi-Phase Decentralized Protocol
Phase I: Coarse Identification
Phase I Protocol
Phase I Guarantees
Phase II: Zooming in on cells
...and 28 more sections

Key Result

Lemma 6.1

For any $\eta\in(0,1)$, with probability at least $1-\delta_I/2$, the success count of every player $j\in[N]$ in every cell $C\in\mathcal{P}$ lies within $(1\pm\eta)\,T_0\,p_K$, provided $T_0 \ge \frac{3}{\eta^2 p_K} \log\left(\frac{4 N K}{\delta_I}\right)$.
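As a worked illustration of the sample-size condition in Lemma 6.1, the sketch below computes the smallest integer $T_0$ satisfying $T_0 \ge \frac{3}{\eta^2 p_K}\log(4NK/\delta_I)$. It assumes $K$ is the number of cells in the partition $\mathcal{P}$ and that $p_K$ is the per-round success probability as defined in the paper (its definition is not reproduced in this summary). The function name and the example parameter values are hypothetical.

```python
import math


def min_phase1_rounds(eta: float, p_K: float, N: int, K: int, delta_I: float) -> int:
    """Smallest integer T_0 with T_0 >= 3 / (eta^2 * p_K) * log(4 * N * K / delta_I).

    Assumptions (not stated in this summary): K is the number of cells in the
    partition, and p_K in (0, 1] is the per-round success probability.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    if not 0.0 < p_K <= 1.0:
        raise ValueError("p_K must lie in (0, 1]")
    if not 0.0 < delta_I < 1.0:
        raise ValueError("delta_I must lie in (0, 1)")
    if N < 1 or K < 1:
        raise ValueError("N and K must be positive integers")
    bound = 3.0 / (eta ** 2 * p_K) * math.log(4 * N * K / delta_I)
    return math.ceil(bound)


if __name__ == "__main__":
    # Hypothetical values chosen only to show how the bound scales.
    for eta in (0.5, 0.25):
        t0 = min_phase1_rounds(eta=eta, p_K=0.1, N=2, K=10, delta_I=0.1)
        print(f"eta={eta}: T_0 >= {t0}")
```

The requirement scales as $1/\eta^2$ and $1/p_K$ but only logarithmically in $N$, $K$, and $1/\delta_I$, so tightening the relative accuracy $\eta$ is far more expensive than adding players or cells.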

Theorems & Definitions (62)

Lemma 6.1: Success Counts Under Collisions
Lemma 6.2: Anytime Concentration for Center Means
Proposition 6.3: Phase-I Maxima Brackets
Lemma 7.1: Phase-II Probe Coverage
Proposition 7.2: Refined Maxima Brackets
Theorem 7.3: Gap-Free $\varepsilon$-Optimality
Definition 7.4: $\varepsilon$-Uniqueness at the Top-$N$
Lemma 7.5: Consensus under $\varepsilon$-Uniqueness
Example 7.6: Center-vs-maximum pathology in 1D
Theorem 8.1: Expected Seating Time
...and 52 more
