Distributed Linear Bandits under Communication Constraints

Sudeep Salgia; Qing Zhao

TL;DR

The paper studies distributed linear bandits in which the uplink and downlink channels have fixed capacity, and asks how few bits must be transmitted to match the performance of centralized learning. It introduces Progressive Learning and Sharing (PLS), a bit-by-bit learning framework that interleaves exploration with information sharing. PLS achieves a regret of order $R(T)=\tilde{O}(d\sqrt{MT})$ while keeping per-round communication at $\mathcal{O}(d)$ bits, with total uplink and downlink costs of $\mathcal{O}(d/\alpha_0\;\log T)$ and $\mathcal{O}(d/\beta_0\; (\log M+\log T))$, respectively. The work also proves information-theoretic lower bounds showing that these costs are tight up to logarithmic factors, which characterizes the optimal regret-communication frontier. A sparsity-aware variant, Sparse-PLS, further reduces both regret and uplink cost when $\theta^*$ is sparse, using a reduced exploration basis and LASSO estimation.
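To make the idea of bit-by-bit sharing concrete, the following is a minimal sketch of clipping and quantization with a shrinking range. It is not the paper's PLS procedure: it assumes a fixed target vector (in PLS the estimate itself is refined across epochs), and the function names, the halving schedule, and the initial radius are all illustrative assumptions. It only shows how sending a fixed number of bits per coordinate in each round, i.e. $\mathcal{O}(d)$ bits per round, can still convey a vector to increasing precision.

```python
import numpy as np

def clip_and_quantize(delta, radius, bits_per_coord=1):
    """Clip each coordinate to [-radius, radius] and return integer codes
    on a uniform grid with 2**bits_per_coord cells (hypothetical helper)."""
    levels = 2 ** bits_per_coord
    step = 2.0 * radius / levels
    clipped = np.clip(delta, -radius, radius)
    codes = np.floor((clipped + radius) / step).astype(int)
    # The right endpoint would land in cell `levels`; fold it into the last cell.
    return np.minimum(codes, levels - 1)

def dequantize(codes, radius, bits_per_coord=1):
    """Map integer codes back to the midpoints of their grid cells."""
    levels = 2 ** bits_per_coord
    step = 2.0 * radius / levels
    return -radius + (codes + 0.5) * step

def progressive_share(target, rounds, radius=1.0, bits_per_coord=1):
    """Receiver-side reconstruction after `rounds` rounds.

    Each round transmits d * bits_per_coord bits: the quantized residual
    between the target and what has been shared so far. Assumes every
    coordinate of `target` starts within [-radius, radius].
    """
    target = np.asarray(target, dtype=float)
    shared = np.zeros(target.shape, dtype=float)
    for _ in range(rounds):
        codes = clip_and_quantize(target - shared, radius, bits_per_coord)
        shared = shared + dequantize(codes, radius, bits_per_coord)
        # The residual is now at most half a grid cell, so the range can shrink.
        radius = radius / (2 ** bits_per_coord)
    return shared
```

In this sketch the reconstruction error is at most `radius / 2**(bits_per_coord * rounds)` per coordinate, so precision improves geometrically while the per-round message size stays fixed at `d * bits_per_coord` bits.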

Abstract

We consider distributed linear bandits where $M$ agents learn collaboratively to minimize the overall cumulative regret incurred by all agents. Information exchange is facilitated by a central server, and both the uplink and downlink communications are carried over channels with fixed capacity, which limits the amount of information that can be transmitted in each use of the channels. We investigate the regret-communication trade-off by (i) establishing information-theoretic lower bounds on the required communications (in terms of bits) for achieving a sublinear regret order; (ii) developing an efficient algorithm that achieves the minimum sublinear regret order offered by centralized learning using the minimum order of communications dictated by the information-theoretic lower bounds. For sparse linear bandits, we show that a variant of the proposed algorithm offers a better regret-communication trade-off by leveraging the sparsity of the problem.

Paper Structure (38 sections, 10 theorems, 52 equations, 8 algorithms)

Introduction
Main Results
Related Work
Problem Formulation
Progressive Learning and Sharing
The Basic Structure of PLS
Progressive Information Sharing
Progressive Collaborative Learning
Detailed Description of PLS
Progressive Collaborative Learning
Norm Estimation Stage
Refinement Stage
Setting Policy Parameters
Progressive Information Sharing
Clipping and Quantization
...and 23 more sections

Key Result

Theorem 4.1

Consider the distributed stochastic linear bandit setting described in the Problem Formulation section. If PLS is run with the parameters described in the Detailed Description of PLS section and a budget of $T$ queries, then the following relation holds with probability at least $1 - \delta$, for some constant $C > 0$ independent of $d$, $M$, and $T$.

Theorems & Definitions (16)

Theorem 4.1
Lemma 4.2
Lemma 4.3
Theorem 4.4
Lemma 4.5
Remark 4.6
Remark 4.7
Theorem 5.1
Theorem 6.1
Lemma B.1
...and 6 more
