
Distributed Linear Bandits under Communication Constraints

Sudeep Salgia, Qing Zhao

TL;DR

The paper addresses distributed linear bandits under fixed-capacity communication constraints, aiming to match the performance of centralized learning while transmitting as few bits as possible. It introduces Progressive Learning and Sharing (PLS), a bit-by-bit learning framework that interleaves exploration and information sharing to achieve the regret order $R(T)=\tilde{O}(d\sqrt{MT})$ while keeping per-round communication at $\mathcal{O}(d)$ bits and the total uplink and downlink costs at $\mathcal{O}(d/\alpha_0\;\log T)$ and $\mathcal{O}(d/\beta_0\; (\log M+\log T))$, respectively. The work also proves information-theoretic lower bounds showing that these costs are tight up to logarithmic factors, thereby characterizing the optimal regret-communication frontier. A sparsity-aware variant, Sparse-PLS, further reduces both regret and uplink cost when $\theta^*$ is sparse, using a reduced exploration basis and LASSO estimation. Together, the results give a theoretically grounded framework for efficient distributed learning under communication constraints, with implications for scalable multi-agent systems.
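To get a feel for how the orders quoted above scale, the following minimal sketch evaluates them numerically. It is an illustration only: all constants and the logarithmic factors hidden by $\tilde{O}$ are dropped, the function names are hypothetical, the communication terms are read as $(d/\alpha_0)\log T$ and $(d/\beta_0)(\log M+\log T)$, and $\alpha_0$, $\beta_0$ are treated as opaque positive parameters because this summary does not define them. The comparison against non-collaborating agents assumes the standard $\tilde{O}(d\sqrt{T})$ single-agent regret order, which is not stated in the summary.

```python
import math

def regret_order(d, M, T):
    """Order d * sqrt(M * T) from the TL;DR (constants and log factors dropped)."""
    return d * math.sqrt(M * T)

def uplink_bits_order(d, T, alpha0):
    """Order (d / alpha0) * log T; alpha0 is an undefined-here positive parameter."""
    return d / alpha0 * math.log(T)

def downlink_bits_order(d, M, T, beta0):
    """Order (d / beta0) * (log M + log T); beta0 is likewise left opaque."""
    return d / beta0 * (math.log(M) + math.log(T))

def collaboration_gain(d, M, T):
    """Ratio of M independent learners (assumed d * sqrt(T) each) to the shared order."""
    independent_total = M * d * math.sqrt(T)
    return independent_total / regret_order(d, M, T)

if __name__ == "__main__":
    d, M, T = 10, 16, 10_000
    print("regret order:", regret_order(d, M, T))
    print("uplink order (alpha0=1):", uplink_bits_order(d, T, alpha0=1.0))
    print("downlink order (beta0=1):", downlink_bits_order(d, M, T, beta0=1.0))
    print("gain over independent agents:", collaboration_gain(d, M, T))
```

Under these assumptions the gain from collaboration is $\sqrt{M}$, while the communication terms grow only logarithmically in $T$.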

Abstract

We consider distributed linear bandits where $M$ agents learn collaboratively to minimize the overall cumulative regret incurred by all agents. Information exchange is facilitated by a central server, and both the uplink and downlink communications are carried over channels with fixed capacity, which limits the amount of information that can be transmitted in each use of the channels. We investigate the regret-communication trade-off by (i) establishing information-theoretic lower bounds on the required communications (in terms of bits) for achieving a sublinear regret order; (ii) developing an efficient algorithm that achieves the minimum sublinear regret order offered by centralized learning using the minimum order of communications dictated by the information-theoretic lower bounds. For sparse linear bandits, we show a variant of the proposed algorithm offers better regret-communication trade-off by leveraging the sparsity of the problem.
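The setting described above can be made concrete with a toy simulation. The sketch below is not the PLS algorithm from the paper. It is a simplified illustration in which each of $M$ agents explores the standard basis, forms a local estimate of the unknown parameter, quantizes every coordinate to a fixed number of bits before sending it uplink, and the server averages what it receives. The quantizer, the noise level, the exploration schedule, and all function names are assumptions made for this example.

```python
import numpy as np

def quantize(x, bits, lo=-1.0, hi=1.0):
    """Uniformly quantize each entry of x to `bits` bits on the interval [lo, hi]."""
    levels = 2 ** bits - 1
    clipped = np.clip(x, lo, hi)
    # Integer code in {0, ..., levels}: this is what would travel over the channel.
    idx = np.round((clipped - lo) / (hi - lo) * levels)
    return lo + idx / levels * (hi - lo)

def simulate(d=5, M=10, pulls_per_dim=200, bits=8, noise=0.1, seed=0):
    """Toy distributed estimation with a bit-limited uplink (illustrative only)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-1.0, 1.0, size=d)
    theta /= max(1.0, np.linalg.norm(theta))  # keep the parameter in the unit ball

    uplink_messages = []
    for _ in range(M):
        # Playing basis vector e_i returns theta[i] plus Gaussian noise.
        rewards = theta + noise * rng.standard_normal((pulls_per_dim, d))
        local_estimate = rewards.mean(axis=0)
        uplink_messages.append(quantize(local_estimate, bits))

    # The server averages the quantized estimates; a downlink broadcast would follow.
    server_estimate = np.mean(uplink_messages, axis=0)
    total_uplink_bits = M * d * bits
    return theta, server_estimate, total_uplink_bits

if __name__ == "__main__":
    theta, estimate, used_bits = simulate()
    print("max abs error:", np.max(np.abs(estimate - theta)))
    print("uplink bits used:", used_bits)
```

Varying `bits` in this sketch shows the basic tension the paper studies: fewer bits per message mean a coarser shared estimate, while more bits consume more channel capacity.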

Paper Structure

This paper contains 38 sections, 10 theorems, 52 equations, and 8 algorithms.

Key Result

Theorem 4.1

Consider the distributed stochastic linear bandit setting described in the problem formulation section. If PLS is run with the parameters given in its detailed description and a budget of $T$ queries, then the following relation holds with probability at least $1 - \delta$, for some constant $C > 0$ independent of $d$, $M$ and $T$.

Theorems & Definitions (16)

  • Theorem 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Theorem 4.4
  • Lemma 4.5
  • Remark 4.6
  • Remark 4.7
  • Theorem 5.1
  • Theorem 6.1
  • Lemma B.1
  • ...and 6 more