Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

Ruiyuan Huang, Zicheng Lyu, Xiaoyi Zhu, Zengfeng Huang

Abstract

We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $\widetilde{O}(\sqrt{KT})$ is achievable with $O(\log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent $O(\log\log T)$-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a $W$-bit memory constraint must use at least $\Omega(K/W)$ batches to achieve near-minimax regret $\widetilde{O}(\sqrt{KT})$, even under adaptive grids. In particular, logarithmic memory rules out $K$-independent batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $\Omega(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with $B$ batches and $W$ bits of memory allows only $O(BW)$ bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm using $O(\log T)$ bits of memory and $\widetilde{O}(K)$ batches that achieves regret $\widetilde{O}(\sqrt{KT})$, which nearly matches our lower bound.
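
As a rough sketch of how the two information bounds stated in the abstract combine, write $I$ for the number of bits the learner acquires about the hidden set of good arms, and $c_1, c_2 > 0$ for the constants hidden in the $\Omega(K)$ and $O(BW)$ bounds. The symbols $I$, $c_1$, and $c_2$ are illustrative names introduced here, not notation taken from the paper, and the paper's formal argument may be organized differently.

$$
c_1 K \;\le\; I \;\le\; c_2\, B W
\quad\Longrightarrow\quad
B \;\ge\; \frac{c_1}{c_2}\cdot\frac{K}{W} \;=\; \Omega\!\left(\frac{K}{W}\right).
$$

Under this reading, setting $W = O(\log T)$ gives $B = \Omega(K/\log T)$, which grows with $K$. The stated upper bound of $\widetilde{O}(K)$ batches with $O(\log T)$ bits of memory then matches this up to logarithmic factors.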

Paper Structure (53 sections, 16 theorems, 186 equations, 1 algorithm)

Key Result

Theorem 1

For sufficiently large $T$, any $B$-batch stochastic bandit algorithm with $W$ bits of persistent memory and near-minimax regret $\widetilde{O}(\sqrt{KT})$ must satisfy $B = \Omega(K/W)$, even if the batch grid is allowed to be adaptive.

Theorems & Definitions (36)

  • Theorem 1: Informal
  • Theorem 2: Informal
  • Remark 3
  • Theorem: Formal
  • Definition: Bernoulli hard family and thresholded sampling profile
  • Lemma: Per-good-arm exploration
  • Definition: $\chi^2$-divergence
  • Lemma: Budget-event measurability
  • proof
  • Proposition: Prefix-measurable $\chi^2$ change of measure
  • ...and 26 more