Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

Ruiyuan Huang; Zicheng Lyu; Xiaoyi Zhu; Zengfeng Huang

Abstract

We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $\widetilde{O}(\sqrt{KT})$ is achievable with $O(\log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent, $O(\log\log T)$-type number of batches when memory is unrestricted. We show that this picture breaks down when both constraints hold simultaneously. We prove that any algorithm with a $W$-bit memory constraint must use at least $\Omega(K/W)$ batches to achieve near-minimax regret $\widetilde{O}(\sqrt{KT})$, even under adaptive grids. In particular, logarithmic memory rules out $K$-independent batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $\Omega(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with $B$ batches and $W$ bits of memory can retain only $O(BW)$ bits of information. A key ingredient, of independent interest, is a localized change-of-measure lemma that yields probability-level arm-exploration guarantees. We also give an algorithm using $O(\log T)$ bits of memory and $\widetilde{O}(K)$ batches that achieves regret $\widetilde{O}(\sqrt{KT})$, which matches our lower bound up to polylogarithmic factors.
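The way the two bounds in the abstract combine can be written as one chain of inequalities. The notation is ours and purely illustrative: $S$ is the hidden set of good arms, $M_b$ is the learner's $W$-bit persistent memory at the end of batch $b$, and $c > 0$ is an unspecified constant standing in for the $\Omega(K)$ requirement. This is a schematic of the counting, not the paper's formal proof.

```latex
% Left: information required by near-minimax regret (the paper's claim, under its hard prior).
% Right: capacity of B memory snapshots of W bits each.
c\,K \;\le\; I(S;\, M_1,\dots,M_B)
     \;\le\; H(M_1,\dots,M_B)
     \;\le\; \sum_{b=1}^{B} H(M_b)
     \;\le\; B\,W
\quad\Longrightarrow\quad
B \;\ge\; \frac{c\,K}{W} \;=\; \Omega\!\left(\frac{K}{W}\right).
% Logarithmic memory, W = \Theta(\log T), gives B = \Omega(K / \log T),
% which grows with K, unlike the K-independent O(\log\log T) batch count
% achievable when memory is unrestricted.
```

Each $H(M_b) \le W$ because a $W$-bit state takes at most $2^W$ values; the step that needs the paper's machinery (the hard prior and the localized change-of-measure lemma) is the leftmost inequality.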


Paper Structure (53 sections, 16 theorems, 186 equations, 1 algorithm)

Introduction
  Space constraint.
  Adaptivity constraint.
  Our contributions.
Technical Overview
  A hard prior and the lower bound proof roadmap.
  A thresholded sampling profile.
  Why false-negative control is the main difficulty.
  Why probability-level exploration requires localization.
  From thresholded exploration to an $\Omega(K)$ information requirement.
  From information to batches.
  Upper bound intuition.
  Organization.
Related Work
  Limited adaptivity.

Key Result

Theorem 1

For sufficiently large $T$, any $B$-batch stochastic bandit algorithm with $W$ bits of persistent memory and near-minimax regret $\widetilde{O}(\sqrt{KT})$ must satisfy $B = \Omega(K/W)$, even if the batch grid is allowed to be adaptive.
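To get a feel for how the required number of batches scales, the snippet below runs a deliberately crude pigeonhole count. It assumes, hypothetically and only for illustration, that the good arms form one of the $\binom{K}{K/2}$ half-size subsets, that the learner's behavior is determined by $B$ memory snapshots of $W$ bits each, and that all hidden constants equal 1. The helper names `bits_to_identify_half_subset` and `min_batches` are ours, not the paper's.

```python
import math


def bits_to_identify_half_subset(k: int) -> float:
    """Return log2 C(k, k // 2): bits needed to single out one half-size subset.

    Illustrative assumption: the hidden good-arm set is one of the
    C(k, k // 2) subsets of size k // 2 (a stand-in, not the paper's prior).
    """
    # math.comb returns an exact int; math.log2 accepts arbitrarily large ints.
    return math.log2(math.comb(k, k // 2))


def min_batches(k: int, w: int) -> int:
    """Smallest B with B * w >= log2 C(k, k // 2).

    Crude count: B memory snapshots of w bits each carry at most B * w bits,
    so they can distinguish at most 2**(B * w) subsets. Constants set to 1.
    """
    if k < 1 or w < 1:
        raise ValueError("k and w must be positive")
    return math.ceil(bits_to_identify_half_subset(k) / w)


if __name__ == "__main__":
    T = 2**40
    w = math.ceil(math.log2(T))  # logarithmic memory: 40 bits here
    for k in (64, 256, 1024, 2048):
        print(f"K={k:5d}  W={w}  crude min batches={min_batches(k, w)}")
    print(f"log2(log2(T)) = {math.log2(math.log2(T)):.2f}")
```

For example, with $T = 2^{40}$ and $W = 40$ bits, `min_batches(1024, 40)` is 26 and `min_batches(2048, 40)` is 52, so the count grows linearly in $K$, whereas $\log_2\log_2 T \approx 5.3$ does not depend on $K$ at all.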

Theorems & Definitions (36)

Theorem 1: Informal
Theorem 2: Informal
Remark 3
Theorem (formal version of Theorem 1)
Definition: Bernoulli hard family and thresholded sampling profile
Lemma: Per-good-arm exploration
Definition: $\chi^2$-divergence
Lemma: Budget-event measurability
Proposition: Prefix-measurable $\chi^2$ change of measure