On the Peril of (Even a Little) Nonstationarity in Satisficing Regret Minimization

Yixuan Zhang; Ruihao Zhu; Qiaomin Xie

Abstract

Motivated by the principle of satisficing in decision-making, we study satisficing regret guarantees for nonstationary $K$-armed bandits. We show that in the general realizable, piecewise-stationary setting with $L$ stationary segments, the optimal regret is $\Theta(L\log T)$ as long as $L\geq 2$. This stands in sharp contrast to the case of $L=1$ (i.e., the stationary setting), where a $T$-independent $\Theta(1)$ satisficing regret is achievable under realizability. In other words, the optimal regret has to scale with $T$ even if only a little nonstationarity is present. A key ingredient in our analysis is a novel Fano-based framework tailored to nonstationary bandits via a \emph{post-interaction reference} construction. This framework strictly extends the classical Fano method for passive estimation as well as recent interactive Fano techniques for stationary bandits. As a complement, we also discuss a special regime in which constant satisficing regret is again possible.

Paper Structure (38 sections, 6 theorems, 50 equations, 1 figure, 2 algorithms)

Introduction
Our Contributions
Related Work
Nonstationary bandits.
Stationary satisficing regret.
Threshold-based pure exploration and other satisficing notions.
Notation
Problem Formulation
Nonstationary Bandits
Satisficing Regret.
Satisficing nonstationary bandits.
A Fano-Based Framework for Nonstationary Bandit Problems
Conditional Fano.
Classical Fano Reduction: Small Regret Implies Correct Identification
Post-Interaction Reference and Two Complementary Approaches
...and 23 more sections

Key Result

Theorem 1

Suppose $L\ge 3$ and $\Delta^2 T \ge L$. Then $\inf_{\pi \in \Pi}\sup_{\nu \in \mathcal{E}_{L,\Delta}}R_T^S(\pi;\nu)\gtrsim L\log (\Delta^2T/L)/\Delta.$
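
To get a feel for how the rate in Theorem 1 grows with the horizon, the following minimal sketch evaluates the expression $L\log(\Delta^2 T/L)/\Delta$ for a few values of $T$. The function name `satisficing_lower_bound_rate` and the sample values of $L$, $\Delta$, and $T$ are illustrative assumptions, not taken from the paper, and the constant hidden by $\gtrsim$ is ignored, so the numbers indicate scaling only.

```python
import math


def satisficing_lower_bound_rate(L: int, delta: float, T: int) -> float:
    """Evaluate L * log(delta^2 * T / L) / delta, the rate in Theorem 1.

    Hypothetical helper for illustration: the constant hidden by the
    inequality in the theorem is dropped, so only the scaling is meaningful.
    """
    if L < 3:
        raise ValueError("Theorem 1 is stated for L >= 3")
    if delta <= 0:
        raise ValueError("delta must be positive")
    if delta ** 2 * T < L:
        raise ValueError("Theorem 1 requires delta^2 * T >= L")
    return L * math.log(delta ** 2 * T / L) / delta


# Illustrative values only: the rate keeps growing (logarithmically) with T.
for T in (10**4, 10**6, 10**8):
    rate = satisficing_lower_bound_rate(L=3, delta=0.5, T=T)
    print(f"T={T:>9d}  rate={rate:.2f}")
```

Each hundredfold increase in $T$ adds the same increment, $L\log(100)/\Delta$, to the rate, which is the logarithmic dependence on the horizon that the stationary case ($L=1$) avoids.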

Figures (1)

Figure 1: Schematic illustration: for $L=3$ one can create a separated local window to enable a clean identification reduction; for $L=2$ the analogous separation fails (motivating the information-budget approach).

Theorems & Definitions (6)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
Lemma 1
Lemma 2
