Diminishing Exploration: A Minimalist Approach to Piecewise Stationary Multi-Armed Bandits

Kuan-Ta Li; Ping-Chun Hsieh; Yu-Chih Huang

TL;DR

This work proposes a novel and generic exploration mechanism, called diminishing exploration, which eliminates the need for knowledge about change points and can be used in conjunction with an existing change detection-based algorithm to achieve near-optimal regret scaling.

Abstract

The piecewise-stationary bandit problem is an important variant of the multi-armed bandit problem that further considers abrupt changes in the reward distributions. The main theme of the problem is the trade-off between exploration for detecting environment changes and exploitation of traditional bandit algorithms. While this problem has been extensively investigated, existing works either assume knowledge of the number of change points $M$ or incur extremely high computational complexity. In this work, we revisit the piecewise-stationary bandit problem from a minimalist perspective. We propose a novel and generic exploration mechanism, called diminishing exploration, which eliminates the need for knowledge of $M$ and can be used in conjunction with an existing change detection-based algorithm to achieve near-optimal regret scaling. Simulation results show that, despite being oblivious to $M$, equipping existing algorithms with the proposed diminishing exploration generally achieves better empirical regret than traditional uniform exploration.
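To make the contrast between uniform and diminishing exploration concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a hypothetical schedule in which the forced-exploration rate decays as $\sqrt{K/\tau}$, where $\tau$ is the number of steps since the last restart. The function names, the constant rate `alpha`, and the decay form are all illustrative assumptions. The sketch only compares how many forced exploration pulls each schedule is expected to spend over a horizon.

```python
import math


def uniform_rate(tau, alpha=0.05):
    """Constant forced-exploration rate (alpha is an assumed value)."""
    return alpha


def diminishing_rate(tau, num_arms=5, scale=1.0):
    """Hypothetical decaying rate: explore less the longer no change is seen.

    tau is the number of steps since the last restart (tau >= 1).
    The sqrt(num_arms / tau) form is an assumption made for illustration.
    """
    return min(1.0, scale * math.sqrt(num_arms / tau))


def expected_forced_pulls(rate_fn, horizon):
    """Expected number of forced exploration pulls over `horizon` steps."""
    return sum(rate_fn(tau) for tau in range(1, horizon + 1))


if __name__ == "__main__":
    for horizon in (10_000, 40_000, 100_000):
        u = expected_forced_pulls(uniform_rate, horizon)
        d = expected_forced_pulls(diminishing_rate, horizon)
        print(f"T={horizon}: uniform={u:.1f}, diminishing={d:.1f}")
```

Under these assumptions the uniform schedule spends exploration linearly in the horizon, while the decaying schedule spends it roughly in proportion to the square root of the horizon, and neither rate depends on the number of change points.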

Paper Structure (25 sections, 18 theorems, 80 equations, 8 figures, 1 table, 5 algorithms)

Introduction
Related Work
Problem Formulation
The Proposed Framework: Diminishing Exploration
Diminishing Exploration
Integrating Off-the-Shelf Change Detectors With Diminishing Exploration
Regret Analysis
Integration with change detectors
Extension to Detection of Optimal Arm Changes
Diminishing Exploration With a Skipping Mechanism
Integration with change detectors
Simulation Results
Concluding Remarks
Change Detection Algorithm
The Extended Version Algorithm
...and 10 more sections

Key Result

Theorem 4.1

Algorithm alg:main_alg can be combined with a CD algorithm to achieve an expected regret upper bound expressed in terms of $\tilde{C}_{i}$, where $\tilde{C}_{i}=8\sum_{\Delta^{(i)}_{k}>0}\frac{\log T}{\Delta^{(i)}_{k}}+\left(\frac{5}{2}+\frac{\pi^{2}}{3}+K\right)\sum_{k=1}^{K}\Delta^{(i)}_{k}$.
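As a worked illustration of the constant $\tilde{C}_{i}$ defined above, the following sketch evaluates it numerically for one stationary segment. The function name `c_tilde` and its argument names are assumptions introduced here. It takes the gaps $\Delta^{(i)}_{k}$ of all $K$ arms (zero for an optimal arm) and the horizon $T$, and it does not compute the full regret bound.

```python
import math


def c_tilde(gaps, horizon):
    """Evaluate C~_i = 8 * sum_{gap > 0} log(T) / gap
                      + (5/2 + pi^2/3 + K) * sum_k gap_k.

    gaps: suboptimality gaps of all K arms in segment i (0 for an optimal arm).
    horizon: the time horizon T, assumed to be greater than 1.
    """
    if horizon <= 1:
        raise ValueError("horizon must exceed 1 so that log(T) is positive")
    num_arms = len(gaps)
    # Logarithmic term: only arms with a strictly positive gap contribute.
    log_term = 8.0 * sum(math.log(horizon) / g for g in gaps if g > 0)
    # Constant term: scales with the number of arms and the total gap.
    const_term = (2.5 + math.pi ** 2 / 3 + num_arms) * sum(gaps)
    return log_term + const_term


if __name__ == "__main__":
    # Three arms: one optimal, two suboptimal with gaps 0.2 and 0.5.
    print(c_tilde([0.0, 0.2, 0.5], horizon=10_000))
```

The logarithmic term dominates when gaps are small, which matches the familiar $\log T / \Delta$ scaling of UCB-style bounds within a stationary segment.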

Figures (8)

Figure 1: Regret and computation times.
Figure 2: Diminishing exploration.
Figure 3: Regret in the synthetic environments and under the Yahoo data set.
Figure 4: Regret and computation times.
Figure 5: A comparison of the notation used in the Regret Analysis section and in the Extension to Detection of Optimal Arm Changes section.
...and 3 more figures

Theorems & Definitions (32)

Theorem 4.1
Corollary 4.4: Regret bound of M-UCB
Corollary 4.6: Regret bound of GLR-UCB
Remark 5.1
Corollary 5.2: Regret bound of M-UCB
Corollary 5.3: Regret bound of GLR-UCB
Definition A.1
Lemma C.1: Diminishing exploration regret
proof
Lemma C.2: Samples-time steps transform
...and 22 more
