Flickering Multi-Armed Bandits

Sourav Chakraborty; Amit Kiran Rege; Claire Monteleoni; Lijun Chen

TL;DR

This work proposes and analyzes a two-phase algorithm for Flickering Multi-Armed Bandits: a lazy random walk explores the arms to identify the optimal one, and a navigation and commitment phase then exploits it. The paper establishes high-probability and expected sublinear regret bounds under both the i.i.d. Erdős--Rényi and the Edge-Markovian graph models.

Abstract

We introduce Flickering Multi-Armed Bandits (FMAB), a new MAB framework where the set of available arms (or actions) can change at each round, and the available set at any time may depend on the agent's previously selected arm. We model this constrained, evolving availability using random graph processes, where arms are nodes and the agent's movement is restricted to its local neighborhood. We analyze this problem under two random graph models: an i.i.d. Erdős--Rényi (ER) process and an Edge-Markovian process. We propose and analyze a two-phase algorithm that employs a lazy random walk for exploration to efficiently identify the optimal arm, followed by a navigation and commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds for both graph settings. We show that the exploration cost of our algorithm is near-optimal by establishing a matching information-theoretic lower bound for this problem class, highlighting the fundamental cost of exploration under local-move constraints. We complement our theoretical guarantees with numerical simulations, including a scenario of a robotic ground vehicle scouting a disaster-affected region.
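The two-phase idea described in the abstract can be illustrated with a minimal simulation sketch for the i.i.d. ER setting. It is not the authors' algorithm as specified in the paper. The function name `simulate_fmab`, the Bernoulli reward model, the laziness parameter of 1/2, the assumption that the agent may always re-pull its current arm, and the navigation rule (hop to the target arm as soon as an edge to it appears) are all assumptions made for illustration.

```python
import random


def simulate_fmab(means, p, T, T0, seed=0):
    """Sketch of a two-phase FMAB learner on an i.i.d. ER graph process.

    means: true Bernoulli means of the arms (assumed reward model).
    p:     probability that an edge from the current arm to another arm
           is present in a given round.
    T:     horizon; T0: number of exploration rounds.
    Returns (target, pulls, regret); target is None if T0 >= T.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    sums = [0.0] * n
    current = 0      # assumed starting arm
    target = None    # empirically best arm, fixed once exploration ends
    pulls = []

    for t in range(T):
        # Fresh availability each round: arms adjacent to the current arm.
        neighbors = [j for j in range(n) if j != current and rng.random() < p]

        if t < T0:
            # Exploration: lazy random walk (stay w.p. 1/2, assumed laziness).
            if neighbors and rng.random() < 0.5:
                current = rng.choice(neighbors)
        else:
            if target is None:
                # Pick the arm with the highest empirical mean; unvisited
                # arms are ranked last.
                target = max(
                    range(n),
                    key=lambda j: sums[j] / counts[j] if counts[j] else float("-inf"),
                )
            # Navigation (assumed rule): wait until an edge to the target
            # appears, hop there, then commit by never leaving.
            if current != target and target in neighbors:
                current = target

        reward = 1.0 if rng.random() < means[current] else 0.0
        counts[current] += 1
        sums[current] += reward
        pulls.append(current)

    best = max(means)
    regret = sum(best - means[a] for a in pulls)  # pseudo-regret
    return target, pulls, regret
```

Calling `simulate_fmab([0.1, 0.5, 0.9], p=0.3, T=2000, T0=300)` and comparing the returned regret across values of `p` gives a rough feel for how sparser availability slows both the walk and the final navigation step.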

Paper Structure (114 sections, 41 theorems, 270 equations, 4 figures)

Introduction
Contributions.
Related work.
Models and Problem Formulation
Erdős--Rényi (ER) Graph Process.
Markovian Edge-Flip Process.
A Phase-Based Learning Algorithm
Performance Guarantees
Regret for FMAB with i.i.d. Erdős--Rényi Graphs
Regret for FMAB with Edge-Markovian Graphs
The "Two Speeds" Challenge.
Burn-in and Exploration Phases.
Expected Regret for FMAB
Simulations
Application: Disaster Response Scenario
...and 99 more sections

Key Result

theorem 1

Consider the FMAB problem under the i.i.d. ER model (homogeneous or heterogeneous). For any failure probability $\delta \in (0,1)$, there exist absolute constants $c_1, c_2, c_3 > 0$ such that by setting the exploration length $T_0 \geq c_1 n \log(nT/\delta) + c_2 n \log(n/\delta)/\Delta_{\min}^2$,
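To get a feel for how the exploration-length condition above scales, the right-hand side $c_1 n \log(nT/\delta) + c_2 n \log(n/\delta)/\Delta_{\min}^2$ can be evaluated numerically. The following is a small sketch: the helper name `exploration_length` is hypothetical, and the default values `c1 = c2 = 1.0` are placeholders only, since the theorem asserts that suitable absolute constants exist without giving their values here.

```python
import math


def exploration_length(n, T, delta, gap_min, c1=1.0, c2=1.0):
    """Smallest integer T0 satisfying
    T0 >= c1 * n * log(n*T/delta) + c2 * n * log(n/delta) / gap_min**2.

    c1 and c2 are placeholder constants (assumption), not the paper's values.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if gap_min <= 0.0:
        raise ValueError("gap_min must be positive")
    coverage_term = c1 * n * math.log(n * T / delta)
    estimation_term = c2 * n * math.log(n / delta) / gap_min ** 2
    return math.ceil(coverage_term + estimation_term)
```

Under these placeholder constants, halving `gap_min` quadruples the second term, which is the usual $1/\Delta_{\min}^2$ sample cost of separating the best arm from the runner-up, while the first term grows only logarithmically in the horizon $T$.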

Figures (4)

Figure 1: A 4-arm FMAB problem at $t=s, s+1, s+2$, with pull sequence $a_s=3, a_{s+1}=4, a_{s+2}=2$ (from $a_{s-1}=1$). (A) Problem view: Blue/White arms are accessible/inaccessible. Pulled arm has a dark blue border. (B) Graph view: Learner starts at the dark blue node ($a_{t-1}$) and can move to blue neighbors ($L_t(a_{t-1})$).
Figure 2: Average cumulative regret $R(t)/t$ over time.
Figure 3: Spatial visitation density (log-scale) at three representative stages of the mission. The heatmaps illustrate the algorithm's behavior transitioning from broad, near-uniform exploration (Rounds 1–100) to an emerging preference for good sites (Rounds 101–500), and finally to a sharp, focused exploitation of the optimal hotspot (Rounds 501–$T$).
Figure 4: (a) Average cumulative regret $R(t)/t$ over time for the i.i.d. ER case. The main plot and the inset (showing the initial $t \le 200$ rounds) demonstrate the algorithm's rapid convergence as it quickly learns to identify high-utility locations. (b) The same plot for the Edge-Markovian case. (c) Box plot of the navigation cost in the ER case as a function of graph sparsity.

Theorems & Definitions (92)

theorem 1: Regret for i.i.d. ER Graphs
proof
remark 1: General Analysis and the Homogeneous Case
theorem 2: Regret for Edge-Markovian Graphs
proof
corollary 1: Expected Regret Bound
proof
remark 2: On the Near-Optimality of Exploration
lemma 1: ER Availability
proof
...and 82 more
