Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts

Zhuohua Li; Maoli Liu; Xiangxiang Dai; John C. S. Lui

TL;DR

The paper addresses a long-standing open problem: removing the restrictive context-diversity assumptions required by existing analyses of online clustering of contextual bandits. Under stochastic contexts it proposes two algorithms, UniCLUB (cluster gap γ known) and PhaseUniCLUB (γ unknown); under smoothed adversarial contexts it proposes SACLUB and SASCLUB. The resulting regret bounds match or approach the best prior results under weaker conditions, and the smoothed setting removes the need for pure exploration, which improves practicality. Experiments on synthetic and real-world data support the effectiveness and robustness of the proposed algorithms.

Abstract

The contextual multi-armed bandit (MAB) problem is crucial in sequential decision-making. A line of research, known as online clustering of bandits, extends contextual MAB by grouping similar users into clusters, utilizing shared features to improve learning efficiency. However, existing algorithms, which rely on the upper confidence bound (UCB) strategy, struggle to gather adequate statistical information to accurately identify unknown user clusters. As a result, their theoretical analyses require several strong assumptions about the "diversity" of contexts generated by the environment, leading to impractical settings, complicated analyses, and poor practical performance. Removing these assumptions has been a long-standing open problem in the clustering of bandits literature. In this paper, we provide two solutions to this open problem. First, following the i.i.d. context generation setting in existing studies, we propose two novel algorithms, UniCLUB and PhaseUniCLUB, which incorporate enhanced exploration mechanisms to accelerate cluster identification. Remarkably, our algorithms require substantially weaker assumptions while achieving regret bounds comparable to prior work. Second, inspired by the smoothed analysis framework, we propose a more practical setting that eliminates the requirement for i.i.d. context generation used in previous studies, thus enhancing the performance of existing algorithms for online clustering of bandits. Our technique can be applied to both graph-based and set-based clustering of bandits frameworks. Extensive evaluations on both synthetic and real-world datasets demonstrate that our proposed algorithms consistently outperform existing approaches.
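
To make the role of context diversity concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes a linear reward model, per-user ridge-regression estimates, and a CLUB-style test that separates two users once their estimates differ by more than the sum of their confidence widths. The names `UserStats`, `conf_width`, `should_split`, and `simulate`, the width formula, and the Gaussian perturbation of a fixed "adversarial" context are all illustrative assumptions; the paper's precise smoothed-context model and thresholds are not given in this summary.

```python
import numpy as np

class UserStats:
    """Per-user ridge-regression statistics for a linear reward model."""
    def __init__(self, d, reg=1.0):
        self.M = reg * np.eye(d)   # regularized Gram matrix
        self.b = np.zeros(d)       # sum of reward-weighted contexts
        self.T = 0                 # number of observations

    def update(self, x, reward):
        self.M += np.outer(x, x)
        self.b += reward * x
        self.T += 1

    def theta_hat(self):
        # Ridge estimate of the user's preference vector.
        return np.linalg.solve(self.M, self.b)

def conf_width(T, alpha=1.0):
    # Assumed confidence width, shrinking roughly like sqrt(log T / T).
    return alpha * np.sqrt((1.0 + np.log(1.0 + T)) / (1.0 + T))

def should_split(u, v, alpha=1.0):
    # Separate two users when their estimates are provably far apart.
    gap = np.linalg.norm(u.theta_hat() - v.theta_hat())
    return bool(gap > conf_width(u.T, alpha) + conf_width(v.T, alpha))

def simulate(sigma, rounds=500, seed=0):
    """Two users whose preferences differ only in the second coordinate.

    The (hypothetical) adversary always proposes the context [1, 0];
    sigma is the standard deviation of the Gaussian perturbation added to it.
    """
    rng = np.random.default_rng(seed)
    theta_a = np.array([1.0, 0.0])
    theta_b = np.array([1.0, 1.0])
    x_adv = np.array([1.0, 0.0])
    a, b = UserStats(2), UserStats(2)
    for _ in range(rounds):
        for user, theta in ((a, theta_a), (b, theta_b)):
            x = x_adv + sigma * rng.standard_normal(2)
            reward = float(theta @ x) + 0.1 * rng.standard_normal()
            user.update(x, reward)
    return should_split(a, b)

if __name__ == "__main__":
    print("split without perturbation:", simulate(sigma=0.0))
    print("split with perturbation:   ", simulate(sigma=0.5))
```

In this toy setup the unperturbed contexts never excite the coordinate in which the two users differ, so the test cannot separate them, while a modest perturbation supplies enough diversity for the split to be detected. This is the intuition behind both the diversity assumptions in earlier analyses and the smoothed setting described above.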

Paper Structure (29 sections, 24 theorems, 98 equations, 5 figures, 2 tables, 5 algorithms)

Introduction
Related Work
Stochastic Context Setting
Problem Setting
Diversity Conditions in Previous Studies and Key Techniques
UniCLUB: Algorithm for the Case When γ Is Known
PhaseUniCLUB: Algorithm for the Case When γ Is Unknown
Smoothed Adversarial Context Setting
Problem Setting
Algorithms for Smoothed Adversarial Context Setting
Theoretical Analysis
Performance Evaluation
Experiment Setup
Datasets Generation and Preprocessing
Experiment Results
...and 14 more sections

Key Result

Theorem 1

Under the stochastic context setting (with the user-uniformness, well-separatedness, and item-regularity assumptions) and assuming the cluster gap $\gamma$ is known, the expected regret of UniCLUB satisfies:

Figures (5)

Figure 1: Comparison of cumulative regrets in the stochastic context setting.
Figure 2: Comparison of cumulative regrets under the smoothed adversarial context setting.
Figure 3: Comparison of cumulative regrets with different arm set sizes.
Figure 4: Comparison of cumulative regrets with different numbers of users.
Figure 5: Comparison of cumulative regrets with more baselines.

Theorems & Definitions (47)

Remark 1
Remark 2
Remark 3
Theorem 1: Regret of UniCLUB
Theorem 2: Regret of
Remark 4
Theorem 3: Regret of
Remark 5
Theorem 4: Regret of and
Remark 6
...and 37 more
