Offline Clustering of Linear Bandits: The Power of Clusters under Limited Data

Jingyuan Liu, Zeyu Zhang, Xuchuang Wang, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili, Carlee Joe-Wong

TL;DR

This work tackles offline clustering of contextual linear bandits (Off-ClusBand), where a fixed offline dataset must support clustering users into unknown, heterogeneous groups. It introduces two algorithms. Off-C$^2$LUB is designed for insufficient offline data: it builds a similarity graph with a tunable threshold $\hat{\gamma}$ and aggregates data only from one-hop neighbors. Off-CLUB assumes sufficient data: it starts from a complete graph and prunes inter-cluster edges. The authors provide rigorous suboptimality bounds that decompose into noise and bias terms, give strategies for selecting $\hat{\gamma}$ when the true heterogeneity gap $\gamma$ is known or unknown, and prove a lower bound for the problem that Off-CLUB nearly matches when data is sufficient. Extensive experiments on synthetic, Yelp, and MovieLens data show substantial improvements over baselines, supporting both the practical utility and the theoretical guarantees of the approach. The work advances offline decision-making under heterogeneity by exploiting cluster structure while carefully managing data limitations, with potential impact on personalized recommendation and on offline medical or policy settings where online experiments are restricted.
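The one-hop aggregation idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Off-C$^2$LUB: it assumes per-user ridge-regression estimates, an edge rule that links two users when their estimates differ by at most $\hat{\gamma}$ in Euclidean norm (the paper's rule may also account for estimation uncertainty), and a pessimistic (lower-confidence-bound) arm choice with a hypothetical constant `alpha`. The helpers `ridge`, `one_hop_estimate`, and `pessimistic_arm` and the toy data are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate theta = (lam*I + X^T X)^{-1} X^T y, plus the Gram matrix V."""
    V = lam * np.eye(X.shape[1]) + X.T @ X
    return np.linalg.solve(V, X.T @ y), V

def one_hop_estimate(data, u, gamma_hat, lam=1.0):
    """Pool user u's data with its one-hop neighbors in a similarity graph.

    Assumed edge rule (illustrative only): connect u and v when their individual
    ridge estimates are within gamma_hat of each other in Euclidean norm.
    data maps user -> (X, y) with X of shape (N_u, d).
    """
    thetas = {v: ridge(X, y, lam)[0] for v, (X, y) in data.items()}
    neighbors = [v for v in data
                 if np.linalg.norm(thetas[u] - thetas[v]) <= gamma_hat]  # u itself always passes
    X_pool = np.vstack([data[v][0] for v in neighbors])
    y_pool = np.concatenate([data[v][1] for v in neighbors])
    theta, V = ridge(X_pool, y_pool, lam)
    return theta, V, neighbors

def pessimistic_arm(arms, theta, V, alpha):
    """Index of the arm maximizing x^T theta - alpha * ||x||_{V^{-1}} (a lower confidence bound)."""
    V_inv = np.linalg.inv(V)
    widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
    return int(np.argmax(arms @ theta - alpha * widths))

# Toy offline data: users 0 and 1 share one preference vector, users 2 and 3 another.
rng = np.random.default_rng(0)
true = {0: np.array([1.0, 0.0, 0.0]), 1: np.array([1.0, 0.0, 0.0]),
        2: np.array([0.0, 0.0, 1.0]), 3: np.array([0.0, 0.0, 1.0])}
data = {}
for v, th in true.items():
    X = rng.normal(size=(50, 3))
    data[v] = (X, X @ th + 0.1 * rng.normal(size=50))

theta, V, neighbors = one_hop_estimate(data, u=0, gamma_hat=0.5)
action = pessimistic_arm(np.eye(3), theta, V, alpha=0.1)
```

In this toy run, user 0 is pooled only with user 1, so the pooled estimate uses 100 samples instead of 50, and the pessimistic choice picks arm 0, which matches the shared preference vector.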

Abstract

The contextual multi-armed bandit is a fundamental learning framework for making a sequence of decisions, e.g., recommending advertisements to a sequence of arriving users. Recent works have shown that clustering these users based on the similarity of their learned preferences can accelerate learning. However, prior work has focused primarily on the online setting, which requires continually collecting user data, and ignores the offline data widely available in many applications. To address this limitation, we study the offline clustering of bandits (Off-ClusBand) problem, which asks how to use an offline dataset to learn cluster properties and improve decision-making. The key challenge in Off-ClusBand is data insufficiency: unlike the online case, where we continually learn from new data, in the offline case we have a fixed, limited dataset and must therefore determine whether we have enough data to confidently cluster users together. To address this challenge, we propose two algorithms: Off-C$^2$LUB, which we show analytically and experimentally to outperform existing methods under limited offline user data, and Off-CLUB, which may incur bias when data is sparse but performs well and nearly matches the lower bound when data is sufficient. We validate these results experimentally on both real and synthetic datasets.
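The data-insufficiency issue described above can be made concrete with a hypothetical sketch of a complete-graph pruning step in the spirit of Off-CLUB. It borrows the edge-deletion rule of the online CLUB algorithm (cut edge $(u,v)$ when the gap between the two users' estimates exceeds the sum of their confidence widths); the `width` function, the constant `alpha`, and the example estimates are illustrative assumptions rather than the paper's exact choices.

```python
import math
from collections import deque

def width(n, alpha=1.0):
    """Assumed confidence width for a user with n samples, shrinking roughly like sqrt(log(n) / n)."""
    return alpha * math.sqrt((1 + math.log(1 + n)) / (1 + n))

def prune_complete_graph(thetas, counts, alpha=1.0):
    """Start from the complete graph over users and keep edge (u, v) only when
    dist(theta_u, theta_v) <= width(N_u) + width(N_v); return the connected components."""
    users = list(thetas)
    adj = {u: set() for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            gap = math.dist(thetas[u], thetas[v])
            if gap <= width(counts[u], alpha) + width(counts[v], alpha):
                adj[u].add(v)
                adj[v].add(u)
    # Breadth-first search: each connected component is one inferred cluster.
    seen, clusters = set(), []
    for start in users:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            w = queue.popleft()
            comp.append(w)
            for nb in adj[w]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(comp))
    return clusters

# Hypothetical per-user estimates: users 0 and 1 form one cluster, users 2 and 3 another.
thetas = {0: (1.0, 0.0), 1: (0.95, 0.05), 2: (0.0, 1.0), 3: (0.05, 0.95)}
sparse = prune_complete_graph(thetas, {u: 2 for u in thetas})
rich = prune_complete_graph(thetas, {u: 200 for u in thetas})
print(sparse)  # [[0, 1, 2, 3]]
print(rich)    # [[0, 1], [2, 3]]
```

With only two samples per user, the assumed widths are too wide to cut any edge, so all four users are merged into one cluster and any pooled estimate would be biased; with 200 samples per user, the inter-cluster edges are pruned and the two true clusters are recovered.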

Paper Structure

This paper contains 38 sections, 13 theorems, 66 equations, 7 figures, 7 tables, 2 algorithms.

Key Result

Lemma 4.2

For inputs $\alpha \geq 1$, $\lambda > 0$, and $\delta \in (0,1)$ satisfying $\lambda \leq d \log\left(1 + \frac{\min_u N_u}{\lambda d}\right) + 2 \log\left(\frac{2U}{\delta}\right)$ and $\delta \leq \frac{2U}{1 + \max_u N_u/(\lambda d)}$, there exist some $\alpha_r \in \left[\frac{\sqrt{\tilde{\lambda}_a}}{4(\cdots)}, \ldots\right]$ … with probability at least $1-\delta$, where $N_{\min}$ is as defined in the algorithm's initialization line (Line …).
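The input conditions of Lemma 4.2 are straightforward to check numerically. The following Python sketch encodes exactly those conditions, together with the stated ranges $\alpha \geq 1$, $\lambda > 0$, and $\delta \in (0,1)$; it says nothing about the lemma's conclusion. The function name `lemma_4_2_inputs_ok` and the example values are hypothetical, and `counts` (the per-user sample sizes $N_u$), `d` (presumably the feature dimension), and `U` (presumably the number of users) are assumed readings of the symbols.

```python
import math

def lemma_4_2_inputs_ok(alpha, lam, delta, d, U, counts):
    """Return True when the inputs satisfy the stated conditions:
    alpha >= 1, lam > 0, 0 < delta < 1,
    lam <= d*log(1 + min_u N_u / (lam*d)) + 2*log(2U / delta), and
    delta <= 2U / (1 + max_u N_u / (lam*d)).
    counts: per-user offline sample sizes N_u."""
    if not (alpha >= 1 and lam > 0 and 0 < delta < 1):
        return False
    n_min, n_max = min(counts), max(counts)
    lam_ok = lam <= d * math.log(1 + n_min / (lam * d)) + 2 * math.log(2 * U / delta)
    delta_ok = delta <= 2 * U / (1 + n_max / (lam * d))
    return lam_ok and delta_ok

print(lemma_4_2_inputs_ok(1.0, 1.0, 0.1, d=5, U=10, counts=[20, 50, 100]))    # True
print(lemma_4_2_inputs_ok(1.0, 50.0, 0.1, d=5, U=10, counts=[20, 50, 100]))   # False: lam too large
print(lemma_4_2_inputs_ok(1.0, 1.0, 0.1, d=5, U=10, counts=[20, 1_000_000]))  # False: delta too large
```

The $\delta$ condition tightens as the best-covered user accumulates more data, since its right-hand side shrinks as $\max_u N_u$ grows.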

Figures (7)

  • Figure 1: Influence of Different $\hat{\gamma}$.
  • Figure 2: Comparisons of our algorithms with baselines under different user distributions and datasets. Off-C$^2$LUB and its variants consistently outperform Off-CLUB and other baselines.
  • Figure 3: Suboptimality under varying $\hat{\gamma}$, showing Off-C$^2$LUB is near-optimal with our overestimation or underestimation strategies.
  • Figure 4: Comparison of clustering strategies under $|\mathcal{D}|=30k$ across three datasets.
  • Figure 5: Comparison of clustering strategies under $|\mathcal{D}|=35k$ across three datasets.
  • ...and 2 more figures

Theorems & Definitions (33)

  • Remark 3.2: Discussion on the Action Regularity Assumption
  • Definition 4.1: Heterogeneity Gap
  • Lemma 4.2: Estimations of sets $\mathcal{R}_{\hat{\gamma}}(u)$ and $\mathcal{W}_{\hat{\gamma}}(u)$
  • Remark 4.3: Interpretation of Lemma 4.2
  • Theorem 4.4
  • Remark 4.5: Analysis of Theorem 4.4
  • Remark 4.6: Discussions on $\alpha_r$ and $\alpha_w$
  • Corollary 4.7: Theorem 4.4 for Accurate $\gamma$
  • Remark 4.8: Analysis of Corollary 4.7
  • ...and 23 more