Online Clustering with Bandit Information

G Dhinesh Chandran; Srinivas Reddy Kota; Srikrishna Bhashyam

TL;DR

The paper addresses online clustering of $M$ Gaussian arms into $K$ clusters under a fixed-confidence setting, allowing different means within a cluster. It introduces ATBOC, a sample-efficient algorithm with average-tracking for arm pulls and a GLR-based stopping rule, and proves it is within a factor of 2 of a problem-dependent lower bound as $\delta\to0$, hence order-optimal. To reduce computation, LUCBBOC and BOC-Elim are proposed; LUCBBOC uses LUCB-style confidence gaps, while BOC-Elim targets the 1D case and top $K-1$ gaps, both offering $\delta$-PC guarantees with reduced complexity. Theoretical results are complemented by simulations on synthetic datasets and the MovieLens dataset, showing the proposed methods outperform existing max-gap baselines in many settings and demonstrating practical applicability to multi-cluster, multi-dimensional online clustering.

Abstract

We study the problem of online clustering within the multi-armed bandit framework under the fixed confidence setting. In this multi-armed bandit problem, we have $M$ arms, each providing i.i.d. samples that follow a multivariate Gaussian distribution with an unknown mean and a known unit covariance. The arms are grouped into $K$ clusters based on the distance between their means, using the Single Linkage (SLINK) clustering algorithm applied to the means of the arms. Since the true means are unknown, the objective is to obtain this clustering of the arms with the minimum number of samples drawn from the arms, subject to an upper bound on the error probability. We introduce a novel algorithm, Average Tracking Bandit Online Clustering (ATBOC), and prove that this algorithm is order optimal, meaning that the upper bound on its expected sample complexity for a given error probability $\delta$ is within a factor of 2 of an instance-dependent lower bound as $\delta \rightarrow 0$. Furthermore, we propose a computationally more efficient algorithm, Lower and Upper Confidence Bound-based Bandit Online Clustering (LUCBBOC), inspired by the LUCB algorithm for best arm identification. Simulation results demonstrate that the performance of LUCBBOC is comparable to that of ATBOC. We assess the effectiveness of the proposed algorithms through numerical experiments on both synthetic datasets and the real-world MovieLens dataset. To the best of our knowledge, this is the first work on bandit online clustering that allows arms with different means in a cluster and $K$ greater than 2.

Paper Structure (30 sections, 20 theorems, 144 equations, 6 figures, 1 table, 3 algorithms)

Introduction
System Model and Preliminaries
Clustering problem setup
Cluster distances
Separation assumption for clusters
Clustering algorithm and performance metric
Lower Bound
Average Tracking Bandit Online Clustering (ATBOC) Algorithm and its performance
Algorithm Description
ATBOC Algorithm Performance
Lower and Upper Confidence Bound-based Bandit Online Clustering (LUCBBOC) Algorithm
BOC-Elim Algorithm and its performance
Simulations
Synthetic Dataset 1 - Asymptotic Behavior (d = 2)
Synthetic Dataset 2 - Asymptotic Behavior (d = 1)
...and 15 more sections

Key Result

Theorem 1

Let $\delta \in (0, 1)$. For any $\delta$-PC algorithm $\pi$ and any problem instance $\boldsymbol{\mu} \in \mathbb{R}^{d \times M}$, the expected sample complexity is bounded below by an instance-dependent quantity.
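
For orientation, fixed-confidence lower bounds for unit-covariance Gaussian arms usually take the shape sketched below. This is the standard change-of-measure form, given as an assumption about the general structure and not as the paper's exact statement. The symbols $\tau_\delta$ (stopping time), $T^*(\boldsymbol{\mu})$ (characteristic time), $\mathcal{P}_M$ (probability simplex over the $M$ arms), $\mathrm{Alt}(\boldsymbol{\mu})$ (instances whose clustering differs from $\mathcal{C}(\boldsymbol{\mu})$), and $d_{\mathrm{KL}}(\delta, 1-\delta)$ (binary relative entropy) are notation introduced here for illustration.

```latex
% Sketch of the usual form of a fixed-confidence lower bound (assumed notation).
\mathbb{E}_{\boldsymbol{\mu}}^{\pi}[\tau_\delta]
  \;\ge\; d_{\mathrm{KL}}(\delta, 1-\delta)\, T^*(\boldsymbol{\mu}),
\qquad
T^*(\boldsymbol{\mu})^{-1}
  \;=\; \sup_{\boldsymbol{w} \in \mathcal{P}_M}\;
        \inf_{\boldsymbol{\lambda} \in \mathrm{Alt}(\boldsymbol{\mu})}\;
        \frac{1}{2} \sum_{m=1}^{M} w_m \,
        \lVert \boldsymbol{\mu}_m - \boldsymbol{\lambda}_m \rVert^2 ,
% and, since d_KL(delta, 1-delta) behaves like log(1/delta) as delta -> 0,
\liminf_{\delta \to 0}
  \frac{\mathbb{E}_{\boldsymbol{\mu}}^{\pi}[\tau_\delta]}{\log(1/\delta)}
  \;\ge\; T^*(\boldsymbol{\mu}).
```

The factor $\frac{1}{2}\lVert \boldsymbol{\mu}_m - \boldsymbol{\lambda}_m \rVert^2$ is the Kullback-Leibler divergence between two Gaussians with identity covariance and means $\boldsymbol{\mu}_m$ and $\boldsymbol{\lambda}_m$, which is why the squared Euclidean distance appears under the known unit covariance assumption.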

Figures (6)

Figure 1: Illustrative example of $\mathcal{C}(\boldsymbol{\mu})$. We have $d=1$, $K=3$, $M=5$, and mean vector $\boldsymbol{\mu} = [0, 0.5, 1.5, 2, 3.5]$. The SLINK clustering algorithm assigns arms $1, 2$ to cluster $1$, arms $3, 4$ to cluster $2$, and arm $5$ to cluster $3$. Hence, it outputs the cluster index vector $\mathcal{C}(\boldsymbol{\mu}) = [1, 1, 2, 2, 3]$.
Figure 2: Performance of ATBOC and LUCBBOC ($M=4$, $K=2$, $d=2$).
Figure 3: Performance of ATBOC, LUCBBOC, and BOC-Elim ($M=7$, $K=3$, $d=1$).
Figure 4: Comparison of ATBOC, LUCBBOC, and BOC-Elim with MaxGapTop2UCB ($M=7$, $K=2$, $d=1$).
Figure 5: Simulation results on MovieLens Dataset ($M=5$, $K=3$, $d=1$).
...and 1 more figures
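
The Figure 1 example can be checked with a small single-linkage routine. The sketch below is a minimal illustration, not the paper's implementation: the function name `slink_labels`, the union-find merging strategy, and the convention of numbering clusters by first appearance are all assumptions made here. It merges the closest pair of arms from different clusters until only $K$ clusters remain, which is the agglomerative definition of single linkage.

```python
from itertools import combinations
import math


def slink_labels(means, k):
    """Single-linkage clustering of `means` into `k` clusters.

    `means` is a list of points (each a sequence of floats).
    Returns 1-based cluster labels, numbered by first appearance.
    Hypothetical helper written for illustration only.
    """
    m = len(means)
    parent = list(range(m))  # union-find forest: each arm starts alone

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All arm pairs, sorted by Euclidean distance between their means.
    pairs = sorted(
        combinations(range(m), 2),
        key=lambda p: math.dist(means[p[0]], means[p[1]]),
    )

    clusters = m
    for i, j in pairs:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # closest pair lying in two different clusters
            parent[rj] = ri
            clusters -= 1

    # Relabel roots as 1, 2, ... in order of first appearance.
    root_to_label = {}
    labels = []
    for i in range(m):
        r = find(i)
        if r not in root_to_label:
            root_to_label[r] = len(root_to_label) + 1
        labels.append(root_to_label[r])
    return labels


# Figure 1 instance: d = 1, M = 5, K = 3.
mu = [[0.0], [0.5], [1.5], [2.0], [3.5]]
print(slink_labels(mu, 3))  # [1, 1, 2, 2, 3]
```

With the true means this reproduces $\mathcal{C}(\boldsymbol{\mu}) = [1, 1, 2, 2, 3]$. In the bandit setting the means are unknown, so a routine like this would be applied to empirical means instead.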

Theorems & Definitions (53)

Definition 1
Remark 1
Theorem 1
proof
Lemma 1
proof
Remark 2
Lemma 2
proof
Theorem 2
...and 43 more
