Meta Clustering of Neural Bandits

Yikun Ban; Yunzhe Qi; Tianxin Wei; Lihui Liu; Jingrui He

TL;DR

The paper provides an instance-dependent performance guarantee for the proposed algorithm that withstands adversarial contexts, and proves that the guarantee is at least as good as state-of-the-art (SOTA) approaches under the same assumptions.

Abstract

The contextual bandit is a powerful framework for formulating recommendation as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret over $T$ rounds. In this paper, we study a new problem, Clustering of Neural Bandits, which extends previous work to arbitrary reward functions in order to strike a balance between user heterogeneity and user correlations in recommender systems. To solve this problem, we propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters, along with an informative Upper Confidence Bound (UCB)-based exploration strategy. We provide an instance-dependent performance guarantee for the proposed algorithm that withstands adversarial contexts, and we further prove that the guarantee is at least as good as state-of-the-art (SOTA) approaches under the same assumptions. In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines, demonstrating its effectiveness in both settings.
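To make the objective concrete, the following is a minimal sketch of how cumulative regret over $T$ rounds is commonly tracked, assuming the reward of the best arm in each round is known (as in a simulation). The function name `cumulative_regret` and the toy reward values are illustrative assumptions, not taken from the paper.

```python
def cumulative_regret(optimal_rewards, received_rewards):
    """Return the running regret after each round.

    Per-round regret is the reward of the best arm minus the reward of the
    arm that was actually pulled; cumulative regret is the running sum.
    """
    if len(optimal_rewards) != len(received_rewards):
        raise ValueError("both sequences must cover the same rounds")
    total = 0.0
    curve = []
    for best, got in zip(optimal_rewards, received_rewards):
        total += best - got
        curve.append(total)
    return curve


# Toy example (hypothetical values): four rounds with binary rewards.
optimal = [1.0, 1.0, 1.0, 1.0]
received = [0.0, 1.0, 0.0, 1.0]
regret_curve = cumulative_regret(optimal, received)
```

A flatter curve means the learner is choosing near-optimal arms more often, which is the quantity the regret comparison figures plot.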

Paper Structure (13 sections, 23 theorems, 46 equations, 7 figures, 1 table, 2 algorithms)

Introduction
Related Work
Problem: Clustering of Neural Bandits
Proposed Algorithm
Regret Analysis
Experiments
Conclusion
Proof Details of Theorem 5.1
Analysis for user-learner
Analysis for Meta-learner
Bridge Meta-learner and User-learner
Main Proof
Connections with Neural Tangent Kernel

Key Result

Theorem 5.1

Given the number of rounds $T$ and $\gamma$, for any $\delta \in (0, 1), R > 0$, suppose $m \geq \widetilde{\Omega} ( \text{poly}(T, L, R) \cdot Kn\log (1/\delta))$, $\eta_1 = \eta_2 = \frac{R^2}{\sqrt{m}}$, and $\mathbb{E}[|\mathcal{N}_{u_t}(\mathbf{x}_t)|] = \frac{n}{q}, t \in [T]$. Then, with p where $S^{\ast}_{TK} = \underset{ \theta \in B(\theta_0, R)}{\inf} \sum_{t=1}^{TK} \mathcal{L}_t(\

Figures (7)

Figure 1: Clustering and Meta Adaptation: Given $u_t$ and an arm $\mathbf{x}_{t,i}$, (1) M-CNB identifies the cluster $\widehat{\mathcal{N}}_{u_t}(\mathbf{x}_{t,i})$, then (2) the meta-learner $\Theta_{t-1}$ rapidly adapts to this cluster, before proceeding to (3) the UCB exploration.
Figure 2: Regret comparison on recommendation datasets.
Figure 3: Regret comparison on MNIST and NotMNIST, CIFAR-10, EMNIST (Letter), and Shuttle.
Figure 4: Regret comparison on MNIST, Fashion-MNIST, Mushroom, and MagicTelescope.
Figure 5: Running time vs. Performance for all methods.
...and 2 more figures
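The three steps named in the Figure 1 caption (cluster identification, meta adaptation, UCB exploration) can be illustrated with a rough sketch. Everything below is an assumption made for illustration: linear models stand in for the neural networks, the threshold rule in `identify_cluster`, the single gradient pass in `adapt_meta`, and the bonus term in `ucb_score` are hypothetical placeholders, and none of them reproduces the paper's actual formulas.

```python
import math


def predict(theta, x):
    """Linear stand-in for a neural reward estimate f(x; theta)."""
    return sum(w * v for w, v in zip(theta, x))


def identify_cluster(user_params, target_user, x, gamma):
    """Step 1 (assumed rule): users whose estimate for x is within gamma of the target's."""
    ref = predict(user_params[target_user], x)
    return [u for u, theta in user_params.items()
            if abs(predict(theta, x) - ref) <= gamma]


def adapt_meta(meta_theta, cluster, user_data, lr=0.1, steps=1):
    """Step 2 (assumed rule): a few SGD steps on squared loss over the cluster's data."""
    theta = list(meta_theta)
    for _ in range(steps):
        for u in cluster:
            for x, r in user_data[u]:
                err = predict(theta, x) - r
                theta = [w - lr * err * v for w, v in zip(theta, x)]
    return theta


def ucb_score(theta, x, cluster_size, t, alpha=1.0):
    """Step 3 (hypothetical bonus): estimate plus an exploration term."""
    bonus = alpha * math.sqrt(math.log(t + 1) / cluster_size)
    return predict(theta, x) + bonus


def select_arm(arms, target_user, user_params, user_data, meta_theta, gamma, t):
    """Run the three steps for every arm and return the index of the best score."""
    scores = []
    for x in arms:
        cluster = identify_cluster(user_params, target_user, x, gamma)
        theta = adapt_meta(meta_theta, cluster, user_data)
        scores.append(ucb_score(theta, x, len(cluster), t))
    return max(range(len(arms)), key=scores.__getitem__)


# Toy data (hypothetical): three users, two arms, one observation per user.
user_params = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.0]}
user_data = {"a": [([1.0, 0.0], 1.0)],
             "b": [([1.0, 0.0], 0.9)],
             "c": [([1.0, 0.0], -1.0)]}
arms = [[1.0, 0.0], [0.0, 1.0]]
chosen = select_arm(arms, "a", user_params, user_data,
                    meta_theta=[0.0, 0.0], gamma=0.2, t=1)
```

The point of the sketch is the control flow: the cluster is recomputed per arm, so the same user can be grouped with different neighbors for different arms, and the meta parameters are adapted afresh for each inferred cluster before scoring.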

Theorems & Definitions (28)

Definition 3.1: Relative Cluster
Definition 3.2: $\gamma$-gap
Theorem 5.1
Definition 5.2: NTK [ntk2018neural, wang2021neural]
Lemma 5.3
Lemma A.1
Lemma A.2
Lemma A.3
Lemma A.4: Almost Convexity
Lemma A.5: User Trajectory Ball
...and 18 more
