Meta Clustering of Neural Bandits

Yikun Ban, Yunzhe Qi, Tianxin Wei, Lihui Liu, Jingrui He

TL;DR

The proposed algorithm comes with an instance-dependent performance guarantee that holds under adversarial contexts, and this guarantee is proved to be at least as good as those of state-of-the-art (SOTA) approaches under the same assumptions.

Abstract

The contextual bandit has been identified as a powerful framework for formulating recommendation as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret over $T$ rounds. In this paper, we study a new problem, Clustering of Neural Bandits, which extends previous work to arbitrary reward functions in order to strike a balance between user heterogeneity and user correlations in the recommender system. To solve this problem, we propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters, along with an informative Upper Confidence Bound (UCB)-based exploration strategy. We provide an instance-dependent performance guarantee for the proposed algorithm that holds under adversarial contexts, and we further prove the guarantee is at least as good as those of state-of-the-art (SOTA) approaches under the same assumptions. In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines, which shows the effectiveness of the proposed approach in improving online recommendation and online classification performance.
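
The round structure described in the abstract (find the serving user's cluster for each arm, adapt a shared meta-learner to that cluster, then pick the arm by a UCB score) can be illustrated with a toy sketch. This is not the paper's M-CNB: linear user models stand in for neural networks, moving the meta weights toward the cluster mean stands in for gradient-based adaptation, and a count-based bonus stands in for the paper's UCB term. All function names, the threshold `gamma`, and the parameters `step` and `alpha` are hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def relative_cluster(user_weights, u, x, gamma):
    """Users whose predicted reward on arm x is within gamma of user u's (assumed rule)."""
    preds = [dot(w, x) for w in user_weights]
    return [v for v, p in enumerate(preds) if abs(p - preds[u]) <= gamma]

def adapt_meta(meta_w, user_weights, cluster, step=0.5):
    """Stand-in for meta adaptation: move the meta weights toward the cluster mean."""
    dim = len(meta_w)
    mean = [sum(user_weights[v][j] for v in cluster) / len(cluster) for j in range(dim)]
    return [m + step * (c - m) for m, c in zip(meta_w, mean)]

def select_arm(user_weights, meta_w, u, arms, pulls, gamma=0.2, alpha=1.0):
    """Score every arm with prediction + exploration bonus; return best index and scores."""
    scores = []
    for i, x in enumerate(arms):
        cluster = relative_cluster(user_weights, u, x, gamma)  # (1) cluster for this arm
        w = adapt_meta(meta_w, user_weights, cluster)          # (2) adapt the meta-learner
        bonus = alpha / math.sqrt(1 + pulls[i])                # (3) count-based bonus (assumption)
        scores.append(dot(w, x) + bonus)
    best = max(range(len(arms)), key=scores.__getitem__)
    return best, scores

# Toy data: users 0 and 1 have similar preferences, user 2 differs.
user_weights = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
meta_w = [sum(w[j] for w in user_weights) / len(user_weights) for j in range(2)]
arms = [[1.0, 0.0], [0.0, 1.0]]
best, scores = select_arm(user_weights, meta_w, u=0, arms=arms, pulls=[0, 0])
```

In this toy setup the cluster is recomputed per arm, so the same user can be grouped with different users depending on the item, which is the kind of arm-dependent clustering that the abstract's "dynamic clusters" refers to.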

Paper Structure

This paper contains 13 sections, 23 theorems, 46 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Theorem 5.1

Given the number of rounds $T$ and $\gamma$, for any $\delta \in (0, 1), R > 0$, suppose $m \geq \widetilde{\Omega} ( \text{poly}(T, L, R) \cdot Kn\log (1/\delta))$, $\eta_1 = \eta_2 = \frac{R^2}{\sqrt{m}}$, and $\mathbb{E}[|\mathcal{N}_{u_t}(\mathbf{x}_t)|] = \frac{n}{q}, t \in [T]$. Then, with p where $S^{\ast}_{TK} = \underset{ \theta \in B(\theta_0, R)}{\inf} \sum_{t=1}^{TK} \mathcal{L}_t(\

Figures (7)

  • Figure 1: Clustering and Meta Adaptation: Given $u_t$ and an arm $\mathbf{x}_{t,i}$, (1) M-CNB identifies the cluster $\widehat{\mathcal{N}}_{u_t}(\mathbf{x}_{t,i})$, then (2) the meta-learner $\Theta_{t-1}$ rapidly adapts to this cluster, and finally (3) M-CNB proceeds to the UCB exploration.
  • Figure 2: Regret comparison on recommendation datasets.
  • Figure 3: Regret comparison on Mnist and Notmnist, Cifar10, EMNIST(Letter), and Shuttle.
  • Figure 4: Regret comparison on Mnist, Fashion-Mnist, Mushroom, and MagicTelescope.
  • Figure 5: Running time vs. Performance for all methods.
  • ...and 2 more figures

Theorems & Definitions (28)

  • Definition 3.1: Relative Cluster
  • Definition 3.2: $\gamma$-gap
  • Theorem 5.1
  • Definition 5.2: NTK [ntk2018neural, wang2021neural]
  • Lemma 5.3
  • Lemma A.1
  • Lemma A.2
  • Lemma A.3
  • Lemma A.4: Almost Convexity
  • Lemma A.5: User Trajectory Ball
  • ...and 18 more