Neural Combinatorial Clustered Bandits for Recommendation Systems

Baran Atalar, Carlee Joe-Wong

TL;DR

The paper tackles contextual combinatorial bandits for recommender systems under semi-bandit feedback with unknown reward functions. It introduces NeUClust, which combines two neural networks (one for base-arm rewards and one for the monotone super-arm reward) with online clustering of contexts to guide super-arm selection without requiring an optimization oracle. The analysis gives a regret bound of $\widetilde{O}(\widetilde{d}\sqrt{T})$, where $\widetilde{d}$ is the effective dimension of the neural tangent kernel, and experiments on MovieLens and Yelp show lower regret and higher reward than strong contextual combinatorial and neural bandit baselines. By removing the need for an oracle and exploiting clustered structure in the context space, the approach aims to make such recommenders more scalable and practical.
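To make "online clustering of contexts to guide super-arm selection" concrete, here is a hypothetical illustration, not NeUClust's actual rule. It swaps in sequential (MacQueen-style) k-means as the clustering step, omits the super-arm network entirely, and assumes each base arm already carries an optimistic score (for example, a UCB index from a base-arm network). The super arm is then formed by taking the best-scoring arm from each cluster. `OnlineKMeans`, `select_super_arm`, and the `scores` input are illustrative names introduced here.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two context vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class OnlineKMeans:
    """Sequential (MacQueen-style) k-means over a stream of context vectors."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: List[Vector] = []
        self.counts: List[int] = []

    def nearest(self, x: Sequence[float]) -> int:
        """Index of the centroid closest to x (requires at least one centroid)."""
        return min(range(len(self.centroids)), key=lambda i: sq_dist(x, self.centroids[i]))

    def update(self, x: Sequence[float]) -> int:
        """Absorb one context: seed a new cluster until k exist, else move the nearest centroid."""
        if len(self.centroids) < self.k:
            self.centroids.append(list(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        j = self.nearest(x)
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step size 1/n keeps each centroid equal to the mean of its points
        self.centroids[j] = [c + eta * (xi - c) for c, xi in zip(self.centroids[j], x)]
        return j


def select_super_arm(contexts: Sequence[Sequence[float]], scores: Sequence[float],
                     clusterer: OnlineKMeans) -> List[int]:
    """Hypothetical rule: one arm per cluster, the one with the highest optimistic score."""
    for x in contexts:
        clusterer.update(x)  # online step: fold this round's contexts into the clusters
    best: Dict[int, int] = {}
    for arm, x in enumerate(contexts):
        c = clusterer.nearest(x)
        if c not in best or scores[arm] > scores[best[c]]:
            best[c] = arm
    return sorted(best.values())


km = OnlineKMeans(k=2)
contexts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
scores = [0.2, 0.9, 0.4, 0.3]
print(select_super_arm(contexts, scores, km))  # [1, 2]
```

Picking one arm per cluster is one simple way to let cluster structure shape the super arm without enumerating all subsets; the paper's own selection procedure may differ.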

Abstract

We consider the contextual combinatorial bandit setting where in each round, the learning agent, e.g., a recommender system, selects a subset of "arms," e.g., products, and observes rewards for both the individual base arms, which are a function of known features (called "context"), and the super arm (the subset of arms), which is a function of the base arm rewards. The agent's goal is to simultaneously learn the unknown reward functions and choose the highest-reward arms. For example, the "reward" may represent a user's probability of clicking on one of the recommended products. Conventional bandit models, however, employ restrictive reward function models in order to obtain performance guarantees. We make use of deep neural networks to estimate and learn the unknown reward functions and propose Neural UCB Clustering (NeUClust), which adopts a clustering approach to select the super arm in every round by exploiting underlying structure in the context space. Unlike prior neural bandit works, NeUClust uses a neural network to estimate the super arm reward and select the super arm, thus eliminating the need for a known optimization oracle. We non-trivially extend prior neural combinatorial bandit works to prove that NeUClust achieves $\widetilde{O}\left(\widetilde{d}\sqrt{T}\right)$ regret, where $\widetilde{d}$ is the effective dimension of a neural tangent kernel matrix and $T$ is the number of rounds. Experiments on real-world recommendation datasets show that NeUClust achieves better regret and reward than other contextual combinatorial and neural bandit algorithms.
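The abstract's running example, a super-arm reward equal to the probability that the user clicks at least one recommended product, can be sketched as a toy simulator of one round of semi-bandit feedback. This assumes clicks on different items are independent Bernoulli events with known (simulator-side) rates `true_ctr`; `click_prob` and `play_round` are hypothetical helpers, not code from the paper. Under that assumption the super-arm reward is a monotone function of the base-arm rewards, matching the monotonicity the TL;DR attributes to the super-arm reward.

```python
import random
from typing import Dict, Sequence, Tuple


def click_prob(base_rewards: Sequence[float]) -> float:
    """Expected super-arm reward: P(user clicks at least one item), assuming independent clicks.

    Monotone in every argument: raising any base-arm click probability never lowers it.
    """
    p_no_click = 1.0
    for r in base_rewards:
        p_no_click *= 1.0 - r
    return 1.0 - p_no_click


def play_round(true_ctr: Sequence[float], chosen: Sequence[int],
               rng: random.Random) -> Tuple[Dict[int, float], float]:
    """Simulate one round of semi-bandit feedback for the chosen super arm.

    Returns a Bernoulli click for every chosen base arm, plus the observed super-arm
    reward (1.0 if any chosen item was clicked), whose expectation is click_prob(...).
    """
    base_obs = {i: (1.0 if rng.random() < true_ctr[i] else 0.0) for i in chosen}
    super_obs = max(base_obs.values(), default=0.0)
    return base_obs, super_obs


rng = random.Random(0)
clicks, any_click = play_round([0.3, 0.8, 0.1, 0.5], chosen=[1, 3], rng=rng)
print(clicks, any_click)
print(click_prob([0.8, 0.5]))  # expected super-arm reward of the super arm {1, 3}
```

The agent only ever sees `clicks` and `any_click`; learning the mapping from contexts to click rates and from base-arm rewards to the super-arm reward is the part NeUClust delegates to its neural networks.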

Paper Structure

This paper contains 21 sections, 14 theorems, 70 equations, 9 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

(Bound on base-arm network output.) With probability $\big(1 - O(L)\,e^{-\Omega(m\varepsilon^2/L)}\big)\big(1 - e^{-\Omega(m\omega^{2/3} L)}\big)\big(1 - m\,e^{-m\rho^2/4}\big)$ the output of the first neural network for some $\beta > 0$, $\alpha > 0$ and $\omega \leq O\big(L^{-9/2}(\log m)^{-3}\big)$.
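Lemma 1 holds only with high probability. As a short sanity check, assume (as in the neural tangent kernel analyses this line of work builds on) that $m$ is the network width and $L$ its depth, hold $L$, $\varepsilon > 0$ and $\rho > 0$ fixed, and take $\omega$ at the largest order the lemma allows. Then each factor of the stated probability tends to 1 as $m \to \infty$:

$$
\begin{aligned}
1 - O(L)\,e^{-\Omega(m\varepsilon^2/L)} &\to 1,\\
m\,\omega^{2/3}L &= \Theta\big(m\,L^{-2}(\log m)^{-2}\big) \to \infty
  \quad\text{for } \omega = \Theta\big(L^{-9/2}(\log m)^{-3}\big),\\
\text{so}\quad 1 - e^{-\Omega(m\omega^{2/3}L)} &\to 1,\\
1 - m\,e^{-m\rho^2/4} &\to 1 \quad\text{since } m\,e^{-m\rho^2/4} \to 0,
\end{aligned}
$$

so the product tends to 1 and the bound holds with probability approaching one for sufficiently wide networks. If $\omega$ is chosen much smaller than this upper limit, the middle factor need not approach 1.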

Figures (9)

  • Figure 1: Our neural contextual combinatorial bandit formulation, with the online feedback loop for arm selection.
  • Figure 2: Super arm regret plot for the MovieLens dataset. Our NeUClust algorithm has lower regret than all baselines.
  • Figure 3: Super arm regret plot for the Yelp dataset. Our NeUClust algorithm has lower regret than all baselines.
  • Figure 4: Super arm reward vs. rounds $(t)$ of NeUClust compared with other state-of-the-art combinatorial and neural bandit algorithms on MovieLens.
  • Figure 5: Super arm reward vs. rounds $(t)$ of NeUClust compared with other state-of-the-art combinatorial and neural bandit algorithms on Yelp.
  • ...and 4 more figures
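Regret plots like Figures 2 and 3 typically show cumulative super-arm regret $R(T) = \sum_{t=1}^{T}\big(f(S_t^*) - f(S_t)\big)$, the running gap between the best super arm $S_t^*$ and the chosen one $S_t$. A minimal sketch of how such a curve could be computed on a toy instance, assuming the expected base-arm rewards `mu` are known so that the best size-$k$ super arm can be found by brute-force enumeration (exactly the kind of oracle step NeUClust is designed to avoid at scale). `best_super_arm`, `cumulative_regret`, and the use of `sum` as the super-arm reward are illustrative assumptions, not the paper's evaluation code.

```python
from itertools import accumulate, combinations
from typing import Callable, List, Sequence, Tuple


def best_super_arm(mu: Sequence[float], k: int,
                   f: Callable[[Sequence[float]], float]) -> Tuple[int, ...]:
    """Brute-force the size-k super arm with the highest expected reward f.

    Enumerates all C(N, k) subsets, so it is only usable on small toy instances.
    """
    return max(combinations(range(len(mu)), k), key=lambda s: f([mu[i] for i in s]))


def cumulative_regret(optimal: Sequence[float], achieved: Sequence[float]) -> List[float]:
    """Running sum of per-round super-arm regret f(S*_t) - f(S_t)."""
    if len(optimal) != len(achieved):
        raise ValueError("need one optimal and one achieved reward per round")
    return list(accumulate(o - a for o, a in zip(optimal, achieved)))


mu = [0.1, 0.5, 0.3]
print(best_super_arm(mu, 2, sum))  # (1, 2)
print(cumulative_regret([1.0, 1.0, 1.0], [0.5, 0.75, 1.0]))  # [0.5, 0.75, 0.75]
```

A bound of $\widetilde{O}(\widetilde{d}\sqrt{T})$ means the average per-round regret $R(T)/T$ is $\widetilde{O}(\widetilde{d}/\sqrt{T})$, which vanishes as $T$ grows provided $\widetilde{d}$ grows more slowly than $\sqrt{T}$; visually, the cumulative regret curve flattens over time.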

Theorems & Definitions (20)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • proof
  • proof
  • proof
  • Theorem 1
  • ...and 10 more