Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications

Luyue Xu, Liming Wang, Hong Xie, Mingqiang Zhou

TL;DR

This work addresses feedback bias caused by herding in contextual linear bandits for online recommendation. It models the feedback as $V_t(a)=\alpha h_{t,a}+(1-\alpha)\boldsymbol{\theta}^\top \boldsymbol{x}_a+\eta_{t,a}$ and introduces the TS-Conf algorithm. TS-Conf uses posterior sampling to jointly infer the user parameter $\boldsymbol{\theta}$, the conformity level $\alpha$, and the noise level $\sigma$, balancing exploration and exploitation under biased feedback. The authors establish a sublinear regret bound and show empirically that TS-Conf outperforms baselines on synthetic and real-world datasets (Amazon Music, MovieLens, Yelp, Google Maps), mitigating herding effects and accelerating learning. The results are relevant to online recommendation systems in which user feedback is confounded by historical ratings.
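
As an illustration of the feedback model $V_t(a)=\alpha h_{t,a}+(1-\alpha)\boldsymbol{\theta}^\top \boldsymbol{x}_a+\eta_{t,a}$, the Python sketch below draws a single observation. The function name `herding_feedback`, the Gaussian form of the noise $\eta_{t,a}$, and all numeric values are assumptions made for this example rather than details taken from the paper.

```python
import numpy as np


def herding_feedback(theta, x_a, h_ta, alpha, sigma, rng):
    """Draw one V_t(a) = alpha * h_{t,a} + (1 - alpha) * theta^T x_a + eta_{t,a}.

    Assumption: eta_{t,a} is zero-mean Gaussian with standard deviation sigma.
    """
    intrinsic = float(np.dot(theta, x_a))  # the user's unbiased preference
    eta = rng.normal(0.0, sigma)           # observation noise
    return alpha * h_ta + (1.0 - alpha) * intrinsic + eta


# Hypothetical values, chosen only to show how alpha shifts the feedback.
rng = np.random.default_rng(0)
theta = np.array([0.5, -0.2, 0.1])   # user parameter
x_a = np.array([1.0, 0.0, 2.0])      # context of arm a
h_ta = 4.0                           # historical-rating signal for arm a

for alpha in (0.0, 0.3, 1.0):
    v = herding_feedback(theta, x_a, h_ta, alpha, sigma=0.0, rng=rng)
    print(f"alpha={alpha:.1f}  feedback={v:.2f}")
```

With the noise switched off, $\alpha=0$ recovers the usual unbiased linear reward $\boldsymbol{\theta}^\top \boldsymbol{x}_a$, while $\alpha=1$ returns the historical signal $h_{t,a}$ alone, so the observation carries no information about $\boldsymbol{\theta}$.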

Abstract

Contextual bandits serve as a fundamental algorithmic framework for optimizing recommendation decisions online. Though extensive attention has been paid to tailoring contextual bandits for recommendation applications, the "herding effects" in user feedback have been ignored. These herding effects bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits. This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects. A user feedback model is formulated to capture this feedback bias. We design the TS-Conf (Thompson Sampling under Conformity) algorithm, which employs posterior sampling to balance the exploration and exploitation tradeoff. We prove an upper bound for the regret of the algorithm, revealing the impact of herding effects on learning speed. Extensive experiments on datasets demonstrate that TS-Conf outperforms four benchmark algorithms. Analysis reveals that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
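
To make the posterior-sampling idea concrete, here is a minimal sketch of standard linear Thompson Sampling run on debiased feedback. It is not TS-Conf. It assumes the conformity level and noise level are known, places a Gaussian prior on the user parameter, and takes the historical signal to be the running mean of an arm's past ratings, whereas the paper's algorithm infers these quantities from data. The function `linear_ts_known_alpha` and all of its parameters are hypothetical.

```python
import numpy as np


def linear_ts_known_alpha(X, theta_true, alpha, sigma, T, rng, prior_prec=1.0):
    """Linear Thompson Sampling on debiased feedback (simplified illustration).

    Assumptions: alpha (< 1) and sigma are known; prior on theta is
    N(0, I / prior_prec); h is the running mean of the arm's past ratings.
    """
    K, d = X.shape
    s2 = (sigma / (1.0 - alpha)) ** 2   # noise variance after debiasing
    B = prior_prec * np.eye(d)          # posterior precision of theta
    f = np.zeros(d)                     # precision-weighted sum of x * y
    sums = np.zeros(K)                  # per-arm sum of observed ratings
    counts = np.zeros(K)                # per-arm number of ratings
    chosen = []
    for _ in range(T):
        cov = np.linalg.inv(B)
        cov = (cov + cov.T) / 2.0       # guard against round-off asymmetry
        mean = cov @ f
        theta_s = rng.multivariate_normal(mean, cov)  # posterior sample
        a = int(np.argmax(X @ theta_s))               # act greedily on the sample
        h = sums[a] / counts[a] if counts[a] > 0 else 0.0
        # Simulated herding-biased feedback for the chosen arm.
        v = alpha * h + (1.0 - alpha) * float(X[a] @ theta_true) + rng.normal(0.0, sigma)
        y = (v - alpha * h) / (1.0 - alpha)  # debiased target: theta^T x + scaled noise
        B += np.outer(X[a], X[a]) / s2       # Bayesian linear-regression update
        f += X[a] * y / s2
        sums[a] += v
        counts[a] += 1
        chosen.append(a)
    return np.linalg.inv(B) @ f, chosen


# Hypothetical three-arm problem; arm 0 has the highest intrinsic reward.
rng = np.random.default_rng(0)
X = np.eye(3)
theta_true = np.array([1.0, 0.0, -1.0])
theta_hat, chosen = linear_ts_known_alpha(X, theta_true, alpha=0.5, sigma=0.1, T=300, rng=rng)
print("share of arm 0 in last 100 rounds:", chosen[-100:].count(0) / 100)
```

Because the conformity level is known here, the herding term can be subtracted exactly. The cost is that the effective noise grows by a factor of $1/(1-\alpha)$, which gives some intuition for why stronger herding slows learning.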

Paper Structure

This paper contains 14 sections, 2 theorems, 11 equations, 8 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

The regret of Algorithm 1 satisfies:

Figures (8)

  • Figure 1: Comparative analysis of TS-Conf and TS-ConfMCMC
  • Figure 2: Impact of dimensions $d$ in synthetic dataset
  • Figure 3: Impact of noise variance $\sigma^2$ in synthetic dataset
  • Figure 4: Impact of dimensions $d$ and noise variance $\sigma^2$ in MovieLens dataset
  • Figure 5: Impact of dimensions $d$ and noise variance $\sigma^2$ in Yelp dataset
  • ...and 3 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2