Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications
Luyue Xu, Liming Wang, Hong Xie, Mingqiang Zhou
TL;DR
This work addresses feedback bias due to herding in contextual linear bandits for online recommendations by modeling the feedback as $V_t(a)=\alpha h_{t,a}+(1-\alpha)\boldsymbol{\theta}^\top \boldsymbol{x}_a+\eta_{t,a}$ and introducing the TS-Conf algorithm. TS-Conf leverages posterior sampling to jointly infer the user parameter $\boldsymbol{\theta}$, the conformity level $\alpha$, and noise $\boldsymbol{\sigma}$, enabling balanced exploration and exploitation under biased feedback. The authors establish a sublinear regret bound and demonstrate empirically that TS-Conf outperforms baselines across synthetic and real-world datasets (Amazon Music, MovieLens, Yelp, Google Maps), effectively mitigating herding effects and accelerating learning. The results have practical implications for improving online recommendation systems when user feedback is confounded by historical ratings.
Abstract
Contextual bandits serve as a fundamental algorithmic framework for optimizing recommendation decisions online. Though extensive attention has been paid to tailoring contextual bandits for recommendation applications, the "herding effects" in user feedback have been ignored. These herding effects bias user feedback toward historical ratings, violating the assumption of unbiased feedback inherent in contextual bandits. This paper develops a novel variant of the contextual bandit tailored to address the feedback bias caused by herding effects. A user feedback model is formulated to capture this feedback bias. We design the TS-Conf (Thompson Sampling under Conformity) algorithm, which employs posterior sampling to balance the exploration-exploitation tradeoff. We prove an upper bound on the regret of the algorithm, revealing the impact of herding effects on learning speed. Extensive experiments on synthetic and real-world datasets demonstrate that TS-Conf outperforms four benchmark algorithms. Further analysis shows that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
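To make the feedback bias concrete, the following is a minimal sketch of the feedback model stated in the TL;DR, $V_t(a)=\alpha h_{t,a}+(1-\alpha)\boldsymbol{\theta}^\top \boldsymbol{x}_a+\eta_{t,a}$. It is an illustration, not the authors' implementation. The function name `herding_feedback`, the Gaussian form of the noise, the reading of $h_{t,a}$ as a scalar summary of historical ratings for item $a$, and all numeric values are assumptions made for this example.

```python
import numpy as np

def herding_feedback(alpha, theta, x_a, h_ta, sigma=0.0, rng=None):
    """Hypothetical helper: one draw of V_t(a) under the herding model.

    alpha : conformity level in [0, 1]; 0 means no herding.
    theta : user preference vector.
    x_a   : feature vector of the recommended item a.
    h_ta  : historical rating signal for item a (assumed scalar here).
    sigma : noise standard deviation (Gaussian noise is an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.normal(0.0, sigma) if sigma > 0 else 0.0
    intrinsic = float(np.dot(theta, x_a))  # unbiased preference theta^T x_a
    return alpha * h_ta + (1.0 - alpha) * intrinsic + eta

# Illustrative values only (not taken from the paper).
theta = np.array([0.5, -0.2])
x_a = np.array([1.0, 2.0])
h = 0.9  # the item has been rated highly in the past

true_pref = float(np.dot(theta, x_a))                      # intrinsic preference
v_no_herding = herding_feedback(0.0, theta, x_a, h)        # equals true_pref
v_herding = herding_feedback(0.5, theta, x_a, h)           # pulled toward h

print(f"intrinsic preference: {true_pref:.3f}")
print(f"feedback, alpha=0.0 : {v_no_herding:.3f}")
print(f"feedback, alpha=0.5 : {v_herding:.3f}")
```

With noise switched off, the observed feedback at $\alpha=0.5$ lies halfway between the historical rating and the intrinsic preference. A learner that treats $V_t(a)$ as an unbiased observation of $\boldsymbol{\theta}^\top \boldsymbol{x}_a$ would therefore overestimate this item, which is the bias the abstract describes.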
