Leveraging heterogeneous spillover in maximizing contextual bandit rewards

Ahmed Sayeed Faruk, Elena Zheleva

TL;DR

The paper tackles the problem of maximizing network-wide rewards in contextual bandits on social networks by accounting for heterogeneous spillovers and dynamic neighborhood context. It introduces NetCB, a two-component framework that augments a contextual multi-armed bandit (CMAB) with dynamic neighborhood features and a spillover-aware override mechanism, and that is compatible with existing CMAB algorithms. Empirical results on real-world and semi-synthetic networks show reductions in regret and improvements in bandit accuracy, especially in highly homophilous networks. They also show that a direct recommendation that is suboptimal for the individual user can sometimes increase overall network rewards through spillover. The work offers practical pathways for improving recommendations in networked settings and outlines future directions such as regret analysis and learning spillover probabilities.

Abstract

Recommender systems that rely on contextual multi-armed bandits continuously improve their item recommendations by taking contextual information into account. The objective of bandit algorithms is to learn the best arm (e.g., the best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. The context that these algorithms typically consider consists of user and item attributes. However, in social networks, where $\textit{the action of one user can influence the actions and rewards of other users}$, neighbors' actions are also an important part of the context: they not only have predictive power but can also affect future rewards through spillover. Moreover, susceptibility to influence can vary across people depending on their preferences and the closeness of their ties to other users, which leads to heterogeneity in the spillover effects. Here, we present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. Our experiments on several semi-synthetic and real-world datasets show that our framework leads to significantly higher rewards than existing state-of-the-art solutions that ignore the network information and potential spillover.
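
To make the two ideas in the summary concrete (neighborhood features as extra context, and an override that trades direct reward against spillover), here is a minimal Python sketch. It is not the authors' NetCB algorithm, whose exact rules are not given in this section. Every name in it (`neighborhood_features`, `augment_context`, `choose_arm_with_override`, `spillover_prob`, `neighbor_reward`) is hypothetical, and the spillover probabilities are assumed to be known inputs rather than learned.

```python
from typing import Dict, List, Sequence, Tuple


def neighborhood_features(user: int,
                          neighbors: Dict[int, List[int]],
                          last_action: Dict[int, int],
                          n_arms: int) -> List[float]:
    """Fraction of the user's neighbors whose most recent action was each arm.

    Assumption: a neighbor's 'action' is the index of the arm they last engaged with.
    Neighbors with no recorded action contribute nothing.
    """
    nbrs = neighbors.get(user, [])
    counts = [0.0] * n_arms
    for v in nbrs:
        arm = last_action.get(v)
        if arm is not None:
            counts[arm] += 1.0
    if nbrs:
        counts = [c / len(nbrs) for c in counts]
    return counts


def augment_context(user_features: Sequence[float],
                    user: int,
                    neighbors: Dict[int, List[int]],
                    last_action: Dict[int, int],
                    n_arms: int) -> List[float]:
    """Append the (time-varying) neighborhood features to the user's own features."""
    return list(user_features) + neighborhood_features(
        user, neighbors, last_action, n_arms)


def choose_arm_with_override(user: int,
                             base_scores: Sequence[float],
                             neighbors: Dict[int, List[int]],
                             spillover_prob: Dict[Tuple[int, int], Sequence[float]],
                             neighbor_reward: Dict[int, Sequence[float]]) -> int:
    """Pick the arm with the highest direct-plus-expected-spillover value.

    base_scores[a]            : the base bandit's estimated direct reward for arm a.
    spillover_prob[(u, v)][a] : assumed probability that neighbor v adopts arm a
                                after user u is recommended a (heterogeneous per
                                edge and per arm).
    neighbor_reward[v][a]     : assumed reward if neighbor v adopts arm a.
    """
    n_arms = len(base_scores)
    zero = [0.0] * n_arms
    best_arm, best_value = 0, float("-inf")
    for arm, direct in enumerate(base_scores):
        spill = sum(
            spillover_prob.get((user, v), zero)[arm]
            * neighbor_reward.get(v, zero)[arm]
            for v in neighbors.get(user, [])
        )
        value = direct + spill
        if value > best_value:
            best_arm, best_value = arm, value
    return best_arm
```

In this sketch, a user whose base scores favor arm 0 can still be assigned arm 1 when their neighbors are much more likely to adopt arm 1 through spillover, which mirrors the summary's point that a suboptimal direct recommendation can raise the total network reward. With no neighbors or zero spillover probabilities, the choice reduces to the base bandit's best arm.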

Paper Structure

This paper contains 26 sections, 11 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: The workflow of the $NetCB$ framework.
  • Figure 2: Recommendation-dependent heterogeneity in network spillover.
  • Figure 3: Comparison of cumulative bandit accuracy, $B_{acc}$, in real-world and semi-synthetic (marked in blue) datasets.
  • Figure 4: Comparison of cumulative bandit accuracy, $B_{acc}$, of $NetCB_{NeuralTS}$ when varying the activation probabilities due to direct recommendations.
  • Figure :