Leveraging heterogeneous spillover in maximizing contextual bandit rewards

Ahmed Sayeed Faruk; Elena Zheleva

TL;DR

The paper tackles the problem of maximizing network-wide rewards in contextual bandits deployed on social networks by incorporating heterogeneous spillover and dynamic neighborhood context. It introduces NetCB, a two-component framework that augments contextual multi-armed bandits (CMAB) with dynamic neighborhood features and a spillover-aware override mechanism, and that is compatible with existing CMAB algorithms. Empirical results on real-world and semi-synthetic networks show meaningful reductions in regret and improvements in bandit accuracy, especially in highly homophilous networks. They also show that a recommendation that is suboptimal for the target user can sometimes increase overall network reward through spillover. The work offers practical pathways for improving recommendations in networked settings and outlines future directions such as regret analysis and learning spillover probabilities.
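The TL;DR mentions dynamic neighborhood features but not their exact form. A minimal sketch, assuming one plausible definition (for each arm, the fraction of a user's neighbors who have engaged with it so far, appended to the user's static attributes), could look like the following. The helper names `neighborhood_features` and `augmented_context` and the dictionary-based graph layout are hypothetical and are not NetCB's actual implementation.

```python
import numpy as np


def neighborhood_features(user, adjacency, engaged, n_arms):
    """Hypothetical helper: per-arm share of `user`'s neighbors who engaged with each arm.

    adjacency: dict mapping a user id to a list of neighbor ids (assumed layout).
    engaged:   dict mapping a user id to the set of arms that user has engaged with so far.
    """
    neighbors = adjacency.get(user, [])
    counts = np.zeros(n_arms)
    for v in neighbors:
        for arm in engaged.get(v, set()):
            counts[arm] += 1
    # Normalize by degree so high-degree users do not get larger raw values;
    # users with no neighbors get an all-zero feature vector.
    return counts / len(neighbors) if neighbors else counts


def augmented_context(user_features, user, adjacency, engaged, n_arms):
    """Concatenate static user attributes with the current (assumed) neighborhood features."""
    return np.concatenate(
        [user_features, neighborhood_features(user, adjacency, engaged, n_arms)]
    )


# Usage with toy data: user 0 has neighbors 1 and 2.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
engaged = {1: {0}, 2: {0, 2}}
ctx = augmented_context(np.array([0.5, 1.0]), 0, adjacency, engaged, n_arms=3)
print(ctx)  # [0.5 1.  1.  0.  0.5]
```

Recomputing these features every round from the latest engagements is what lets them change over time as neighbors act; the augmented vector could then be passed to a base CMAB in place of the static context.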

Abstract

Recommender systems that rely on contextual multi-armed bandits continuously improve the relevance of recommended items by taking contextual information into account. The objective of bandit algorithms is to learn the best arm (e.g., the best item to recommend) for each user and thus maximize the cumulative reward from user engagement with the recommendations. The context these algorithms typically consider consists of user and item attributes. However, in social networks, where the action of one user can influence the actions and rewards of other users, neighbors' actions are also important context: they not only have predictive power but can also affect future rewards through spillover. Moreover, susceptibility to influence varies across people depending on their preferences and the closeness of their ties to other users, which makes the spillover effects heterogeneous. Here, we present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillover when choosing the best arm for each user. Our experiments on several semi-synthetic and real-world datasets show that our framework leads to significantly higher rewards than existing state-of-the-art solutions that ignore network information and potential spillover.
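To make the idea of heterogeneous spillover concrete, here is a minimal sketch under simplifying assumptions. If user u engages with the recommended arm, each neighbor v adopts it with probability equal to a tie-specific influence probability times v's own preference for that arm, and the arm with the highest expected direct-plus-spillover reward is selected. The one-hop spillover model, the multiplicative probability form, and the names `expected_network_reward` and `choose_arm` are illustrative assumptions, not the paper's formulation; in practice the direct-reward estimates would come from the base CMAB.

```python
import numpy as np


def expected_network_reward(u, arm, direct, influence, neighbors):
    """Expected reward of recommending `arm` to user `u` under an assumed one-hop spillover model.

    direct[v, a]:    estimated probability that user v engages with arm a if it reaches them.
    influence[u][v]: assumed tie-specific probability that u's engagement spills over to v.
    neighbors[u]:    list of u's neighbors.
    """
    p_u = direct[u, arm]
    # Spillover only happens if u engages; each neighbor's uptake depends on tie strength
    # and on that neighbor's own preference for the arm, which makes it heterogeneous.
    spill = sum(influence[u][v] * direct[v, arm] for v in neighbors.get(u, []))
    return p_u + p_u * spill


def choose_arm(u, direct, influence, neighbors):
    """Pick the arm maximizing expected direct plus spillover reward for user u."""
    scores = [
        expected_network_reward(u, a, direct, influence, neighbors)
        for a in range(direct.shape[1])
    ]
    return int(np.argmax(scores))


# Usage with hypothetical estimates for 3 users and 2 arms.
direct = np.array([[0.6, 0.5], [0.1, 0.9], [0.2, 0.8]])
neighbors = {0: [1, 2]}
influence = {0: {1: 0.7, 2: 0.6}}
print(int(np.argmax(direct[0])))                      # 0: best arm for user 0 alone
print(choose_arm(0, direct, influence, neighbors))    # 1: best arm once spillover is counted
```

With these hypothetical numbers, arm 0 is best for user 0 alone (0.6 versus 0.5), but arm 1 has the higher expected network reward (about 1.055 versus 0.714) because both neighbors strongly prefer it. This mirrors the observation that a recommendation that is suboptimal for the target user can raise overall reward through spillover.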

Paper Structure

This paper contains 26 sections, 11 equations, 5 figures, 1 table, 1 algorithm.

Introduction
Related Work
Problem Description
Network Contextual Bandit framework
Dynamic neighborhood features per user
Spillover maximization
Spillover maximization
Illustration of NetCB algorithm
Experiments
Data representation
Evaluation metrics
Main algorithms and baselines
Experimental setup
Experimental results
Effect of dynamic neighborhood knowledge on $Regret$
...and 11 more sections

Figures (5)

Figure 1: The workflow of the $NetCB$ framework.
Figure 2: Recommendation-dependent heterogeneity in network spillover.
Figure 3: Comparison of cumulative bandit accuracy, $B_{acc}$, in real-world and semi-synthetic (marked in blue) datasets.
Figure 4: Comparison of cumulative bandit accuracy, $B_{acc}$, of $NetCB_{NeuralTS}$ under varying activation probabilities due to direct recommendations.