Linear Contextual Bandits with Interference

Yang Xu; Wenbin Lu; Rui Song

TL;DR

This paper proposes a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite sample upper bounds, and asymptotic properties.

Abstract

Interference, a key concept in causal inference, extends the reward modeling process by accounting for the impact of one unit's actions on the rewards of others. In contextual bandit (CB) settings, where multiple units are present in the same round, potential interference can significantly affect the estimation of expected rewards for different arms, thereby influencing the decision-making process. Although some prior work has explored multi-agent and adversarial bandits in interference-aware settings, the effect of interference in CB, as well as the underlying theory, remains significantly underexplored. In this paper, we introduce a systematic framework to address interference in Linear CB (LinCB), bridging the gap between causal inference and online decision-making. We propose a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite sample upper bounds, and asymptotic properties. The effectiveness of our approach is demonstrated through simulations and on synthetic data generated from the MovieLens dataset.
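As a rough illustration of what it means to quantify interference inside a linear reward model, the sketch below assumes a hypothetical specification that is not taken from the paper: two arms, a scalar context per unit, and an interference term equal to the fraction of the *other* units in the same round that were assigned arm 1. The helpers `exposure`, `features`, `simulate`, and `ols`, as well as the coefficient values, are illustrative assumptions. Pooling all units' observations and fitting ordinary least squares recovers the interference coefficient alongside the direct effects.

```python
import random

def exposure(actions, i):
    """Hypothetical exposure mapping: share of the other units in the round assigned arm 1."""
    others = [a for j, a in enumerate(actions) if j != i]
    return sum(others) / len(others)

def features(x, actions, i):
    # Assumed feature vector: [intercept, context, own action, action*context, exposure]
    a = actions[i]
    return [1.0, x, float(a), a * x, exposure(actions, i)]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def ols(rows, ys):
    """Ordinary least squares via the normal equations X^T X beta = X^T y."""
    d = len(rows[0])
    XtX = [[sum(r[p] * r[q] for r in rows) for q in range(d)] for p in range(d)]
    Xty = [sum(r[p] * y for r, y in zip(rows, ys)) for p in range(d)]
    return solve(XtX, Xty)

def simulate(beta, n_rounds=400, n_units=5, noise=0.1, seed=0):
    """Generate rewards from the assumed linear model with random arm assignments."""
    rng = random.Random(seed)
    rows, ys = [], []
    for _ in range(n_rounds):
        xs = [rng.uniform(-1, 1) for _ in range(n_units)]
        actions = [rng.randint(0, 1) for _ in range(n_units)]
        for i in range(n_units):
            phi = features(xs[i], actions, i)
            rows.append(phi)
            ys.append(sum(b * f for b, f in zip(beta, phi)) + rng.gauss(0, noise))
    return rows, ys

true_beta = [0.5, 1.0, -0.2, 0.8, 0.6]  # last entry: hypothetical interference coefficient
rows, ys = simulate(true_beta)
beta_hat = ols(rows, ys)
print([round(b, 2) for b in beta_hat])
```

Solving the normal equations directly is adequate for a five-column design like this one; the arms here are assigned uniformly at random, so the design has full rank without any exploration schedule.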

Paper Structure (33 sections, 6 theorems, 155 equations, 4 figures, 1 algorithm)

Introduction
Related Work
Interference in Single Stage
Cooperative Multi-Agent Bandits
Problem Formulation
Offline Optimization
Online Algorithms
LinEGWI
LinUCBWI
LinTSWI
Theory
Tail bound of the online OLS estimator
The probability of exploration
Statistical Inference on $\boldsymbol{\beta}$
Statistical Inference on V
...and 18 more sections

Key Result

Theorem 4.1

(Tail Bound of the Online OLS Estimator) Suppose Assumptions 1-2 hold. In any of LinUCBWI, LinTSWI, or LinEGWI, for any $h>0$, the online OLS estimator satisfies a tail bound involving $h$, $L_w$, $L_x$, and $p_t$, where $L_w$ and $L_x$ are boundedness constants and $p_t$ controls the clipping rate in Algorithm 1.
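Theorem 4.1 ties the estimator's tail bound to the clipping rate $p_t$. One common way such clipping is implemented, assumed here rather than taken from Algorithm 1, is to keep every arm's selection probability away from zero: compute an ε-greedy probability for each unit, then clip it into $[p_t, 1-p_t]$. The sketch uses two arms and treats each unit's estimated gain as given; in an interference-aware algorithm that gain would come from a reward model that includes the other units' actions. The names `clipped_prob` and `select_actions` are hypothetical.

```python
import random

def clipped_prob(est_gain, epsilon, p_t):
    """Probability of assigning arm 1 to a single unit (hypothetical helper).

    est_gain: estimated reward of arm 1 minus arm 0 for this unit.
    epsilon:  exploration rate of the epsilon-greedy rule.
    p_t:      clipping level; the result always lies in [p_t, 1 - p_t].
    """
    greedy = 1.0 if est_gain > 0 else 0.0
    # Epsilon-greedy: exploit with prob 1 - epsilon, pick uniformly otherwise.
    prob = (1.0 - epsilon) * greedy + epsilon * 0.5
    # Clip so that neither arm's selection probability falls below p_t.
    return min(max(prob, p_t), 1.0 - p_t)

def select_actions(est_gains, epsilon, p_t, rng):
    """Draw one action (0 or 1) per unit in the current round."""
    return [1 if rng.random() < clipped_prob(g, epsilon, p_t) else 0
            for g in est_gains]

rng = random.Random(1)
gains = [0.3, -0.1, 0.05, -0.4]
print(select_actions(gains, epsilon=0.02, p_t=0.05, rng=rng))
print(clipped_prob(0.3, 0.02, 0.05))  # epsilon-greedy gives 0.99, clipped down to 1 - p_t
```

When ε decays quickly, the clip is what keeps each arm's probability at least $p_t$; guaranteed exploration of this kind is what a tail bound for an OLS estimator fitted on adaptively collected data typically relies on.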

Figures (4)

Figure 1: The coverage plot of $\boldsymbol{\beta}$ (a. left) and $V^{\pi^*}$ (b. right)
Figure 2: Comparison of average regret in the presence of interference
Figure 3: Average rating comparison under reward generating model I (left) and II (right)
Figure 4: Comparison of average regret in the absence of interference

Theorems & Definitions (6)

Theorem 4.1
Theorem 4.2
Theorem 4.3
Theorem 4.4
Theorem 4.5
Lemma B.1
