
Linear Contextual Bandits with Interference

Yang Xu, Wenbin Lu, Rui Song

TL;DR

This paper proposes a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite sample upper bounds, and asymptotic properties.

Abstract

Interference, a key concept in causal inference, extends the reward modeling process by accounting for the impact of one unit's actions on the rewards of others. In contextual bandit (CB) settings, where multiple units are present in the same round, potential interference can significantly affect the estimation of expected rewards for different arms, thereby influencing the decision-making process. Although some prior work has explored multi-agent and adversarial bandits in interference-aware settings, the effect of interference in CB, as well as the underlying theory, remains largely unexplored. In this paper, we introduce a systematic framework to address interference in Linear CB (LinCB), bridging the gap between causal inference and online decision-making. We propose a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite sample upper bounds, and asymptotic properties. The effectiveness of our approach is demonstrated through simulations and a synthetic dataset generated from MovieLens data.
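To make the idea of interference in a linear reward model concrete, here is a minimal sketch under assumptions of our own; it is not the paper's model. We assume two arms (0/1), several units per round, and a reward for unit i that depends on its own context x_i and arm a_i plus a hypothetical "exposure" term e_i, the fraction of the other units in the same round that received arm 1, scaled by a spillover coefficient gamma. The names `exposure`, `simulate_round`, `beta0`, `beta1`, and `gamma` are illustrative only. Stacking the features [x_i, a_i x_i, e_i] and fitting ordinary least squares recovers the spillover coefficient together with the arm effects.

```python
import numpy as np

# Hypothetical illustration, NOT the paper's model: two arms (0/1), N units per round.
def exposure(actions):
    """Fraction of the *other* units in the round that received arm 1."""
    n = len(actions)
    return (actions.sum() - actions) / (n - 1)

def simulate_round(rng, n_units, beta0, beta1, gamma, noise_sd=0.1):
    d = len(beta0)
    X = rng.normal(size=(n_units, d))       # unit contexts
    A = rng.integers(0, 2, size=n_units)    # actions drawn uniformly at random
    E = exposure(A)                         # interference exposure per unit
    # Own baseline + own arm effect + spillover from the other units' actions
    R = X @ beta0 + A * (X @ beta1) + gamma * E + noise_sd * rng.normal(size=n_units)
    # Stacked features [x_i, a_i * x_i, e_i]; dropping e_i would ignore interference
    Z = np.column_stack([X, A[:, None] * X, E])
    return Z, R

rng = np.random.default_rng(0)
beta0 = np.array([1.0, -0.5])
beta1 = np.array([0.3, 0.8])
gamma = 0.7
blocks = [simulate_round(rng, 10, beta0, beta1, gamma) for _ in range(200)]
Z = np.vstack([z for z, _ in blocks])
R = np.concatenate([r for _, r in blocks])
theta_hat, *_ = np.linalg.lstsq(Z, R, rcond=None)
theta_true = np.concatenate([beta0, beta1, [gamma]])
print(np.round(theta_hat, 2), theta_true)   # the estimate should track theta_true
```

Because each unit's exposure excludes its own action, e_i varies across units within a round as well as across rounds, which is what lets OLS separate the spillover coefficient from the own-arm effect in this toy setup.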

Paper Structure

This paper contains 33 sections, 6 theorems, 155 equations, 4 figures, and 1 algorithm.

Key Result

Theorem 4.1

(Tail Bound of the Online OLS Estimator) Suppose Assumptions 1-2 hold. In each of LinUCBWI, LinTSWI, and LinEGWI, for any $h>0$, we have […], where $L_w$ and $L_x$ are boundedness constants and $p_t$ controls the clipping rate in the LinCBWI algorithm.
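The mechanism this theorem concerns (an OLS estimator updated online while actions are chosen adaptively, with exploration kept alive by a clipping rate $p_t$) can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' LinCBWI or LinEGWI: (i) the online OLS estimate is recomputed each round from running sums of $z z^\top$ and $r z$; (ii) "clipping" is read as keeping each unit's probability of receiving arm 1 inside $[p_t, 1-p_t]$, with a hypothetical schedule $p_t = \max(0.05, 0.5/\sqrt{t})$; (iii) rewards follow a hypothetical additive model with an exposure term $\gamma e_i$, where $e_i$ is the share of other units on arm 1. Under that assumed model, giving unit $i$ arm 1 changes the round's total expected reward by $\beta_1^\top x_i + \gamma$ (its own effect plus $\gamma/(N-1)$ added to each of the $N-1$ other units), which is the greedy rule used below. `OnlineOLS`, `clip_probability`, `greedy_prob`, and `run` are hypothetical names.

```python
import numpy as np

class OnlineOLS:
    """Online OLS via running sums of z z^T and r z (hypothetical helper)."""
    def __init__(self, dim):
        self.gram = np.zeros((dim, dim))
        self.zr = np.zeros(dim)

    def update(self, z, r):
        self.gram += np.outer(z, z)
        self.zr += r * z

    def estimate(self):
        # pinv gives a finite answer in early rounds while the Gram matrix is singular
        return np.linalg.pinv(self.gram) @ self.zr

    def min_eigenvalue(self):
        return np.linalg.eigvalsh(self.gram)[0]   # eigvalsh returns ascending values

def clip_probability(prob, p_t):
    """Keep P(arm 1) in [p_t, 1 - p_t]; assumes 0 < p_t <= 0.5."""
    return min(max(prob, p_t), 1.0 - p_t)

def greedy_prob(theta_hat, x, d):
    # Under the hypothetical additive model, arm 1 for unit i changes the round's
    # total expected reward by beta1^T x_i + gamma (own effect plus spillover)
    beta1_hat, gamma_hat = theta_hat[d:2 * d], theta_hat[2 * d]
    return 1.0 if beta1_hat @ x + gamma_hat > 0 else 0.0

def run(T=300, n_units=8, seed=1):
    rng = np.random.default_rng(seed)
    beta0, beta1, gamma = np.array([1.0, -0.5]), np.array([0.3, -0.8]), 0.4
    d = len(beta0)
    ols = OnlineOLS(2 * d + 1)
    for t in range(1, T + 1):
        p_t = max(0.05, 0.5 / np.sqrt(t))      # hypothetical clipping schedule
        theta_hat = ols.estimate()
        X = rng.normal(size=(n_units, d))
        probs = np.array([clip_probability(greedy_prob(theta_hat, x, d), p_t) for x in X])
        A = (rng.random(n_units) < probs).astype(float)
        E = (A.sum() - A) / (n_units - 1)      # exposure: share of other units on arm 1
        R = X @ beta0 + A * (X @ beta1) + gamma * E + 0.1 * rng.normal(size=n_units)
        for i in range(n_units):
            ols.update(np.concatenate([X[i], A[i] * X[i], [E[i]]]), R[i])
    return ols, np.concatenate([beta0, beta1, [gamma]])

ols, theta_true = run()
print(np.round(ols.estimate(), 2), ols.min_eigenvalue())
```

Tail bounds for adaptively collected OLS estimates typically depend on the minimum eigenvalue of the Gram matrix growing with $t$; the clipping floor is what keeps both arms sampled so that this quantity, exposed here as `min_eigenvalue()`, keeps increasing.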

Figures (4)

  • Figure 1: Coverage plots for $\boldsymbol{\beta}$ (a, left) and $V^{\pi^*}$ (b, right)
  • Figure 2: Comparison of average regret in the presence of interference
  • Figure 3: Average rating comparison under reward generating model I (left) and II (right)
  • Figure 4: Comparison of average regret in the absence of interference

Theorems & Definitions (6)

  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Lemma B.1