Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

Shuangning Li, Chonghuan Wang, Jingyan Wang

TL;DR

The level of exploration versus exploitation is identified as a key determinant of how data sharing impacts decision making, and a detection procedure based on ramp-up experiments is proposed to signal incorrect algorithm comparisons in practice.

Abstract

We study A/B experiments that are designed to compare the performance of two recommendation algorithms. Prior work has observed that the stable unit treatment value assumption (SUTVA) often does not hold in large-scale recommendation systems, and hence the estimate of the global treatment effect (GTE) is biased. Specifically, units under the treatment and control algorithms contribute to a shared pool of data that is subsequently used to train both algorithms, resulting in interference between the two groups. In this paper, we investigate when such interference may affect our decision about which algorithm is better. We formalize this question under a multi-armed bandit framework and theoretically characterize when the sign of the difference-in-means estimator of the GTE under data sharing aligns with or contradicts the sign of the true GTE. Our analysis identifies the level of exploration versus exploitation as a key determinant of how data sharing impacts decision making, and we propose a detection procedure based on ramp-up experiments to signal incorrect algorithm comparisons in practice.
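The interference described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experimental setup; it assumes a K-armed Bernoulli bandit, two hypothetical policies ("greedy" and "eps_greedy" with a fixed exploration rate eps = 0.1), and helper names (`choose_arm`, `run_shared`, `dim_estimate`, `true_gte`) invented for illustration. Each arriving user is routed to treatment with probability beta, and both algorithms read from and write to one shared pool of counts and reward sums. The difference-in-means (DIM) estimate is compared with a GTE computed, under our assumption, as the gap in per-user average reward when each algorithm runs alone on all traffic.

```python
import numpy as np


def choose_arm(policy, counts, sums, rng, eps):
    """Select an arm from the *shared* counts and reward sums.

    "greedy" and "eps_greedy" are illustrative policy names for this sketch.
    """
    if policy not in ("greedy", "eps_greedy"):
        raise ValueError(f"unknown policy: {policy}")
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:  # play each arm once before trusting the means
        return int(untried[0])
    if policy == "eps_greedy" and rng.random() < eps:
        return int(rng.integers(len(counts)))  # explore uniformly at random
    return int(np.argmax(sums / counts))  # exploit the pooled empirical means


def run_shared(mu, treat, control, beta, horizon, rng, eps=0.1):
    """One A/B test: each user goes to treatment w.p. beta, else control.

    Both groups update the same counts/sums, which creates the interference.
    """
    mu = np.asarray(mu, dtype=float)
    counts = np.zeros(len(mu))
    sums = np.zeros(len(mu))
    rewards = {"treatment": [], "control": []}
    for _ in range(horizon):
        group = "treatment" if rng.random() < beta else "control"
        policy = treat if group == "treatment" else control
        arm = choose_arm(policy, counts, sums, rng, eps)
        reward = float(rng.random() < mu[arm])  # Bernoulli(mu[arm]) reward
        counts[arm] += 1  # shared update: both algorithms learn from it
        sums[arm] += reward
        rewards[group].append(reward)
    return rewards


def dim_estimate(mu, treat, control, beta, horizon, reps, seed=0):
    """Difference-in-means (treatment minus control), averaged over replications."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(reps):
        rw = run_shared(mu, treat, control, beta, horizon, rng)
        if rw["treatment"] and rw["control"]:  # skip runs with an empty group
            diffs.append(np.mean(rw["treatment"]) - np.mean(rw["control"]))
    return float(np.mean(diffs))


def true_gte(mu, treat, control, horizon, reps, seed=0):
    """Assumed GTE: mean reward per user when each algorithm runs alone."""
    rng = np.random.default_rng(seed)

    def mean_alone(policy):
        # beta = 1 routes every user to one policy, so nothing is shared.
        return np.mean([
            np.mean(run_shared(mu, policy, policy, 1.0, horizon, rng)["treatment"])
            for _ in range(reps)
        ])

    return float(mean_alone(treat) - mean_alone(control))


if __name__ == "__main__":
    mu = [0.5, 0.6, 0.7]
    gte = true_gte(mu, "greedy", "eps_greedy", 1000, 30)
    print(f"assumed true GTE (greedy - eps_greedy): {gte:.3f}")
    for beta in (0.1, 0.5, 0.9):
        est = dim_estimate(mu, "greedy", "eps_greedy", beta, 1000, 30)
        print(f"beta={beta}: difference-in-means = {est:.3f}")
```

Under data sharing, the greedy arm can free-ride on exploration paid for by the eps-greedy arm, which is the kind of sign violation that Figure 4 depicts; whether the two signs disagree in a given run depends on `mu`, `eps`, and the horizon. Sweeping `beta` mirrors varying the traffic allocation, but this loop is only a sanity check and not the ramp-up detection procedure proposed in the paper.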

Paper Structure

This paper contains 81 sections, 15 theorems, 137 equations, 8 figures, and 4 algorithms.

Key Result

Proposition 1

Consider running an algorithm individually, without data sharing. For any $0 \le \alpha \le 1$, the expected regret of the UCB algorithm $\texttt{UCB}_\alpha$ satisfies […]. The expected regret of the $\epsilon$-greedy algorithm $\texttt{$\epsilon$-grd}_{\alpha, C}$ with $C\ge\max\left\{120K,\frac{16K}{\Delta_{\textrm{min}}^2}\right\}$ satisfies […].
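The regret bounds themselves are not reproduced here, and the exact role of $\alpha$ in $\texttt{UCB}_\alpha$ and $\texttt{$\epsilon$-grd}_{\alpha, C}$ is not stated in this summary. The sketch below therefore uses a hypothetical parameterization in which $\alpha$ scales exploration: UCB adds $\alpha\sqrt{2\ln t / n_i}$ to each empirical mean (so $\alpha = 0$ is pure greedy and $\alpha = 1$ is the standard UCB1 bonus), and $\epsilon$-greedy explores with probability $\epsilon_t = \min\{1, \alpha C / t\}$, with $C$ set to the smallest value Proposition 1 allows. The function name `run_alone` and the Bernoulli reward model are also assumptions. It measures cumulative pseudo-regret $\sum_t (\mu^* - \mu_{a_t})$ for an algorithm running individually.

```python
import numpy as np


def run_alone(mu, policy, alpha, horizon, rng):
    """Run one algorithm on all traffic (no data sharing) and return its
    cumulative pseudo-regret sum_t (mu* - mu[arm_t]).

    ASSUMPTION: alpha scales exploration as described above; this is a
    hypothetical parameterization, not necessarily the paper's definition.
    """
    if policy not in ("ucb", "eps_greedy"):
        raise ValueError(f"unknown policy: {policy}")
    mu = np.asarray(mu, dtype=float)
    k = len(mu)
    gaps = mu.max() - mu
    delta_min = gaps[gaps > 0].min()  # needs at least one suboptimal arm
    C = max(120 * k, 16 * k / delta_min**2)  # smallest C allowed by Proposition 1
    counts = np.zeros(k)
    sums = np.zeros(k)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play each arm once
        elif policy == "ucb":
            bonus = alpha * np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        else:
            eps_t = min(1.0, alpha * C / t)  # decaying exploration probability
            if rng.random() < eps_t:
                arm = int(rng.integers(k))
            else:
                arm = int(np.argmax(sums / counts))
        reward = float(rng.random() < mu[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += gaps[arm]
    return regret


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = [0.5, 0.7]
    for policy in ("ucb", "eps_greedy"):
        for alpha in (0.0, 0.5, 1.0):
            runs = [run_alone(mu, policy, alpha, 2000, rng) for _ in range(20)]
            print(f"{policy:10s} alpha={alpha}: mean regret = {np.mean(runs):.1f}")
```

With this choice, $\alpha$ acts as a single knob from exploitation-only to the usual exploration schedules, which matches the abstract's framing of exploration versus exploitation as the key quantity; the paper's own definitions should be substituted before drawing any quantitative conclusion.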

Figures (8)

  • Figure 1: Shared feedback loops in data create interference between the treatment and control groups (left panel), compared to an idealized environment where the two groups operate in isolation (right panel), which is often infeasible in practice.
  • Figure 2: Our theoretical results contrast an algorithm's performance when running individually (left) versus under data sharing (right).
  • Figure 3: Total regret as a function of the traffic allocation parameter $\beta$.
  • Figure 4: Sign violation. The greedy algorithm runs jointly with the $\epsilon$-greedy or UCB algorithm.
  • Figure 5: Sign preservation. Red indicates that algorithm 1 is worse than algorithm 2; blue indicates that algorithm 1 is better than algorithm 2. (Best viewed in color.)
  • ...and 3 more figures

Theorems & Definitions (15)

  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Proposition 2
  • Theorem 7
  • Proposition 3
  • ...and 5 more