Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

Shuangning Li; Chonghuan Wang; Jingyan Wang

TL;DR

The level of exploration versus exploitation is identified as a key determinant of how data sharing impacts decision making, and a detection procedure based on ramp-up experiments is proposed to signal incorrect algorithm comparisons in practice.

Abstract

We study A/B experiments that are designed to compare the performance of two recommendation algorithms. Prior work has observed that the stable unit treatment value assumption (SUTVA) often does not hold in large-scale recommendation systems, and hence the estimate for the global treatment effect (GTE) is biased. Specifically, units under the treatment and control algorithms contribute to a shared pool of data that subsequently train both algorithms, resulting in interference between the two groups. In this paper, we investigate when such interference may affect our decision making on which algorithm is better. We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the difference-in-means estimator of the GTE under data sharing aligns with or contradicts the sign of the true GTE. Our analysis identifies the level of exploration versus exploitation as a key determinant of how data sharing impacts decision making, and we propose a detection procedure based on ramp-up experiments to signal incorrect algorithm comparison in practice.
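
The data-sharing setup described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model or experiments. It assumes Bernoulli rewards, two epsilon-greedy algorithms that differ only in their exploration rate, and a 50/50 random assignment of units. All function names and parameter values are hypothetical. It contrasts the difference-in-means estimate from an A/B experiment with a shared data pool against a reference value obtained by deploying each algorithm globally on its own data.

```python
import random


def choose_arm(eps, counts, sums, rng):
    """Epsilon-greedy choice based on a data pool (counts, sums)."""
    k = len(counts)
    if rng.random() < eps:
        return rng.randrange(k)  # explore: pick an arm uniformly at random
    # exploit: pick the arm with the best empirical mean (untried arms count as 0)
    means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
    return max(range(k), key=lambda a: means[a])


def draw_reward(arm_means, arm, rng):
    """Bernoulli reward with success probability arm_means[arm] (an assumption)."""
    return 1.0 if rng.random() < arm_means[arm] else 0.0


def run_global(eps, arm_means, horizon, seed):
    """Average reward when one algorithm serves every unit and sees only its own data."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    sums = [0.0] * len(arm_means)
    total = 0.0
    for _ in range(horizon):
        arm = choose_arm(eps, counts, sums, rng)
        r = draw_reward(arm_means, arm, rng)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total / horizon


def run_ab_shared(eps_treat, eps_ctrl, arm_means, horizon, seed):
    """Difference-in-means estimate from an A/B test where both groups share one data pool."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)  # shared by treatment and control
    sums = [0.0] * len(arm_means)
    reward = {"treat": 0.0, "ctrl": 0.0}
    n = {"treat": 0, "ctrl": 0}
    for _ in range(horizon):
        group = "treat" if rng.random() < 0.5 else "ctrl"
        eps = eps_treat if group == "treat" else eps_ctrl
        arm = choose_arm(eps, counts, sums, rng)
        r = draw_reward(arm_means, arm, rng)
        counts[arm] += 1  # both groups write to the same pool: this is the interference
        sums[arm] += r
        reward[group] += r
        n[group] += 1
    return reward["treat"] / max(n["treat"], 1) - reward["ctrl"] / max(n["ctrl"], 1)


def reference_gte(eps_treat, eps_ctrl, arm_means, horizon, seeds):
    """Monte Carlo reference: mean gap between global deployments of the two algorithms."""
    diffs = [
        run_global(eps_treat, arm_means, horizon, s)
        - run_global(eps_ctrl, arm_means, horizon, s)
        for s in seeds
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    arm_means = [0.3, 0.5, 0.7]  # hypothetical arm success probabilities
    seeds = range(200)
    ab = sum(run_ab_shared(0.3, 0.0, arm_means, 500, s) for s in seeds) / len(seeds)
    gte = reference_gte(0.3, 0.0, arm_means, 500, seeds)
    print(f"A/B estimate with shared data: {ab:+.3f}")
    print(f"Reference GTE (global runs):   {gte:+.3f}")
```

Comparing the signs of the two printed numbers for different exploration rates gives a rough feel for the question the paper studies. Under data sharing, a less exploratory algorithm can benefit from the other group's exploration, so the A/B estimate need not agree with the global comparison.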
