Multi-Armed Bandits with Network Interference

Abhineet Agarwal; Anish Agarwal; Lorenzo Masoero; Justin Whitehouse

TL;DR

The paper addresses online experimentation under cross-unit interference by formulating a multi-armed bandit problem with sparse network interference. It introduces a sparse Fourier representation of unit rewards, enabling simple linear regression-based algorithms: an explore-then-commit approach with OLS when the interference graph is known, and a Lasso-based extension when the graph is unknown. Theoretical results show sublinear regret: $\tilde{O}((\mathcal{A}^s T)^{2/3})$ for known interference, with a path to $\tilde{O}(\sqrt{N\mathcal{A}^s T})$ via sequential elimination, and $\tilde{O}(N^{1/3}(\mathcal{A}^s T)^{2/3})$ for unknown interference; simulations corroborate improvements over naive baselines. The framework generalizes prior work by allowing arbitrary and unknown neighbourhood interference and by leveraging discrete Fourier analysis to transform a high-dimensional problem into a sparse linear representation with unit-level feedback. This bridges online learning with Fourier-analytic techniques and offers practical avenues for scalable, interference-aware online experimentation.
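The explore-then-commit idea with OLS can be sketched on a toy instance. This is a minimal illustration under stated assumptions, not the paper's algorithm as specified: it assumes binary actions, a known ring-shaped neighborhood of size 2, an arbitrarily chosen exploration length `E`, and brute-force maximization over all joint actions. The names `local_features`, `explore_then_commit`, `true_reward`, and `true_total` are hypothetical.

```python
import itertools
import numpy as np

def local_features(a, nbrs):
    """Parity features chi_S(a) for every subset S of a unit's neighborhood (binary actions)."""
    signs = 1.0 - 2.0 * np.asarray(a)[nbrs]  # maps action 0 -> +1 and action 1 -> -1
    return np.array([np.prod(signs[np.array(S, dtype=int)])
                     for k in range(len(nbrs) + 1)
                     for S in itertools.combinations(range(len(nbrs)), k)])

def explore_then_commit(reward_fn, neighborhoods, N, E, rng, noise=0.1):
    """Explore uniformly for E rounds, fit one OLS per unit, return the committed action."""
    X = rng.integers(0, 2, size=(E, N))  # uniformly random joint actions
    theta_hat = []
    for n, nbrs in enumerate(neighborhoods):
        Phi = np.array([local_features(a, nbrs) for a in X])
        y = np.array([reward_fn(n, a) for a in X]) + noise * rng.standard_normal(E)
        coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # ordinary least squares
        theta_hat.append(coef)

    def estimated_total(a):
        return sum(local_features(a, nbrs) @ th
                   for nbrs, th in zip(neighborhoods, theta_hat))

    # Brute force is only feasible because N is tiny in this toy example.
    return max(itertools.product([0, 1], repeat=N), key=estimated_total)

# Toy instance (all values are assumptions): 5 units on a ring, each affected by itself
# and its right-hand neighbor, with rewards drawn from a random lookup table.
rng = np.random.default_rng(0)
N = 5
neighborhoods = [np.array([n, (n + 1) % N]) for n in range(N)]
tables = [rng.normal(size=4) for _ in range(N)]

def true_reward(n, a):
    i, j = neighborhoods[n]
    return tables[n][2 * a[i] + a[j]]

def true_total(a):
    return sum(true_reward(n, a) for n in range(N))

best = explore_then_commit(true_reward, neighborhoods, N, E=200, rng=rng)
print("committed action:", best, "true total reward:", true_total(best))
```

Each per-unit regression uses only 4 features here (one per subset of the 2-unit neighborhood), rather than one per joint action, which is what makes the estimation step tractable.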

Abstract

Online experimentation with interference is a common challenge in modern applications such as e-commerce and adaptive clinical trials in medicine. For example, in online marketplaces, the revenue of a good depends on discounts applied to competing goods. Statistical inference with interference is widely studied in the offline setting, but far less is known about how to adaptively assign treatments to minimize regret. We address this gap by studying a multi-armed bandit (MAB) problem where a learner (e-commerce platform) sequentially assigns one of $\mathcal{A}$ possible actions (discounts) to $N$ units (goods) over $T$ rounds to minimize regret (maximize revenue). Unlike traditional MAB problems, the reward of each unit depends on the treatments assigned to other units, i.e., there is interference across the underlying network of units. With $\mathcal{A}$ actions and $N$ units, minimizing regret is combinatorially difficult since the action space grows as $\mathcal{A}^N$. To overcome this issue, we study a sparse network interference model, where the reward of a unit is only affected by the treatments assigned to $s$ neighboring units. We use tools from discrete Fourier analysis to develop a sparse linear representation of the unit-specific reward $r_n: [\mathcal{A}]^N \rightarrow \mathbb{R}$, and propose simple, linear regression-based algorithms to minimize regret. Importantly, our algorithms achieve provably low regret both when the learner observes the interference neighborhood for all units and when it is unknown. This significantly generalizes other works on this topic, which impose strict conditions on the strength of interference on a known network and also compare regret to a markedly weaker optimal action. Empirically, we corroborate our theoretical findings via numerical simulations.
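To make the combinatorial gap concrete, the short sketch below compares the number of joint actions, $\mathcal{A}^N$, with the number of distinct treatment patterns a single unit can experience when only $s$ units affect it, $\mathcal{A}^s$. The function names and the numerical values are illustrative assumptions.

```python
def joint_action_count(num_actions: int, num_units: int) -> int:
    """Number of joint assignments of one action to each of the units."""
    return num_actions ** num_units

def local_pattern_count(num_actions: int, s: int) -> int:
    """Number of treatment patterns over the s units that affect one unit."""
    return num_actions ** s

# Hypothetical example: binary actions, 9 units, neighborhoods of size 5.
print(joint_action_count(2, 9))   # 512
print(local_pattern_count(2, 5))  # 32

# The gap widens quickly: 4 actions, 20 units, neighborhoods of size 3.
print(joint_action_count(4, 20), local_pattern_count(4, 3))
```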

Paper Structure (24 sections, 9 theorems, 46 equations, 4 figures, 3 algorithms)

Introduction
Contributions.
Related Work
Model & Background
Problem Set-up
Background on Discrete Fourier Analysis
Model: Sparse Network Interference
Network Multi-Armed Bandits with Known Interference
Determining exploration length $E$.
Regret Analysis
Comparison to other approaches.
Obtaining optimal regret ($\sqrt{T}$) dependence
Network Multi-Armed Bandits with Unknown Interference
Regret Analysis
Simulations
...and 9 more sections

Key Result

Proposition 3.1

Let the sparse network interference assumption hold. Then, for any unit $n$, and action $\mathbf{a} \in [\mathcal{A}]^N$, we have the following representation of the reward $r_{n}(\mathbf{a}) = \langle \bm{\theta}_{n}, \bm{\chi}(\mathbf{a}) \rangle$, where $\left\lVert\bm{\theta}_n\right\rVert_0 \le \dots$
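The flavor of such a sparse representation can be checked numerically in the binary-action case. The sketch below is an illustration under assumptions, not the paper's construction: it uses the standard parity characters $\chi_S(\mathbf{a}) = (-1)^{\sum_{i \in S} a_i}$ of Boolean Fourier analysis, a hypothetical neighborhood `nbrs`, and a random lookup-table reward. It relies on the standard fact that a function depending on only $s$ binary coordinates has at most $2^s$ nonzero Fourier coefficients, all indexed by subsets of those coordinates.

```python
import itertools
import numpy as np

def fourier_coefficients(reward, N):
    """Fourier coefficients of reward: {0,1}^N -> R with respect to parity characters."""
    actions = np.array(list(itertools.product([0, 1], repeat=N)))  # shape (2^N, N)
    subsets = actions.copy()              # row = indicator vector of a subset S
    chi = (-1.0) ** (actions @ subsets.T)  # chi[a, S] = (-1)^{sum_{i in S} a_i}
    r = np.array([reward(a) for a in actions])
    theta = chi.T @ r / len(actions)       # characters are orthogonal with norm^2 = 2^N
    return theta, chi, r, subsets

# Hypothetical unit whose reward depends only on units 0, 2 and 3 out of N = 6.
N, nbrs = 6, [0, 2, 3]
table = np.random.default_rng(0).normal(size=2 ** len(nbrs))

def reward(a):
    return table[4 * a[0] + 2 * a[2] + a[3]]

theta, chi, r, subsets = fourier_coefficients(reward, N)
support = np.abs(theta) > 1e-9
print("nonzero coefficients:", int(support.sum()), "out of", len(theta))
```

The reward is recovered exactly as `chi @ theta`, i.e. as an inner product between a sparse coefficient vector and the vector of characters evaluated at the action.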

Figures (4)

Figure 1: A visual representation of sparse network interference. In this toy example, we have $N = 9$ units, and visualize the interference pattern. For unit $2$ (orange), its outcomes are affected by the treatments of its neighbours (blue) $\mathcal{N}(2) = \{1,2,3,6,7\}$.
Figure : (a) Cumulative regret vs number of units $N$.
Figure : (a) Cumulative regret vs number of units $N$.
Figure : (b) Cumulative regret scaling vs horizon $T$.

Theorems & Definitions (15)

Proposition 3.1
Theorem 4.1
Theorem 5.1
Lemma B.1: Theorem 5.41 in Vershynin (2018)
Lemma B.2: Minimum Eigenvalue of Fourier Characteristics
proof
Lemma B.3
proof
Definition C.1
Lemma C.2: Incoherence of Fourier Characteristics
...and 5 more
