Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference

Zichen Wang, Haoyang Hong, Chuanhao Li, Haoxuan Li, Zhiheng Zhang, Huazheng Wang

TL;DR

This paper studies how to balance regret minimization against statistical inference for suboptimal arms in adversarial multi-armed bandits with network interference (MABNI). It formulates a unified MAB-N framework built on exposure mapping and clustering, and proves a Pareto frontier linking regret to average treatment effect (ATE) estimation error. It then introduces an anytime-valid asymptotic confidence sequence (CS) and the EXP3-N-CS algorithm, which combines MAD exploration with an inverse-probability-weighted (IPW) CS to achieve a provable trade-off: sublinear regret together with reliable ATE estimates at every time step. Experiments on networked settings show Pareto-optimal behavior and illustrate how a design parameter tunes the trade-off, which informs practical deployment in causal network experiments.
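As a rough illustration of the ingredients named above, the following sketch mixes EXP3 sampling probabilities with a uniform design and accumulates IPW estimates of each arm's mean reward. It is a simplified single-unit version that ignores network interference, exposure mapping, and clustering, and it is not the paper's EXP3-N-CS. The function names, the exploration schedule `delta_t = t ** (-alpha)`, and the learning rate `eta` are all assumptions made for this example.

```python
import numpy as np


def mixed_probs(weights, delta):
    """Mix EXP3 probabilities with a uniform design.

    delta is the share of probability mass reserved for uniform exploration,
    so every arm is played with probability at least delta / K.
    """
    p_alg = weights / weights.sum()
    k = len(weights)
    return delta / k + (1.0 - delta) * p_alg


def run_sketch(rewards, alpha=0.25, eta=0.05, seed=0):
    """Run the sketch on a fixed (T, K) reward table with entries in [0, 1].

    Returns the IPW estimate of each arm's average reward and the smallest
    sampling probability that was ever used.
    """
    rng = np.random.default_rng(seed)
    T, K = rewards.shape
    log_w = np.zeros(K)      # EXP3 weights kept in log space for stability
    ipw_sum = np.zeros(K)    # running sums of y * 1{arm played} / p(arm)
    min_prob = 1.0
    for t in range(1, T + 1):
        delta = t ** (-alpha)                  # assumed decaying exploration share
        w = np.exp(log_w - log_w.max())
        p = mixed_probs(w, delta)
        min_prob = min(min_prob, p.min())
        arm = rng.choice(K, p=p)
        y = rewards[t - 1, arm]
        ipw = y / p[arm]                       # unbiased estimate of rewards[t-1, arm]
        ipw_sum[arm] += ipw
        log_w[arm] += eta * ipw                # EXP3 importance-weighted update
    return ipw_sum / T, min_prob


# Usage: two arms with fixed rewards 0.2 and 0.8 (hypothetical values).
T = 5000
table = np.tile(np.array([0.2, 0.8]), (T, 1))
means, min_prob = run_sketch(table)
ate_hat = means[1] - means[0]   # IPW estimate of the ATE between the two arms
```

In this sketch a larger `alpha` shrinks the exploration share faster, which favors regret, while a smaller `alpha` keeps sampling probabilities away from zero and so keeps the IPW variance for the suboptimal arm under control. This mirrors, in a toy setting, the kind of trade-off the design parameter is described as tuning.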

Abstract

In multi-armed bandits with network interference (MABNI), the action taken by one node can influence the rewards of others, creating complex interdependence. While existing research on MABNI largely concentrates on minimizing regret, it often overlooks the crucial concern that an excessive emphasis on the optimal arm can undermine the inference accuracy for sub-optimal arms. Although initial efforts have been made to address this trade-off in single-unit scenarios, these challenges have become more pronounced in the context of MABNI. In this paper, we establish, for the first time, a theoretical Pareto frontier characterizing the trade-off between regret minimization and inference accuracy in adversarial (design-based) MABNI. We further introduce an anytime-valid asymptotic confidence sequence along with a corresponding algorithm, $\texttt{EXP3-N-CS}$, specifically designed to balance the trade-off between regret minimization and inference accuracy in this setting.

Paper Structure

This paper contains 34 sections, 5 theorems, 48 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Theorem 4.1

Given any online decision-making policy $\pi$, and any $\mathcal{S}$ and $\mathcal{C}$ that satisfy Condition armspace, the trade-off between regret and ATE estimation exhibits where $\mathcal{R}_{\nu}$ and $e_{\nu}$ denote, respectively, the regret and the maximum ATE estimation error under instance $\nu$.

Figures (6)

  • Figure 1: The main contribution of our paper is to study how to achieve these three objectives and to analyze their underlying interrelationships.
  • Figure 2: Experimental results.
  • Figure 3: Experimental results of instance 1.
  • Figure 4: Experimental results of instance 2.
  • Figure 5: Experimental results of instance 3.
  • ...and 1 more figure

Theorems & Definitions (13)

  • Definition 3.2: ATE
  • Theorem 4.1
  • Definition 5.1: Asymptotic ($1 - \tilde{\delta}$) CS
  • Proposition 5.2: Asymptotic CS for MAB-N
  • Definition 5.4: MAD
  • Theorem 5.5: Performance of the Asymptotic CS
  • Theorem 5.6
  • Proof of Theorem \ref{trade-off}
  • Lemma D.1
  • Proof of Lemma \ref{lemmacondition}
  • ...and 3 more