Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference
Zichen Wang, Haoyang Hong, Chuanhao Li, Haoxuan Li, Zhiheng Zhang, Huazheng Wang
TL;DR
This paper studies how to balance regret minimization against statistical inference for suboptimal arms in adversarial multi-armed bandits with network interference (MABNI). It formulates a unified MAB-N framework based on exposure mapping and clustering, and proves a Pareto frontier linking regret to the estimation error of the average treatment effect (ATE). It then introduces an anytime-valid asymptotic confidence sequence (CS) and the EXP3-N-CS algorithm, which combines MAD exploration with an inverse-probability-weighting (IPW) based CS to support continual inference alongside efficient learning. The result is a provable trade-off: sublinear regret together with reliable ATE estimates over time. Empirical results in networked settings show Pareto-optimal behavior and illustrate how a design parameter tunes the trade-off, which informs practical deployment in causal network experiments.
Abstract
In multi-armed bandits with network interference (MABNI), the action taken by one node can influence the rewards of others, creating complex interdependence. Existing research on MABNI largely concentrates on minimizing regret and often overlooks a crucial concern: an excessive emphasis on the optimal arm can undermine inference accuracy for sub-optimal arms. Although initial efforts have addressed this trade-off in single-unit scenarios, the challenge is more pronounced in MABNI. In this paper, we establish, for the first time, a theoretical Pareto frontier characterizing the trade-off between regret minimization and inference accuracy in adversarial (design-based) MABNI. We further introduce an anytime-valid asymptotic confidence sequence along with a corresponding algorithm, $\texttt{EXP3-N-CS}$, specifically designed to balance this trade-off.
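The tension between regret and inference described above can be illustrated with a minimal single-unit sketch. This is not $\texttt{EXP3-N-CS}$: it omits network interference, exposure mapping, clustering, and the confidence sequence, and only shows a standard EXP3 sampler whose arm probabilities are mixed with a uniform distribution so that inverse-probability-weighted (IPW) estimates for every arm stay well behaved. The function name `exp3_with_ipw` and the parameters `delta` (exploration mixing weight) and `eta` (learning rate) are assumptions made for illustration, not notation from the paper.

```python
import math
import random


def exp3_with_ipw(rewards, delta, eta, seed=0):
    """Illustrative EXP3 with uniform mixing and IPW mean estimates.

    rewards[t][k] is the reward in [0, 1] that arm k would yield at round t
    (fixed in advance, as in an adversarial / design-based setting).
    delta in (0, 1] is a hypothetical mixing weight: larger values explore
    more, which helps inference on suboptimal arms but costs reward.
    """
    rng = random.Random(seed)
    n_arms = len(rewards[0])
    cum_est = [0.0] * n_arms  # cumulative IPW reward estimates per arm
    total_reward = 0.0
    min_prob = 1.0  # smallest sampling probability seen so far

    for x in rewards:
        # Exponential weights on cumulative estimates, shifted by the max
        # so that math.exp never overflows.
        top = max(cum_est)
        w = [math.exp(eta * (c - top)) for c in cum_est]
        s = sum(w)
        # Mixing with uniform keeps every probability >= delta / n_arms.
        probs = [(1 - delta) * wi / s + delta / n_arms for wi in w]
        min_prob = min(min_prob, min(probs))

        arm = rng.choices(range(n_arms), weights=probs)[0]
        r = x[arm]
        total_reward += r
        # IPW update: only the pulled arm is credited, scaled by 1 / prob.
        cum_est[arm] += r / probs[arm]

    horizon = len(rewards)
    ipw_means = [c / horizon for c in cum_est]
    return ipw_means, total_reward, min_prob


if __name__ == "__main__":
    # Arm 0 always pays 0.8 and arm 1 always pays 0.3 (toy data).
    data = [[0.8, 0.3] for _ in range(2000)]
    means, reward, p_min = exp3_with_ipw(data, delta=0.2, eta=0.05)
    print("IPW estimate of the arm-0 vs arm-1 effect:", means[0] - means[1])
    print("total reward:", reward, "min sampling probability:", p_min)
```

In this sketch the IPW terms are bounded by `n_arms / delta`, so a larger `delta` lowers the variance of the effect estimate, while a smaller `delta` lets the sampler concentrate on the better arm. That is the same qualitative trade-off the paper formalizes as a Pareto frontier, here without any of its guarantees.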
