Online Experimental Design With Estimation-Regret Trade-off Under Network Interference
Zhiheng Zhang, Zichen Wang
TL;DR
This work studies online experimental design under network interference by formulating a generalized MAB problem, MAB-N, that uses exposure mapping to compress the exponentially large joint action space into a tractable exposure space. It establishes a Pareto-optimal trade-off between the estimation accuracy of exposure-based treatment effects and cumulative regret, and presents the UCB-TSN algorithm, which achieves this balance in stochastic settings. The approach extends to adversarial environments via EXP3-TSN, with corresponding estimation and regret guarantees. The framework unifies offline causal inference with online learning under interference, clarifies when prior interference assumptions are necessary, and provides lower bounds and a Pareto frontier characterization that guide algorithm design. Empirical results on a 101-node network validate improved estimation-regret trade-offs relative to baselines and demonstrate robustness across network structures. The paper also outlines extensions to reinforcement learning, dynamic networks, and stronger inference under interference, establishing a foundation for scalable, interference-aware sequential decision-making.
Abstract
Network interference has attracted significant attention in causal inference. It captures various sociological behaviors in which the treatment assigned to one individual in a network may affect the outcomes of others, such as that individual's neighbors. A key challenge in this setting is that standard causal inference methods often assume independent treatment effects among individuals, an assumption that may not hold in networked environments. To estimate interference-aware causal effects, a traditional approach carries over the design used in the independent setting: practitioners randomly assign experimental participants to different groups and compare their outcomes. While effective in offline settings, this strategy becomes problematic in sequential experiments, where suboptimal decisions persist and lead to substantial regret. To address this issue, we introduce a unified interference-aware framework for online experimental design. Compared to existing studies, we extend the definition of the arm space using the statistical concept of exposure mapping, which allows for a more flexible and context-aware representation of treatment effects in networked settings. Crucially, we establish a Pareto-optimal trade-off between estimation accuracy and regret in the network setting, with respect to both the time period and the arm space; this trade-off remains superior to that of baseline models even without network interference. Furthermore, we propose an algorithmic implementation and discuss its generalization across different learning settings and network topologies.
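
To make the exposure-mapping idea concrete, the following minimal sketch enumerates every joint treatment assignment on a toy network and maps each one to a single node's exposure level. Everything in it is an illustrative assumption rather than the paper's construction: the 4-node path graph, the binary treatments, the names `NEIGHBORS` and `exposure`, and the particular mapping (own treatment, whether any neighbor is treated), which is one common choice in the interference literature.

```python
from itertools import product

# Hypothetical 4-node path graph 0-1-2-3, stored as an adjacency list.
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def exposure(node, assignment):
    """Map a joint binary assignment to one node's exposure level.

    Assumed mapping (illustrative only): the pair
    (own treatment, 1 if at least one neighbor is treated else 0).
    """
    own = assignment[node]
    any_neighbor = int(any(assignment[j] for j in NEIGHBORS[node]))
    return (own, any_neighbor)


# All joint assignments: 2^N of them for N nodes with binary treatment.
joint_actions = list(product([0, 1], repeat=len(NEIGHBORS)))

# Distinct exposure levels that node 1 can experience across all joint actions.
exposures_node1 = {exposure(1, a) for a in joint_actions}

print(len(joint_actions), "joint actions")
print(len(exposures_node1), "exposure levels for node 1")
```

In this toy case the number of joint assignments doubles with every added node, while the number of exposure levels per node under the assumed mapping stays at four. This is the kind of compression that the exposure-based arm space relies on.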
