
Online Experimental Design With Estimation-Regret Trade-off Under Network Interference

Zhiheng Zhang, Zichen Wang

TL;DR

This work studies online experimental design under network interference by formulating a generalized MAB problem, MAB-N, that uses exposure mapping to compress the exponentially large joint action space into a tractable exposure space. It establishes a Pareto-optimal trade-off between the estimation accuracy of exposure-based treatment effects and cumulative regret, and presents the UCB-TSN algorithm, which achieves this balance in stochastic settings; the approach is extended to adversarial environments via EXP3-TSN, with corresponding estimation and regret guarantees. The framework unifies offline causal inference with online learning under interference, clarifies when prior interference assumptions are necessary, and provides lower bounds and a Pareto frontier characterization that guide algorithm design. Empirical results on a 101-node network show improved estimation-regret trade-offs relative to baselines and demonstrate robustness across network structures. The paper also outlines extensions to reinforcement learning, dynamic networks, and stronger inference under interference, establishing a foundation for scalable, interference-aware sequential decision-making.
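
To make the compression concrete, here is a minimal sketch under assumptions of our own: a hypothetical 4-unit cycle network, a small number of treatment arms, and an exposure mapping that records a unit's own arm together with its number of neighbors on a non-control arm. This is one common style of exposure mapping, not necessarily the one used in the paper; the names `NEIGHBORS`, `exposure`, and `compress` are illustrative.

```python
from collections import Counter
from itertools import product

# Hypothetical toy network (a 4-cycle 0-1-3-2-0), NOT the paper's 101-node network.
NEIGHBORS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


def exposure(assignment, unit, neighbors=NEIGHBORS):
    """Hypothetical exposure mapping: (own arm, number of neighbors on a non-control arm)."""
    treated_neighbors = sum(1 for j in neighbors[unit] if assignment[j] > 0)
    return (assignment[unit], treated_neighbors)


def compress(num_arms=2, neighbors=NEIGHBORS):
    """Enumerate all K^N joint assignments and count the distinct exposures they induce."""
    units = sorted(neighbors)
    joint = list(product(range(num_arms), repeat=len(units)))
    exposures = Counter(exposure(a, i, neighbors) for a in joint for i in units)
    return len(joint), exposures


n_joint, exposures = compress()
print(n_joint, len(exposures))  # prints: 16 6
```

Here 16 joint assignments collapse to 6 exposure levels. The joint space grows as K^N, while the exposure set under this mapping depends only on K and the maximum degree, which is the kind of reduction that makes an exposure-level arm space tractable.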

Abstract

Network interference has attracted significant attention in the field of causal inference, encapsulating various sociological behaviors in which the treatment assigned to one individual within a network may affect the outcomes of others, such as their neighbors. A key challenge in this setting is that standard causal inference methods often assume independent treatment effects among individuals, which may not hold in networked environments. To estimate interference-aware causal effects, a traditional approach inherits the independent setting: practitioners randomly assign experimental participants to different groups and compare their outcomes. While effective in offline settings, this strategy becomes problematic in sequential experiments, where suboptimal decisions persist, leading to substantial regret. To address this issue, we introduce a unified interference-aware framework for online experimental design. Compared with existing studies, we extend the definition of the arm space by utilizing the statistical concept of exposure mapping, which allows for a more flexible and context-aware representation of treatment effects in networked settings. Crucially, we establish a Pareto-optimal trade-off between estimation accuracy and regret under network interference with respect to both the time horizon and the arm space, and this trade-off remains superior to baseline models even in the absence of network interference. Furthermore, we propose an algorithmic implementation and discuss its generalization across different learning settings and network topologies.
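
The estimation-regret tension can be illustrated with a generic device: run a UCB index policy over a small set of exposure arms, but force every arm to be sampled at least a fixed fraction of the time. This is a sketch, not the paper's UCB-TSN (whose details are not reproduced on this page); the function name `ucb_with_exploration_floor`, the exploration-floor rule with parameter `alpha`, and the unit-variance Gaussian rewards are all our assumptions. Raising `alpha` keeps every arm's sample count growing linearly in the horizon, which tightens per-arm mean estimates (and hence effect estimates) but adds regret from pulling suboptimal arms.

```python
import math
import random


def ucb_with_exploration_floor(means, horizon, alpha, seed=0):
    """Generic UCB over K arms with a deterministic exploration floor (illustrative only).

    At round t, any arm pulled fewer than floor(alpha * t / K) times is played first;
    otherwise the arm with the largest UCB index is played.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        min_pulls = math.floor(alpha * t / k)
        # Arms that are unplayed or below the exploration floor get priority.
        under = [a for a in range(k) if counts[a] == 0 or counts[a] < min_pulls]
        if under:
            arm = min(under, key=lambda a: counts[a])
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = rng.gauss(means[arm], 1.0)  # assumed Gaussian noise, unit variance
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret against the best arm's mean
    estimates = [sums[a] / counts[a] for a in range(k)]
    return counts, estimates, regret


for alpha in (0.0, 0.9):
    counts, estimates, regret = ucb_with_exploration_floor([0.5, 0.3, 0.0], 3000, alpha)
    print(alpha, counts, round(regret, 1))
```

With `alpha = 0.9` and three arms, each arm receives roughly 900 of the 3,000 rounds, so the pseudo-regret is at least about 900 × (0.2 + 0.5); with `alpha = 0` the rule reduces to a standard UCB index policy that concentrates on the best arm.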

Paper Structure

This paper contains 45 sections, 14 theorems, 76 equations, 3 figures, 2 tables.

Key Result

Proposition 1

Given $N$, $K$, and $\mathbb{H}$ a priori, for any policy $\pi$ there exists a hard instance $\nu \in \mathcal{E}_0$ such that $\mathcal{R}^{\text{naive}}_\nu(T, \pi) = \Omega\big(\frac{1}{\sqrt{N}}\,(T \wedge \sqrt{K^N T})\big)$, where $a \wedge b = \min\{a, b\}$.
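
The rate in Proposition 1 can be evaluated directly to see why the naive joint-arm formulation breaks down at realistic scales. The sketch below drops the constants hidden in $\Omega(\cdot)$; the function name is ours, and the example setting (binary treatments, $K = 2$, on $N = 101$ units with $T = 10^6$) is an assumption chosen to echo the size of the paper's experimental network.

```python
import math


def naive_lower_bound_rate(n_units: int, k_arms: int, horizon: int) -> float:
    """Evaluate (1/sqrt(N)) * min(T, sqrt(K^N * T)) from Proposition 1, constants dropped."""
    joint_arms = k_arms ** n_units  # size of the naive joint action space, K^N
    # min(T, sqrt(K^N * T)) equals T exactly when T <= K^N; comparing integers first
    # avoids building K^N * T as a float, which overflows for large N.
    core = horizon if horizon <= joint_arms else math.sqrt(joint_arms * horizon)
    return core / math.sqrt(n_units)


# Assumed setting: N = 101 units, binary treatments (K = 2), horizon T = 10^6.
print(round(naive_lower_bound_rate(101, 2, 10**6)))  # prints 99504, i.e. T / sqrt(101)
```

Because $2^{101}$ dwarfs any practical horizon, the minimum is attained at $T$ and the lower bound is linear in $T$: no policy over the naive joint action space can achieve sublinear regret on such instances.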

Figures (3)

  • Figure 1: Pareto-optimality. (a) Three blue fronts (first quadrant) represent three different MAB algorithms $\{\pi_i, \hat{\Delta}_i\}_{i\in [3]}$; for example, the blue region for $\{\pi_1, \hat{\Delta}_1\}$ contains the regret and estimation-error pairs it can realistically achieve across all instances. An MAB algorithm is Pareto-optimal if and only if its blue front is tangent to the Pareto Frontier (red); otherwise, it intersects the grey region. (b) The green line represents the baseline in simchi2023multi, which loses Pareto-optimality with respect to the arm space.
  • Figure 2: Network structure.
  • Figure 3: Experimental results.
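
Figure 1 reasons about Pareto dominance between (regret, estimation error) outcomes. As a finite-point analogue, and assuming the standard two-objective definition (the paper's Definitions 2 and 3 are stated for fronts over instances, which are not reproduced here), a dominance check and frontier filter can be sketched as follows. The function names and the sample points are illustrative, not data from the paper.

```python
from typing import List, Tuple

# (regret, estimation error) for one algorithm/estimator pair; lower is better in both.
Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """p Pareto-dominates q: no worse in both objectives and strictly better in at least one."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by regret."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


# Made-up (regret, error) pairs, purely illustrative.
pts = [(1.0, 5.0), (2.0, 3.0), (3.0, 3.5), (4.0, 1.0), (2.5, 4.0)]
print(pareto_frontier(pts))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

The quadratic scan is fine for a handful of candidates; the points it drops, such as (3.0, 3.5), play the role of the grey region in Figure 1, since another pair achieves both lower regret and lower error.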

Theorems & Definitions (17)

  • Proposition 1
  • Definition 2: Front and Pareto-dominate
  • Definition 3: Pareto-optimal and Pareto Frontier
  • Theorem 4
  • Theorem 5
  • Theorem 6: ATE estimation upper bound
  • Theorem 7: Regret upper bound
  • Corollary 8: Trade-off result
  • Theorem 9: Pareto-optimality trade-off in the adversarial setting
  • Remark 10
  • ...and 7 more