Individualized Policy Evaluation and Learning under Clustered Network Interference

Yi Zhang; Kosuke Imai

TL;DR

The paper addresses evaluating and learning individualized treatment rules under clustered network interference (partial interference) by adopting a semiparametric additive outcome model that allows heterogeneous spillovers within clusters. It introduces the additive IPW (addIPW) estimator and a doubly robust extension (addDR) that achieve greater efficiency than standard IPW, with finite-sample regret bounds matching the i.i.d. rate. Policy learning is formulated as an optimization problem solvable via mixed-integer programming for key policy classes, and the framework is extended to observational data with cross-fitting and nuisance-function estimation. Empirical results from simulations and a Colombian cash-transfer application illustrate substantial improvements in policy value and robustness, highlighting the practical implications for network-aware personalized interventions.

Abstract

Although there is now a large literature on policy evaluation and learning, much of the prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference can lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network interference (also known as partial interference), where clusters of units are sampled from a population and units may influence one another within each cluster. Unlike previous methods that impose strong restrictions on spillover effects, such as anonymous interference, the proposed methodology only assumes a semiparametric structural model, where each unit's outcome is an additive function of individual treatments within the cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies. We consider both experimental and observational studies, and for the latter, we develop a doubly robust estimator that is semiparametrically efficient and yields an optimal regret bound. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology.
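
To make the efficiency claim concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a randomized experiment with known propensities, treatments drawn independently within each cluster, and an outcome that is exactly additive in the cluster's treatments. The additive weight used here (the sum over units of the usual inverse-probability ratio minus one, plus one) is one form that is unbiased under those assumptions; the paper's exact definition of addIPW may differ. The names `ipw_value`, `add_ipw_value`, `run_demo`, and the spillover matrix `B` are hypothetical.

```python
import numpy as np


def match_prob(pi, e):
    """P(A_ij = pi_ij) when A_ij ~ Bernoulli(e_ij) independently within a cluster."""
    return np.where(pi == 1, e, 1.0 - e)


def ipw_value(Y, A, pi, e):
    """Standard IPW: one weight per cluster, the product of unit-level ratios."""
    w = np.prod((A == pi) / match_prob(pi, e), axis=1)
    return float(np.mean(w * Y.mean(axis=1)))


def add_ipw_value(Y, A, pi, e):
    """Additive weight (assumed form): sum over units of (ratio - 1), plus 1."""
    w = np.sum((A == pi) / match_prob(pi, e) - 1.0, axis=1) + 1.0
    return float(np.mean(w * Y.mean(axis=1)))


def run_demo(seed=0, reps=200, n=500, m=5):
    """Repeatedly simulate n clusters of size m and evaluate the treat-everyone policy."""
    rng = np.random.default_rng(seed)
    # Hypothetical heterogeneous spillovers: B[j, k] is the effect of unit k's
    # treatment on unit j's outcome; the diagonal holds the direct effects.
    B = 0.3 * rng.uniform(size=(m, m))
    np.fill_diagonal(B, 1.0)
    pi = np.ones((n, m), dtype=int)   # policy under evaluation: treat every unit
    e = np.full((n, m), 0.5)          # known Bernoulli(0.5) design
    truth = B.sum() / m               # mean cluster outcome when all units are treated
    ipw, add = [], []
    for _ in range(reps):
        A = rng.binomial(1, e)                      # independent treatments
        Y = A @ B.T + rng.normal(size=(n, m))       # additive outcome plus noise
        ipw.append(ipw_value(Y, A, pi, e))
        add.append(add_ipw_value(Y, A, pi, e))
    return truth, np.array(ipw), np.array(add)


if __name__ == "__main__":
    truth, ipw, add = run_demo()
    print("true value:", round(truth, 3))
    print("IPW    mean, sd:", ipw.mean(), ipw.std())
    print("addIPW mean, sd:", add.mean(), add.std())
```

In this design the standard IPW weight is nonzero only for clusters whose entire treatment vector matches the policy, which happens with probability 0.5 to the power of the cluster size, whereas the additive weight uses every cluster. That difference is the source of the variance gap in the sketch, and it relies on the additive outcome assumption holding.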

Paper Structure (35 sections, 6 theorems, 100 equations, 2 figures, 2 tables)

Introduction
Related Work
The Problem Statement
Setup and Notation
Individualized Policy Learning Problem
Policy Evaluation
The Inverse-probability-weighting (IPW) Estimator
A Semiparametric Additive Outcome Model
Identification and Estimation
Semiparametric Model with Interactions
Policy Learning
Regret Analysis
Mixed-Integer Program Formulation
Extension to Observational Studies
Doubly Robust Estimator
...and 20 more sections

Key Result

Proposition 1

Under Assumptions ass:iidcluster, ass:DGP(a), ass:additive, and ass:factorCPS, for all $\pi \in \Pi$,

Figures (2)

Figure 1: Boxplots of the policy value under the learned policies based on the proposed addIPW (red) and addDR (orange) estimators, in comparison to the standard IPW estimators with unknown interference (blue), anonymous interference (brown), and no interference (green). The value of the oracle estimator (purple) is also shown. Even when the model is misspecified (Scenario B), the proposed policy learning methods outperform the IPW estimators, with their estimated policy values closest to those of the oracle estimator. The addDR estimator further improves upon the addIPW estimator by reducing variance.
Figure 2: The performance of the policy evaluation methods based on the proposed addIPW (red) and addDR (orange) estimators, as well as the existing IPW estimators with unknown interference (blue) and anonymous interference (brown). The dots represent the average performance over simulations, while the lines indicate one standard deviation above and below the mean. The true policy value (dashed black line) is calculated using Monte Carlo simulations.

Theorems & Definitions (21)

Example 1: Linear-in-means model
Example 2: Additive nonparametric effect model under anonymous interference
Proposition 1: Unbiasedness
proof
Theorem 1: Finite-sample regret bound
proof
Theorem 2: Semiparametric Efficiency
proof
Theorem 3: Regret bound with doubly robust estimator
proof
...and 11 more
