PairNet: Training with Observed Pairs to Estimate Individual Treatment Effect

Lokesh Nagalapatti; Pranava Singhal; Avishek Ghosh; Sunita Sarawagi

TL;DR

PairNet estimates individual treatment effects (ITE) from observational data by training on pairs of observed examples with a pairwise loss over their factual outcomes, which removes the reliance on noisy pseudo-outcomes. For binary treatments under overlap, the theory establishes consistency and ITE-risk bounds, showing that the Pair loss upper-bounds the ITE risk via an IPM-based distance between the neighbor distributions and the observed covariates, with tighter guarantees than factual models. Empirically, PairNet achieves significantly lower ITE error than thirteen existing methods across eight benchmarks covering binary and continuous treatments, is model-agnostic, and is robust to pairing proximity and hyperparameters. These results suggest PairNet offers a practical, scalable route to more accurate individualized treatment effect estimation on real-world observational datasets.

Abstract

Given a dataset of individuals, each described by a covariate vector, a treatment, and the outcome observed under that treatment, the goal of the individual treatment effect (ITE) estimation task is to predict the change in outcome that results from a change in treatment. A fundamental challenge is that in observational data each individual's outcome is observed under only one treatment, whereas we need to infer the difference in outcomes under two different treatments. Several existing approaches address this issue by training with inferred pseudo-outcomes, but their success relies on the quality of these pseudo-outcomes. We propose PairNet, a novel ITE estimation training strategy that minimizes losses over pairs of examples based on their factual observed outcomes. Theoretical analysis for binary treatments reveals that PairNet is a consistent estimator of ITE risk and achieves smaller generalization error than baseline models. Empirical comparison with thirteen existing methods across eight benchmarks, covering both discrete and continuous treatments, shows that PairNet achieves significantly lower ITE error than the baselines. PairNet is also model-agnostic and easy to implement.
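To make the idea of a loss over pairs of factual outcomes concrete, here is a minimal sketch. It assumes the loss compares the observed outcome difference within a pair to the predicted outcome difference; the abstract does not give the exact form, so this is an illustration, not the authors' implementation. The names `pair_loss`, `mu_hat`, and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One observed example: (covariates, treatment, factual outcome).
Example = Tuple[Sequence[float], float, float]


def pair_loss(
    mu_hat: Callable[[Sequence[float], float], float],
    pairs: List[Tuple[Example, Example]],
) -> float:
    """Mean squared gap between observed and predicted outcome differences.

    Assumed form (not taken from the source): for a pair of examples that
    received different treatments, penalize
    ((y - y') - (mu_hat(x, t) - mu_hat(x', t')))^2.
    Only factual outcomes are used; no pseudo-outcome is imputed.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    total = 0.0
    for (x, t, y), (x_nbr, t_nbr, y_nbr) in pairs:
        observed_diff = y - y_nbr
        predicted_diff = mu_hat(x, t) - mu_hat(x_nbr, t_nbr)
        total += (observed_diff - predicted_diff) ** 2
    return total / len(pairs)


# Toy data: the true outcome is x[0] + 2 * t, and each treated example is
# paired with a nearby control example.
toy_pairs = [
    (((1.0,), 1.0, 3.0), ((1.1,), 0.0, 1.1)),
    (((0.5,), 1.0, 2.5), ((0.4,), 0.0, 0.4)),
]


def good_model(x: Sequence[float], t: float) -> float:
    return x[0] + 2.0 * t  # matches the toy data-generating process


def biased_model(x: Sequence[float], t: float) -> float:
    return x[0] + 1.0 * t  # underestimates the treatment effect by 1


print(pair_loss(good_model, toy_pairs))
print(pair_loss(biased_model, toy_pairs))
```

In this sketch the model that recovers the toy treatment effect gets a loss of essentially zero, while the model that underestimates the effect is penalized. A loss built only on differences cannot detect a constant offset shared by both predictions in a pair, so in practice it would be combined with other terms.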

Paper Structure (41 sections, 4 theorems, 29 equations, 3 figures, 19 tables, 1 algorithm)

Introduction
Problem Statement
Related Work
Training with pseudo-outcomes
Training without Pseudo-outcomes
The Pair Loss
ITE Risk Bounds for Binary Treatment
Comparison with bounds of existing methods
Empirical Evaluation
Experimental Setup
RQ1: PairNet vs. Baselines
Continuous Experiments
RQ2: Smaller Sensitivity of PairNet to pair proximity
RQ3: Alignment of Pair loss with ITE Risk
RQ4: Sensitivity of PairNet to $\delta_{\text{pair}}$ and $\text{num}_{z'}$
...and 26 more sections

Key Result

Lemma 5.6

The difference between ITE Risk and PairNet loss can be expressed as

Figures (3)

Figure 1: Motivating Experiment: Panel (a) presents observational data and predicted $\hat{\mu}$ functions, with pairs selected by PairNet indicated by black lines. In Panel (b), we visualize two empirical losses alongside the corresponding ITE risk. We observed a correlation of $0.32$ between factual loss and ITE risk, while PairNet achieved a substantially stronger correlation of $0.82$. Remarkably, the correlation dropped to $0.45$ when Pair loss lacked the residual alignment term, highlighting its importance.
Figure 2: We plot the distributions $p_t$ and $q_t$, where $p_0$ is $\mathcal{N}(-1, 1)$ and $p_1$ is $\mathcal{N}(+1, 1)$. We observe that the factual model relying on MMD($p_0, p_1$) shows a larger difference from the ITE risk compared to PairNet, which depends on MMD($p, q$).
Figure 3: RQ2: PEHE with increasing proximity of covariates within a pair. Matching methods like $k$NN deteriorate quickly when the paired covariates are not close together, whereas PairNet remains robust and still improves over the baseline even with random pairing ($\lambda=0$).
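The MMD quantity in the Figure 2 caption can be illustrated with a small empirical estimate. The sketch below draws samples from $\mathcal{N}(-1, 1)$ and $\mathcal{N}(+1, 1)$, as in the caption, and computes a biased estimate of squared MMD. The RBF kernel, the bandwidth of 1, the sample size, and all function names are assumptions made for illustration; the caption does not specify them.

```python
import math
import random
from typing import Sequence


def rbf_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on scalars; the kernel choice is an assumption."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[float], ys: Sequence[float], bandwidth: float) -> float:
    """Average kernel value over all cross pairs of the two samples."""
    total = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[float], ys: Sequence[float], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible
control = [rng.gauss(-1.0, 1.0) for _ in range(200)]        # sample from p_0
treated = [rng.gauss(+1.0, 1.0) for _ in range(200)]        # sample from p_1
control_again = [rng.gauss(-1.0, 1.0) for _ in range(200)]  # fresh sample from p_0

shifted = mmd_squared(control, treated)        # different distributions
matched = mmd_squared(control, control_again)  # same distribution
print(shifted, matched)
```

The estimate between the two shifted Gaussians comes out much larger than the estimate between two samples of the same Gaussian, which is the kind of gap between treatment-group distributions that the caption refers to with MMD($p_0, p_1$).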

Theorems & Definitions (15)

Definition 5.1
Definition 5.2
Definition 5.3
Definition 5.4
Definition 5.5
Lemma 5.6
Definition 5.7
Theorem 5.9
Lemma 5.10: Consistency of PairNet
Theorem 5.11
...and 5 more
