Estimating Conditional Average Treatment Effects via Sufficient Representation Learning

Pengfei Shi; Wei Zhong; Xinyu Zhang; Ningtao Wang; Xing Fu; Weiqiang Wang; Yin Jin

TL;DR

This paper tackles CATE estimation under unconfoundedness with high-dimensional covariates by introducing CrossNet, a neural architecture that learns a sufficient representation $\Phi(x)$ and trains two heads for the treated and control groups. The core idea is to enforce sufficiency through a discrepancy-regularized objective that aligns the conditional distributions of potential outcomes across treatment groups, while allowing cross-utilization of data to improve counterfactual predictions. The authors formalize sufficiency mathematically and prove that solving the population objective yields a representation that preserves unconfoundedness, and they demonstrate strong empirical gains on synthetic, IHDP, and Jobs datasets. The approach reduces selection bias, enables use of all available data during training, and improves CATE estimates and policy outcomes in diverse settings, with practical implications for precision medicine, policy evaluation, and targeted interventions.
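The shared-representation, two-head layout described above can be illustrated with a toy objective. This is only a minimal sketch under stated assumptions, not the authors' implementation: the linear map `phi`, the affine `head`, the weight `lam`, and the mean-gap discrepancy are all hypothetical stand-ins, and the sketch does not attempt to reproduce the paper's actual discrepancy term or its cross-utilization of data, neither of which is specified in this summary.

```python
from statistics import fmean


def phi(x, w):
    """Hypothetical linear representation: maps covariates x to one scalar."""
    return sum(wi * xi for wi, xi in zip(w, x))


def head(r, slope, intercept):
    """Hypothetical affine outcome head applied to the representation r."""
    return slope * r + intercept


def objective(data, w, head1, head0, lam):
    """Factual MSE of each head plus lam times a discrepancy penalty.

    data: list of (x, t, y) tuples with t in {0, 1}; both groups must be non-empty.
    The discrepancy is the squared gap between the group means of phi(x),
    an illustrative choice only (assumption, not the paper's Disc term).
    """
    reps = [(phi(x, w), t, y) for x, t, y in data]
    treated = [(r, y) for r, t, y in reps if t == 1]
    control = [(r, y) for r, t, y in reps if t == 0]
    mse1 = fmean([(head(r, *head1) - y) ** 2 for r, y in treated])
    mse0 = fmean([(head(r, *head0) - y) ** 2 for r, y in control])
    gap = fmean([r for r, _ in treated]) - fmean([r for r, _ in control])
    return mse1 + mse0 + lam * gap ** 2


def cate(x, w, head1, head0):
    """Estimated CATE at x: difference of the two heads on the shared representation."""
    r = phi(x, w)
    return head(r, *head1) - head(r, *head0)
```

In a real model `phi` and the heads would be neural networks trained by gradient descent on such an objective; the point here is only that both heads read the same representation and that the penalty depends on how the two groups are distributed in representation space.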

Abstract

Estimating conditional average treatment effects (CATE) is a central problem in causal inference, with applications across many fields. CATE estimation typically requires the unconfoundedness assumption to ensure that the underlying regression problems are identifiable. For high-dimensional data, many variable selection methods and representation-learning neural networks have been proposed, but these methods provide no way to verify whether the selected subset of variables or the learned representations still satisfy the unconfoundedness assumption, which can lead to ineffective estimates of the treatment effects. In addition, these methods typically use data from only the treated or the control group when estimating that group's regression function. This paper proposes a novel neural network approach, \textbf{CrossNet}, that learns a sufficient representation of the features, on which the CATE is then estimated. The name "cross" indicates that each regression function is estimated using data from its own group as well as cross-utilized data from the other group. Numerical simulations and empirical results demonstrate that our method outperforms competitive approaches.
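For orientation, the quantities the abstract refers to can be written out in standard potential-outcomes notation. The symbols $T$ (treatment indicator), $Y(1)$, $Y(0)$ (potential outcomes), and $\mu_t$ (group regression functions) are notation assumed here, and the last line is only a plausible reading of what it means for a representation to "still satisfy" unconfoundedness; the paper's own Definition 1 is not reproduced in this summary.

```latex
% CATE and the unconfoundedness assumption (standard definitions)
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right],
\qquad
\bigl(Y(1), Y(0)\bigr) \perp\!\!\!\perp T \mid X .

% Under unconfoundedness (and overlap), CATE is a difference of two regressions
\mu_t(x) = \mathbb{E}\left[\, Y \mid X = x,\; T = t \,\right],
\qquad
\tau(x) = \mu_1(x) - \mu_0(x) .

% Assumed representation-level analogue for a learned map \Phi
\bigl(Y(1), Y(0)\bigr) \perp\!\!\!\perp T \mid \Phi(X) .
```

The second line is why the abstract speaks of "regression problems": once unconfoundedness holds, each $\mu_t$ can be fit from observed data in group $t$.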

Paper Structure

This paper contains 11 sections, 1 theorem, 7 equations, 2 figures, 2 tables, 1 algorithm.

Introduction
Related Work
Method
CrossNet
Sufficiency
Experiments
Baseline
Synthetic Dataset
Semi-synthetic Dataset: IHDP
Real Dataset: Jobs
Conclusion

Key Result

Theorem 1

Let $\Phi:\mathcal{X}\rightarrow\mathcal{R}$ be a representation function, and suppose there exists a sufficient representation in $\mathcal{R}$. Let $h_1(\cdot)$ and $h_0(\cdot)$ be two hypothesis functions that map $\mathcal{R}$ to $\mathcal{Y}$. If $(\Phi^0(\cdot), h_1^0(\cdot), h_0^0(\cdot))$ is a s[…], then $\Phi^0(x)$ is a sufficient representation of $x$ for estimating CATE.

Figures (2)

Figure 1: $\Phi$ is a learned representation. $h_1$ and $h_0$ are the hypothesis (predictive) functions for $y$ based on $\Phi$ in the treated and control groups, respectively. $l(\cdot)$ is a loss function; we use the mean squared error. $\mathrm{Disc}(\cdot)$ denotes the discrepancy. $(x^1, y^1)$ and $(x^0, y^0)$ are the covariates and responses of a treated-group sample and a control-group sample, respectively.
Figure 2: PEHE of all approaches for the two simulation settings at different sample sizes.
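PEHE (Precision in Estimation of Heterogeneous Effect) is the metric reported in Figure 2. A minimal sketch of how it is commonly computed follows; the function name `pehe` is hypothetical, and the square-root form is assumed here (some papers report the squared version instead). It requires the true unit-level effects, which is why it is usable on synthetic and semi-synthetic data.

```python
import math


def pehe(tau_true, tau_hat):
    """Root mean squared error between true and estimated unit-level effects."""
    if len(tau_true) != len(tau_hat) or len(tau_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(h - t) ** 2 for t, h in zip(tau_true, tau_hat)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, `pehe([1.0, 2.0], [1.0, 2.0])` is zero because the estimates match the true effects exactly; lower values indicate better CATE estimates.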

Theorems & Definitions (2)

Definition 1: Sufficient representation
Theorem 1
