Semi-Supervised Treatment Effect Estimation with Unlabeled Covariates via Generalized Riesz Regression
Masahiro Kato
TL;DR
The paper tackles semi-supervised causal inference: estimating the average treatment effect (ATE) when unlabeled covariates are available in addition to labeled data. It derives semiparametric efficiency bounds for both one-sample (censoring) and two-sample (case-control) data and constructs asymptotically efficient estimators based on Neyman orthogonal scores, with nuisance components estimated through generalized Riesz regression. Generalized Riesz regression enables end-to-end estimation of the Riesz representer from both labeled and unlabeled covariates. This yields variance reductions, particularly in the (τ₀(X) − τ₀)² term of the efficiency bound, which depends only on the covariates, and under covariate shift. The authors analyze the asymptotic properties of the estimators under cross-fitting and extend the framework to the regime with infinitely many unlabeled observations. This extension links the framework to covariate shift adaptation and positive-unlabeled (PU) learning, unifying several strands of semi-supervised and off-policy estimation. The practical payoff is more precise ATE estimation when unlabeled covariates are abundant and labeling is costly.
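To see why unlabeled covariates mainly shrink the $(\tau_0(X)-\tau_0)^2$ term, consider an illustrative calculation that is not taken from the paper. It uses a simplified one-sample setting where labels are missing completely at random: the labeling indicator $R$ is independent of $(X, D, Y)$ with $P(R=1)=\pi$. The notation is assumed here: $\mu_d(X)=E[Y\mid X,D=d]$, $e(X)=P(D=1\mid X)$, $\sigma_d^2(X)=\mathrm{Var}(Y\mid X,D=d)$, and $\tau_0(X)=\mu_1(X)-\mu_0(X)$. Asymptotic variances are scaled by the total sample size $n$, so a labeled-only estimator uses $\pi n$ units.

```latex
% Standard AIPW residual term on a labeled unit W = (X, D, Y); note E[phi(W) | X] = 0
\phi(W) = \frac{D\,(Y-\mu_1(X))}{e(X)} - \frac{(1-D)\,(Y-\mu_0(X))}{1-e(X)}

% Labeled-only (supervised) AIPW, variance scaled by n:
V_{\mathrm{sup}} = \frac{1}{\pi}\Big\{ E\big[(\tau_0(X)-\tau_0)^2\big]
      + E\Big[\frac{\sigma_1^2(X)}{e(X)} + \frac{\sigma_0^2(X)}{1-e(X)}\Big] \Big\}

% Semi-supervised score: the plug-in part uses every unit, the residual is reweighted by R / pi
\psi = \tau_0(X) - \tau_0 + \frac{R}{\pi}\,\phi(W)
\quad\Longrightarrow\quad
V_{\mathrm{semi}} = E\big[(\tau_0(X)-\tau_0)^2\big]
      + \frac{1}{\pi}\,E\Big[\frac{\sigma_1^2(X)}{e(X)} + \frac{\sigma_0^2(X)}{1-e(X)}\Big]

% The cross term vanishes because E[phi(W) | X] = 0, so
V_{\mathrm{sup}} - V_{\mathrm{semi}} = \Big(\frac{1}{\pi}-1\Big)\,E\big[(\tau_0(X)-\tau_0)^2\big] \;\ge\; 0
```

Under these assumptions, only the heterogeneity term benefits from the unlabeled units, because it depends on $X$ alone. The outcome-noise term is still paid at the labeled sample size.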
Abstract
This study investigates treatment effect estimation in the semi-supervised setting. Here we can use not only the standard triple of covariates, treatment indicator, and outcome but also unlabeled auxiliary covariates. For this problem, we derive efficiency bounds and develop efficient estimators whose asymptotic variance attains the efficiency bound. In the analysis, we consider two data-generating processes: the one-sample setting and the two-sample setting. The one-sample setting, also called the censoring setting, considers a single dataset in which treatment indicators and outcomes are observed for only a subset of the units. In contrast, the two-sample setting, also called the case-control or stratified setting, considers two independent datasets, one labeled and one unlabeled. In both settings, we find that incorporating the auxiliary covariates lowers the efficiency bound and yields an estimator whose asymptotic variance is smaller than that of an estimator that ignores them.
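The variance gain in the one-sample (censoring) setting can be checked with a small Monte Carlo sketch. The sketch rests on several assumptions of its own, none of which come from the paper. Labels are missing completely at random, the propensity score is known to be 0.5, the outcome models are linear and fitted by OLS, and the Riesz representer is the plain inverse-propensity weight rather than one learned by generalized Riesz regression. The functions `simulate`, `fit_linear`, `crossfit_scores`, and `estimate` and the data-generating process are hypothetical. The sketch compares a cross-fitted AIPW estimator that uses only the labeled units with a semi-supervised variant that averages the plug-in term over all units.

```python
import numpy as np

E_TREAT = 0.5  # assumed known propensity score (randomized design)


def simulate(n, pi_label, rng):
    """Hypothetical one-sample (censoring) DGP.

    X ~ N(0, 1), D ~ Bernoulli(0.5), tau(X) = 1 + 2X, so the true ATE is 1.
    Each unit is labeled (D and Y observed) with probability pi_label,
    independently of everything else (MCAR labeling).
    """
    x = rng.normal(size=n)
    d = rng.binomial(1, E_TREAT, size=n).astype(float)
    y = x + d * (1.0 + 2.0 * x) + rng.normal(size=n)
    r = rng.random(n) < pi_label
    # Hide treatment and outcome for unlabeled units.
    d[~r] = np.nan
    y[~r] = np.nan
    return x, d, y, r


def fit_linear(x, y):
    """OLS of y on (1, x); returns a prediction function."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda z: coef[0] + coef[1] * z


def crossfit_scores(x, d, y, r, rng, n_folds=5):
    """Cross-fitted plug-in term mu1 - mu0 for every unit and AIPW residual
    term for labeled units (left at 0 for unlabeled units)."""
    n = len(x)
    folds = rng.permutation(n) % n_folds
    plug_in = np.empty(n)
    resid = np.zeros(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Outcome models are fitted on labeled units outside fold k.
        mu1 = fit_linear(x[train & r & (d == 1)], y[train & r & (d == 1)])
        mu0 = fit_linear(x[train & r & (d == 0)], y[train & r & (d == 0)])
        plug_in[test] = mu1(x[test]) - mu0(x[test])
        lab = test & r
        dl, yl, xl = d[lab], y[lab], x[lab]
        resid[lab] = (dl * (yl - mu1(xl)) / E_TREAT
                      - (1 - dl) * (yl - mu0(xl)) / (1 - E_TREAT))
    return plug_in, resid


def estimate(x, d, y, r, rng):
    plug_in, resid = crossfit_scores(x, d, y, r, rng)
    # Supervised AIPW: average the full score over labeled units only.
    ate_sup = np.mean(plug_in[r] + resid[r])
    # Semi-supervised: plug-in term averaged over ALL units, residual term
    # reweighted by the estimated labeling probability.
    pi_hat = r.mean()
    ate_semi = np.mean(plug_in + resid / pi_hat)
    return ate_sup, ate_semi


rng = np.random.default_rng(0)
draws = np.array([estimate(*simulate(2000, 0.2, rng), rng=rng)
                  for _ in range(300)])
var_sup, var_semi = draws.var(axis=0)
print("mean estimates (supervised, semi-supervised):", draws.mean(axis=0))
print(f"variance supervised={var_sup:.4f}, semi-supervised={var_semi:.4f}")
```

With n = 2000 and a labeling probability of 0.2, the derivation sketched under the same simplified assumptions predicts variances of roughly 8/400 = 0.02 for the labeled-only estimator and 4/2000 + 4/400 = 0.012 for the semi-supervised one. Both estimators should stay centered near the true ATE of 1.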
