PUATE: Efficient Average Treatment Effect Estimation from Treated (Positive) and Unlabeled Units
Masahiro Kato, Fumiaki Kozai, Ryo Inokuchi
TL;DR
PUATE addresses average treatment effect (ATE) estimation when only a treated group and an unlabeled group of unknown treatment status are observed, framing the problem as learning from positive and unlabeled (PU) data. It derives semiparametric efficiency bounds and efficient influence functions for two data-generating processes (DGPs), the censoring setting and the case-control setting, and constructs estimators that attain these bounds. The estimators are doubly robust and use cross-fitting, so that they are $\sqrt{n}$-consistent and asymptotically normal. The work connects PU learning with causal inference under missing data and targets applications such as implicit feedback in recommender systems and incomplete treatment records in medicine.
Abstract
The estimation of average treatment effects (ATEs), defined as the difference in expected outcomes between treatment and control groups, is a central topic in causal inference. This study develops semiparametric efficient estimators for ATE in a setting where only a treatment group and an unlabeled group, consisting of units whose treatment status is unknown, are observed. This scenario constitutes a variant of learning from positive and unlabeled data (PU learning) and can be viewed as a special case of ATE estimation with missing data. For this setting, we derive the semiparametric efficiency bounds, which characterize the lowest achievable asymptotic variance for regular estimators. We then construct semiparametric efficient ATE estimators that attain these bounds. Our results contribute to the literature on causal inference with missing data and weakly supervised learning.
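As background for the doubly robust, cross-fitted construction mentioned in the TL;DR, the following is a minimal sketch of the standard augmented inverse probability weighting (AIPW) estimator in the fully labeled setting, where the treatment status of every unit is observed. It is not the PU estimator developed in the paper, whose efficient influence functions for the censoring and case-control settings differ. The simulated data, the binary covariate, the cell-mean nuisance estimates, and all function names (`simulate`, `fit_nuisances`, `aipw_score`, `cross_fit_aipw`) are assumptions made for illustration.

```python
import random


def simulate(n, seed=0):
    """Toy, fully labeled data: binary covariate X, treatment D, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p = 0.7 if x == 1 else 0.3  # true propensity P(D=1 | X=x)
        d = 1 if rng.random() < p else 0
        # The true ATE is 2.0 by construction; X confounds D and Y.
        y = 1.0 + 2.0 * d + 1.5 * x + rng.gauss(0.0, 1.0)
        rows.append((x, d, y))
    return rows


def fit_nuisances(rows):
    """Cell-mean estimates of e(x) = P(D=1|X=x) and mu_d(x) = E[Y|D=d, X=x]."""
    e, mu = {}, {}
    for x in (0, 1):
        cell = [r for r in rows if r[0] == x]
        e[x] = sum(r[1] for r in cell) / len(cell)
        for d in (0, 1):
            ys = [r[2] for r in cell if r[1] == d]
            mu[(d, x)] = sum(ys) / len(ys)
    return e, mu


def aipw_score(row, e, mu):
    """Doubly robust score: regression difference plus weighted residuals."""
    x, d, y = row
    return (mu[(1, x)] - mu[(0, x)]
            + d * (y - mu[(1, x)]) / e[x]
            - (1 - d) * (y - mu[(0, x)]) / (1 - e[x]))


def cross_fit_aipw(rows, n_folds=2):
    """Fit nuisances on the other folds, evaluate scores on the held-out fold."""
    folds = [rows[k::n_folds] for k in range(n_folds)]
    scores = []
    for k, held_out in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != k for r in f]
        e, mu = fit_nuisances(train)
        scores.extend(aipw_score(r, e, mu) for r in held_out)
    n = len(scores)
    tau_hat = sum(scores) / n
    var_hat = sum((s - tau_hat) ** 2 for s in scores) / (n - 1)
    return tau_hat, (var_hat / n) ** 0.5  # point estimate, standard error


data = simulate(20000)
tau_hat, se = cross_fit_aipw(data)
print(f"ATE estimate: {tau_hat:.3f} (SE {se:.3f}); true ATE is 2.0")
```

In this simulation a naive difference in group means would be biased, because the covariate affects both treatment assignment and the outcome. The score averages a regression-based contrast with inverse-propensity-weighted residuals, and the nuisance estimates used for each unit come from the other fold. In the PU setting the control group is not directly observed, so this score cannot be applied as written.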
