Semi-Supervised Treatment Effect Estimation with Unlabeled Covariates via Generalized Riesz Regression

Masahiro Kato

TL;DR

The paper tackles semi-supervised causal inference: estimating the average treatment effect (ATE) when unlabeled covariates are available in addition to labeled data. It derives semiparametric efficiency bounds for both one-sample (censoring) and two-sample (case-control) data and constructs asymptotically efficient estimators based on Neyman orthogonal scores, with nuisance components estimated through generalized Riesz regression. Generalized Riesz regression enables end-to-end estimation of the Riesz representer using both labeled and unlabeled covariates, which reduces variance, particularly in the (τ0(X) − τ0)^2 term and under covariate-shift settings. The authors analyze asymptotic properties under cross-fitting and extend the framework to the regime of infinitely many unlabeled data, linking it to covariate shift adaptation and PU learning and thereby connecting several strands of semi-supervised and off-policy estimation. The practical impact is improved precision for the ATE in settings where unlabeled covariates are abundant and labeling is costly.
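
To make the role of the (τ0(X) − τ0)^2 term concrete, the following is a sketch under simplifying assumptions that are not taken from the paper: labels are observed completely at random with a constant probability π, e0(X) denotes the propensity score, and σ_d^2(X) denotes the conditional variance of the outcome under treatment d. The symbols π, e0, σ_d^2, V_lab, and V_semi are notation introduced here for illustration; the paper's own bounds may be stated more generally.

```latex
% Classical semiparametric bound for the ATE, per labeled observation
% (no unlabeled covariates):
\[
V_{\mathrm{lab}}
  = \mathbb{E}\!\left[\frac{\sigma_1^2(X)}{e_0(X)}
  + \frac{\sigma_0^2(X)}{1 - e_0(X)}
  + \bigl(\tau_0(X) - \tau_0\bigr)^2\right].
\]
% With n total observations and labeling probability \pi (assumed constant),
% using only the labeled part gives variance V_lab / \pi per total observation.
% Using all covariates, the heterogeneity term is no longer inflated by 1/\pi:
\[
V_{\mathrm{semi}}
  = \frac{1}{\pi}\,\mathbb{E}\!\left[\frac{\sigma_1^2(X)}{e_0(X)}
  + \frac{\sigma_0^2(X)}{1 - e_0(X)}\right]
  + \mathbb{E}\!\left[\bigl(\tau_0(X) - \tau_0\bigr)^2\right]
  \;\le\; \frac{1}{\pi}\,V_{\mathrm{lab}}.
\]
```

In this special case the gain equals (1/π − 1) E[(τ0(X) − τ0)^2], so it vanishes when the conditional treatment effect τ0(X) is constant and grows as labels become scarcer.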

Abstract

This study investigates treatment effect estimation in the semi-supervised setting, where we can use not only the standard triple of covariates, treatment indicator, and outcome, but also unlabeled auxiliary covariates. For this problem, we develop efficiency bounds and efficient estimators whose asymptotic variance aligns with the efficiency bound. In the analysis, we introduce two different data-generating processes: the one-sample setting and the two-sample setting. The one-sample setting considers the case where we can observe treatment indicators and outcomes for a part of the dataset, which is also called the censoring setting. In contrast, the two-sample setting considers two independent datasets with labeled and unlabeled data, which is also called the case-control setting or the stratified setting. In both settings, we find that by incorporating auxiliary covariates, we can lower the efficiency bound and obtain an estimator with an asymptotic variance smaller than that without such auxiliary covariates.
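
As a rough illustration of the one-sample (censoring) setting described above, the sketch below computes a cross-fitted doubly robust ATE estimate in which the outcome-regression term is averaged over all covariates while the residual correction uses labeled units only. It is not the generalized Riesz regression estimator from the paper: it assumes a known treatment probability `e`, labeling completely at random, and linear outcome models, and the names `fit_linear` and `semi_supervised_ate` are hypothetical helpers introduced here.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta

def semi_supervised_ate(X, R, D, Y, e=0.5, n_folds=2, seed=0):
    """Cross-fitted doubly robust ATE using labeled and unlabeled covariates.

    X: (n, p) covariates for all units; R: 1 if (D, Y) is observed, else 0.
    D, Y: treatment and outcome; entries with R == 0 are ignored.
    e: known treatment probability (assumption: randomized treatment).
    """
    n = len(X)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds
    pi = R.mean()  # labeling probability, assumed constant (labels at random)
    score = np.zeros(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Outcome models are fitted on labeled units outside the test fold.
        lab1 = train & (R == 1) & (D == 1)
        lab0 = train & (R == 1) & (D == 0)
        mu1 = fit_linear(X[lab1], Y[lab1])(X[test])
        mu0 = fit_linear(X[lab0], Y[lab0])(X[test])
        Rt = R[test]
        # Mask unobserved treatments and outcomes so they cannot leak in.
        Dt = np.where(Rt == 1, D[test], 0)
        Yt = np.where(Rt == 1, Y[test], 0.0)
        corr = Dt * (Yt - mu1) / e - (1 - Dt) * (Yt - mu0) / (1 - e)
        # Regression term uses every unit; correction uses labeled units only.
        score[test] = mu1 - mu0 + Rt / pi * corr
    ate = score.mean()
    se = score.std(ddof=1) / np.sqrt(n)
    return ate, se

# Synthetic example (made-up data): true ATE is 2.0, 25% of units labeled.
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
R = (rng.random(n) < 0.25).astype(int)
D = (rng.random(n) < 0.5).astype(int)
Y = 1.0 + X[:, 0] + D * (2.0 + X[:, 1]) + rng.normal(size=n)
Y = np.where(R == 1, Y, np.nan)  # outcomes are missing for unlabeled units
ate, se = semi_supervised_ate(X, R, D, Y)
print(f"ATE estimate: {ate:.3f} (SE {se:.3f})")
```

The design choice to average `mu1 - mu0` over all units is what lets unlabeled covariates contribute; the correction term, scaled by `R / pi`, keeps the score mean-zero around the truth even if the outcome models are misspecified, given the assumed known `e` and constant labeling probability.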

Paper Structure

This paper contains 49 sections, 14 theorems, 130 equations, 1 figure, 1 algorithm.

Key Result

Lemma 3.1

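Suppose that the one-sample assumptions (source labels asm:os_eval_density through asm:os_commonsupprt) hold. Then the efficient influence function is given as [equation not preserved in this extract], where [definitions not preserved in this extract].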

Figures (1)

  • Figure 1: Illustration of the one-sample and two-sample scenarios.

Theorems & Definitions (22)

  • Remark: PU learning
  • Lemma 3.1
  • Proposition 3.2: Theorem 25.20 in van der Vaart (1998), Asymptotic Statistics
  • Theorem 3.3: Efficiency bound in the one-sample scenario
  • Theorem 3.4: Consistency in the one-sample setting
  • Theorem 3.5: Asymptotic normality in the one-sample scenario
  • Remark: Inefficiency of the Inverse Probability Weighting (IPW) estimator
  • Remark: Regression Adjustment (RA) estimator
  • Lemma 4.1
  • Theorem 4.2: Efficiency bound in the two-sample scenario
  • ...and 12 more