Continuous Treatment Effects with Surrogate Outcomes

Zhenghao Zeng, David Arbour, Avi Feller, Raghavendra Addanki, Ryan Rossi, Ritwik Sinha, Edward H. Kennedy

TL;DR

This paper tackles the problem of estimating continuous treatment effects when primary outcomes are partly missing by leveraging surrogate outcomes and unlabeled data in a semi-supervised, doubly robust framework. It derives an identifying characterisation and constructs a pseudo-outcome-based estimator that remains consistent if either the outcome model or the treatment-density models are correctly specified, while also achieving asymptotic normality under nonparametric smoothing. The authors prove oracle efficiency under mild bias conditions and quantify the variance reduction gained from incorporating surrogates and unlabeled data. Simulations and a real-data application to Job Corps support these results, and the application reveals nonlinear dose-response behavior. The approach enables robust inference with flexible nuisance estimation (including machine learning) and applies broadly to dose-response estimation with missing primary outcomes. Its practical impact lies in a more efficient and principled use of surrogate information to recover causal dose-response relationships when outcomes are costly or incomplete.
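To make the "pseudo-outcome plus nonparametric smoothing" idea concrete, the sketch below shows only a generic second stage: regressing an already-computed pseudo-outcome on the treatment with a kernel smoother. The summary does not specify the smoother or give the pseudo-outcome formula, so the local linear fit, the Gaussian kernel, the function name `local_linear_smooth`, and the toy data are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def local_linear_smooth(a, pseudo, a_grid, bandwidth):
    """Local linear regression of pseudo-outcomes on treatment (Gaussian kernel).

    Hypothetical helper: `pseudo` stands in for whatever pseudo-outcome a
    first stage has produced; its construction is not shown here.
    """
    a = np.asarray(a, dtype=float)
    pseudo = np.asarray(pseudo, dtype=float)
    est = np.empty(len(a_grid))
    for j, a0 in enumerate(a_grid):
        d = a - a0                                   # treatment centered at a0
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # kernel weights
        X = np.column_stack([np.ones_like(d), d])    # intercept + slope design
        XtW = X.T * w                                # weighted design, shape (2, n)
        beta = np.linalg.solve(XtW @ X, XtW @ pseudo)
        est[j] = beta[0]                             # intercept = fit at a0
    return est

# Toy data (assumed, not from the paper): a noisy nonlinear curve.
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, 2000)
pseudo = np.sin(2 * np.pi * a) + rng.normal(scale=0.3, size=a.size)
grid = np.linspace(0.1, 0.9, 9)
curve = local_linear_smooth(a, pseudo, grid, bandwidth=0.05)
```

In a doubly robust pipeline the first stage would supply `pseudo` from estimated nuisance functions; the bandwidth here is fixed by hand, whereas in practice it would typically be chosen by cross-validation.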

Abstract

In many real-world causal inference applications, the primary outcomes (labels) are often partially missing, especially if they are expensive or difficult to collect. If the missingness depends on covariates (i.e., missingness is not completely at random), analyses based on fully observed samples alone may be biased. Incorporating surrogates, which are fully observed post-treatment variables related to the primary outcome, can improve estimation in this case. In this paper, we study the role of surrogates in estimating continuous treatment effects and propose a doubly robust method to efficiently incorporate surrogates in the analysis, which uses both labeled and unlabeled data and does not suffer from the above selection bias problem. Importantly, we establish the asymptotic normality of the proposed estimator and show possible improvements on the variance compared with methods that solely use labeled data. Extensive simulations show our methods enjoy appealing empirical performance.
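The selection-bias point in the abstract can be illustrated with a small simulation. This is a minimal sketch under an assumed data-generating process (one covariate, a logistic labeling probability, no treatment or surrogates); none of the names or numbers come from the paper. It compares the complete-case mean of the outcome with an inverse-probability-weighted mean that uses the known labeling probability.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (not from the paper).
x = rng.normal(size=n)                    # covariate
y = 2.0 * x + rng.normal(size=n)          # primary outcome, true mean 0
p_obs = 1.0 / (1.0 + np.exp(-x))          # labeling probability depends on x
r = rng.random(n) < p_obs                 # r is True when y is observed

# Complete-case analysis: average y over labeled units only.
complete_case_mean = y[r].mean()

# Inverse-probability weighting (normalized weights), using the known p_obs.
ipw_mean = np.sum(r * y / p_obs) / np.sum(r / p_obs)

print(f"complete-case mean: {complete_case_mean:.3f}")
print(f"IPW mean:           {ipw_mean:.3f}")
```

Because units with larger `x` are both more likely to be labeled and have larger `y`, the complete-case mean is pulled away from the true value of 0, while the weighted mean stays close to it. The paper's estimator goes further by also using surrogates and unlabeled data, which this toy example does not attempt.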

Paper Structure (26 sections, 5 theorems, 91 equations, 7 figures, 1 algorithm)

This paper contains 26 sections, 5 theorems, 91 equations, 7 figures, 1 algorithm.

Key Result

Theorem 1

Under Assumption asm:consistency--asm:surrogates-positivity we have for fixed $a \in \mathcal{A}$, where the expectations are over $Y, \mathbf{S}, \mathbf{V}$ in eq:identification.

Figures (7)

  • Figure 1: Example of a causal graph with surrogate outcome $\mathbf{S}$.
  • Figure 2: Root mean square error versus $\alpha$, where $n^{-\alpha}$ is the estimation error of the nuisance functions.
  • Figure 3: RMSE versus sample size (in log scale) when nuisance functions are estimated by parametric models.
  • Figure 4: RMSE versus sample size when nuisance functions are estimated by nonparametric models.
  • Figure 5: Root mean square error versus $\alpha$, where $n^{-\alpha}$ is the estimation error of the nuisance functions.
  • ...and 2 more figures

Theorems & Definitions (10)

  • Theorem 1
  • Proposition 1
  • Theorem 2
  • Proposition 2
  • Theorem 3
  • proof
  • proof
  • proof
  • proof
  • proof