
On the Role of Surrogates in Conformal Inference of Individual Causal Effects

Chenyin Gao, Peter B. Gilbert, Larry Han

TL;DR

This work tackles uncertainty quantification for individualized treatment effects (ITEs) using conformal inference, which has historically yielded overly wide prediction intervals. It introduces SCIENCE, a surrogate-assisted conformal inference framework. SCIENCE leverages surrogate outcomes under covariate shift and in semi-supervised settings to produce more efficient, valid prediction intervals for ITEs. The authors derive semi-parametric efficiency bounds via efficient influence functions and establish PAC-type coverage guarantees. In simulations and on real data (the Moderna COVE trial), SCIENCE yields measurably narrower intervals when the surrogates are predictive. The approach supports reliable, individualized uncertainty quantification in precision medicine. The practical gains come from surrogate markers and from flexible nuisance-function estimation under mild regularity conditions.
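
The TL;DR refers to PAC-type coverage. The Python sketch below illustrates what such a guarantee means; it is not the SCIENCE procedure itself. It calibrates a nested family of intervals $C(x; r) = [\hat\mu(x) - r, \hat\mu(x) + r]$ with a binomial-tail rule. With probability at least $1 - \delta$ over the calibration draw, the chosen interval then has miscoverage at most $\alpha$. The function names `binom_cdf` and `pac_radius` are hypothetical. The sketch assumes:

- i.i.d. calibration and test data (no covariate shift);
- a fixed fitted predictor $\hat\mu$;
- continuous, tie-free residual scores $|y_i - \hat\mu(x_i)|$.

```python
import math
from typing import Sequence


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Binomial(n, p) <= k) for 0 < p < 1, summed in log-space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for j in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def pac_radius(scores: Sequence[float], alpha: float, delta: float) -> float:
    """Hypothetical helper: radius r for C(x; r) = [mu(x) - r, mu(x) + r] such that,
    with probability >= 1 - delta over the calibration draw, miscoverage <= alpha.
    Assumes i.i.d. data and continuous (tie-free) scores."""
    n = len(scores)
    # k_star is the largest number of calibration errors we may tolerate;
    # binom_cdf is nondecreasing in k, so we can stop at the first failure.
    k_star = -1
    for k in range(n):
        if binom_cdf(k, n, alpha) <= delta:
            k_star = k
        else:
            break
    if k_star < 0:
        return math.inf  # calibration set too small for this (alpha, delta)
    ordered = sorted(scores)
    # Choosing the (n - k_star)-th smallest score leaves exactly k_star scores above r.
    return ordered[n - k_star - 1]
```

As a worked example, take 100 calibration scores $1, \dots, 100$ with $\alpha = 0.1$ and $\delta = 0.05$. The rule tolerates at most 4 calibration errors and returns $r = 96$. The usual marginal split-conformal rule, the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score, would pick 91 instead. The PAC version pays some width for a guarantee that holds conditional on the calibration set.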

Abstract

Learning the Individual Treatment Effect (ITE) is essential for personalized decision-making, yet causal inference has traditionally focused on aggregated treatment effects. While integrating conformal prediction with causal inference can provide valid uncertainty quantification for ITEs, the resulting prediction intervals are often excessively wide, limiting their practical utility. To address this limitation, we introduce Surrogate-assisted Conformal Inference for Efficient INdividual Causal Effects (SCIENCE), a framework designed to construct more efficient prediction intervals for ITEs. SCIENCE accommodates covariate shifts between the source and target data and applies to various data configurations, including semi-supervised and surrogate-assisted semi-supervised learning. Leveraging semi-parametric efficiency theory, SCIENCE produces rate double-robust prediction intervals under mild rate convergence conditions, permitting the use of flexible non-parametric models to estimate nuisance functions. We quantify efficiency gains by comparing semi-parametric efficiency bounds with and without the surrogates. Simulation studies demonstrate that our surrogate-assisted intervals offer substantial efficiency improvements over existing methods while maintaining valid group-conditional coverage. Applied to the phase 3 Moderna COVE COVID-19 vaccine trial, SCIENCE illustrates how multiple surrogate markers can be leveraged to generate more efficient prediction intervals.
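
The covariate-shift setting in the abstract can be illustrated with the generic weighted split conformal construction of Tibshirani et al. (2019) for a single outcome. This is not the SCIENCE interval, which additionally targets the ITE and exploits surrogates. The sketch only shows how covariate-shift weights enter the calibration quantile. It assumes absolute-residual scores on a source calibration set, with weights proportional to the density ratio of target to source covariates. One choice is the odds $P(D=0 \mid x)/P(D=1 \mid x)$ from a fitted membership classifier; the constant $P(D=1)/P(D=0)$ cancels after normalization. The names `odds_weight` and `weighted_conformal_interval` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def odds_weight(p_target: float) -> float:
    """Hypothetical helper: weight P(D=0|x) / P(D=1|x) from a classifier's
    estimated probability that x belongs to the target data."""
    return p_target / (1.0 - p_target)


def weighted_conformal_interval(
    cal_scores: Sequence[float],
    cal_weights: Sequence[float],
    test_weight: float,
    test_pred: float,
    alpha: float,
) -> Tuple[float, float]:
    """Weighted split conformal interval test_pred +/- q, where q is the (1 - alpha)
    quantile of the weighted calibration scores, with the test point's own
    weight placed as a point mass at +infinity."""
    total = sum(cal_weights) + test_weight
    threshold = (1.0 - alpha) * total  # compare unnormalized cumulative weight
    cumulative = 0.0
    for score, weight in sorted(zip(cal_scores, cal_weights)):
        cumulative += weight
        if cumulative >= threshold:
            return test_pred - score, test_pred + score
    # The quantile falls on the +infinity atom: the interval is the whole real line.
    return -math.inf, math.inf
```

With equal weights this reduces to the usual split conformal rule, the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score. When the test weight is large relative to the calibration weights, the quantile can land on the $+\infty$ atom, and the function then returns the whole real line.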

Paper Structure

This paper contains 42 sections, 12 theorems, 78 equations, 8 figures, 8 tables.

Key Result

Lemma 1

Under the identification assumptions, from the estimand assumption (assum:estimand) through the missing-at-random assumption (assump:MAR), we have (a) $D \perp \{Y(a), S(a)\} \mid X$, (b) $A \perp \{Y(a), S(a)\} \mid X, D=0$, and (c) $A \perp \{Y(a), S(a)\} \mid X, D=1$ for $a = 0, 1$.
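
One way to read Lemma 1 is sketched below as a derivation, not a result quoted from the paper. Part (a) says the conditional law of $\{Y(a), S(a)\}$ given $X$ is the same in the source ($D=1$) and target ($D=0$) data. The two populations therefore differ only through the distribution of $X$, and by Bayes' rule the resulting density ratio is a covariate-shift weight. The last line additionally assumes consistency ($Y = Y(A)$ in the source data) and positivity, which we take to be among the stated assumptions.

```latex
\begin{align*}
p\{y(a), s(a) \mid x, D = 0\}
  &= p\{y(a), s(a) \mid x, D = 1\}
  && \text{by Lemma 1(a)},\\
\frac{dP\{x, y(a), s(a) \mid D = 0\}}{dP\{x, y(a), s(a) \mid D = 1\}}
  &= \frac{p(x \mid D = 0)}{p(x \mid D = 1)}
   = \frac{P(D = 0 \mid x)}{P(D = 1 \mid x)} \cdot \frac{P(D = 1)}{P(D = 0)}
  && \text{by Bayes' rule},\\
p\{y(a) \mid x, D = 1\}
  &= p\{y(a) \mid x, A = a, D = 1\}
   = p(y \mid x, A = a, D = 1)
  && \text{by Lemma 1(c) and consistency}.
\end{align*}
```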

Figures (8)

  • Figure 1: Single-world intervention graph for the source and target data with surrogates. The shaded nodes, the confounder $U$ and the target-data primary outcome $Y(a)$, are unobserved. The dashed lines, which would let the unobserved confounder $U$ affect the treatment $A$ and the surrogates $S(a)$ in the source data, are permitted in Athey et al. (2020) but ruled out in our setup.
  • Figure 2: Schematic illustration of the implementation of split conformal inference
  • Figure 3: Schematic illustration for constructing the nested prediction intervals $C(W;r_{\alpha,C})$ for target data $D=0$
  • Figure 4: (A) Empirical coverage and width of the 95% prediction intervals when $\sigma_S = 10$, and (B) average relative width of the 95% prediction intervals ('SCIENCE' versus 'No Surr') when $\sigma_S = 1, 5, 10, 30$, for the ITE across 500 replicates.
  • Figure 5: Empirical coverage and width of the 95% prediction intervals for the ITE of the combined data when $\sigma_S = 10$, conditional on the group variable $G$, across 500 replicates.
  • ...and 3 more figures
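
Figures 4 and 5 summarize intervals by four measures:

- empirical coverage;
- average width;
- relative width against the 'No Surr' baseline;
- coverage within levels of the group variable $G$.

Below is a minimal Python sketch of these summaries. It assumes each replicate supplies the true ITEs, the intervals and the group labels for the evaluation units. The helper names are hypothetical. The captions do not spell out how "average relative width" is computed. We assume it is the mean, over replicates, of the ratio of average widths.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Interval = Tuple[float, float]


def coverage_and_width(truth: Sequence[float], intervals: Sequence[Interval]) -> Tuple[float, float]:
    """Fraction of true ITEs inside their intervals, and the mean interval width."""
    n = len(truth)
    hits = sum(1 for t, (lo, hi) in zip(truth, intervals) if lo <= t <= hi)
    total_width = sum(hi - lo for lo, hi in intervals)
    return hits / n, total_width / n


def group_coverage(
    truth: Sequence[float], intervals: Sequence[Interval], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Empirical coverage computed separately within each level of the group variable."""
    hits: Dict[Hashable, int] = defaultdict(int)
    counts: Dict[Hashable, int] = defaultdict(int)
    for t, (lo, hi), g in zip(truth, intervals, groups):
        counts[g] += 1
        hits[g] += int(lo <= t <= hi)
    return {g: hits[g] / counts[g] for g in counts}


def average_relative_width(widths_new: Sequence[float], widths_ref: Sequence[float]) -> float:
    """Mean over replicates of (average width of new method) / (average width of baseline)."""
    ratios = [a / b for a, b in zip(widths_new, widths_ref)]
    return sum(ratios) / len(ratios)
```

A ratio below 1 from `average_relative_width` would indicate that the first method's intervals are narrower on average than the baseline's.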

Theorems & Definitions (13)

  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Lemma 2
  • Theorem 3
  • Remark 1
  • Theorem 4
  • Corollary 2
  • Lemma A.1
  • ...and 3 more