Treatment response as a latent variable

Christopher Tosh, Boyuan Zhang, Wesley Tansey

TL;DR

The paper addresses causal inference when treatment effects are confined to a subset of individuals by introducing the causal two-groups (C2G) model, a latent-response extension of the classical two-groups framework. It develops two empirical Bayes procedures, a semi-parametric additive-errors model and a fully nonparametric model, each of which identifies responders while controlling the false discovery rate (FDR). It also defines the estimands CARE, ARE, and ERPF, with a strategy for deriving intervals for them in the nonparametric setting. The methods achieve FDR control and near-optimal power in simulations, and a cancer immunotherapy case study recovers clinically relevant biomarkers. The work supports precision medicine by enabling stratification of responders in observational data, and it provides practical algorithms (EM-based and density-based) and software for practitioners.

Abstract

Scientists often need to analyze the samples in a study that responded to treatment in order to refine their hypotheses and find potential causal drivers of response. Natural variation in outcomes makes teasing apart responders from non-responders a statistical inference problem. To handle latent responses, we introduce the causal two-groups (C2G) model, a causal extension of the classical two-groups model. The C2G model posits that treated samples may or may not experience an effect, according to some prior probability. We propose two empirical Bayes procedures for the causal two-groups model, one under semi-parametric conditions and another under fully nonparametric conditions. The semi-parametric model assumes additive treatment effects and is identifiable from observed data. The nonparametric model is unidentifiable, but we show it can still be used to test for response in each treated sample. We show empirically and theoretically that both methods for selecting responders control the false discovery rate at the target level with near-optimal power. We also propose two novel estimands of interest and provide a strategy for deriving estimand intervals in the unidentifiable nonparametric model. On a cancer immunotherapy dataset, the nonparametric C2G model recovers clinically-validated predictive biomarkers of both positive and negative outcomes. Code is available at https://github.com/tansey-lab/causal2groups.

Paper Structure

This paper contains 46 sections, 19 theorems, 79 equations, 10 figures, 5 tables.

Key Result

Proposition 1

The two-groups model (the paper's equation labeled eqn:two_groups) is unidentifiable without further assumptions, even under the restriction $0 < \pi(x) < 1$ for all $x$. Moreover, it remains unidentifiable when $x$ is a constant and the distributions $f_0, f_1$ are restricted to differentiable log-concave probability distributions.
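To make the notion of unidentifiability concrete, here is a minimal numerical sketch. It assumes the model takes the standard two-groups mixture form $(1 - \pi) f_0(y) + \pi f_1(y)$ with constant $x$; that form, the Gaussian choices, and every name in the code are illustrative assumptions rather than the paper's construction. The sketch shows that two different pairs of prior probability and responder density can give exactly the same mixture density. It does not reproduce the log-concave part of the proposition, since the alternative responder density built here is not claimed to be log-concave.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution, written out to avoid dependencies."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture(y, pi, f0, f1):
    """Two-groups mixture density: (1 - pi) * f0(y) + pi * f1(y)."""
    return (1.0 - pi) * f0(y) + pi * f1(y)


# Hypothetical non-responder and responder densities (illustrative only).
def f0(y):
    return normal_pdf(y, 0.0, 1.0)


def f1(y):
    return normal_pdf(y, 2.0, 1.0)


pi = 0.3      # one candidate prior probability of response
pi_alt = 0.5  # a different, larger candidate prior


def f1_alt(y):
    """Alternative responder density that absorbs some non-responder mass.

    It is nonnegative and integrates to (pi + pi_alt - pi) / pi_alt = 1,
    so it is a valid density whenever pi_alt >= pi.
    """
    return (pi * f1(y) + (pi_alt - pi) * f0(y)) / pi_alt


# Compare the two mixtures on a grid from -4 to 6.
grid = [-4.0 + 0.1 * k for k in range(101)]
max_gap = max(
    abs(mixture(y, pi, f0, f1) - mixture(y, pi_alt, f0, f1_alt)) for y in grid
)
print(f"largest difference between the two mixtures: {max_gap:.2e}")
```

Both parameterizations produce the same density for the treated outcomes, so data on treated outcomes alone cannot distinguish a prior of 0.3 from a prior of 0.5 in this toy setup.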

Figures (10)

  • Figure 1: (a) The FDR regression model of Scott et al. (2014). (b) An observational dataset with observed noncompliance (Frangakis and Rubin, 2002). (c) A randomized controlled trial with a compliance proxy (Boatman et al., 2017). (d) The general causal two-groups model. The treatment $T$, (unobserved) treatment effect indicator $H$, and the outcome $Y$ are all confounded by $X$. The causal two-groups model generalizes other scenarios such as those in (a) and (b).
  • Figure 2: Nonparametric C2G example. Left: An example of a non-responder density (black), a treatment density (red), an extremal responder density (blue), and the range of feasible responder densities (gradient). Right: Each feasible responder density has an associated prior probability and CARE value.
  • Figure 3: The four types of latent confounding in the causal two-groups model.
  • Figure 4: Case study survival data. Left: Kaplan-Meier fits to the survival data from the untreated and treated groups. Right: Histogram of Cox proportional hazard-transformed outcomes for the untreated and treated groups.
  • Figure 5: FDR curves on additive synthetic data.
  • ...and 5 more figures

Theorems & Definitions (29)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • Theorem 4
  • Corollary 1
  • Proposition 3
  • Proposition 4
  • ...and 19 more