Treatment response as a latent variable

Christopher Tosh; Boyuan Zhang; Wesley Tansey

TL;DR

The paper addresses causal inference when treatment effects are confined to a subset of individuals. It introduces the causal two-groups (C2G) model, a latent-response extension of the classical two-groups framework. It develops two empirical Bayes procedures, a semi-parametric additive-errors model and a fully nonparametric model, each of which identifies responders while controlling the false discovery rate (FDR). It also defines the estimands CARE, ARE, and ERPF, with interval results for the nonparametric setting, where the model is unidentifiable. Both methods are shown empirically and theoretically to control the FDR at the target level with near-optimal power, and a cancer immunotherapy case study recovers clinically validated predictive biomarkers. The work supports precision medicine by stratifying responders in observational data, and it provides practical algorithms (EM-based and density-based) and software for practitioners.

Abstract

Scientists often need to analyze the samples in a study that responded to treatment in order to refine their hypotheses and find potential causal drivers of response. Natural variation in outcomes makes teasing apart responders from non-responders a statistical inference problem. To handle latent responses, we introduce the causal two-groups (C2G) model, a causal extension of the classical two-groups model. The C2G model posits that treated samples may or may not experience an effect, according to some prior probability. We propose two empirical Bayes procedures for the causal two-groups model, one under semi-parametric conditions and another under fully nonparametric conditions. The semi-parametric model assumes additive treatment effects and is identifiable from observed data. The nonparametric model is unidentifiable, but we show it can still be used to test for response in each treated sample. We show empirically and theoretically that both methods for selecting responders control the false discovery rate at the target level with near-optimal power. We also propose two novel estimands of interest and provide a strategy for deriving estimand intervals in the unidentifiable nonparametric model. On a cancer immunotherapy dataset, the nonparametric C2G model recovers clinically-validated predictive biomarkers of both positive and negative outcomes. Code is available at https://github.com/tansey-lab/causal2groups.
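
To make the responder-selection idea concrete, the following is a minimal sketch of a standard Bayesian FDR selection rule from the classical two-groups literature, not necessarily the exact procedure in the paper or its code. It assumes that some fitted model has already produced, for each treated sample, an estimated posterior probability of no response. The function name `select_responders` and its arguments are hypothetical and are not taken from the causal2groups repository.

```python
def select_responders(null_probs, alpha=0.1):
    """Select treated samples as responders under a Bayesian FDR budget.

    null_probs[i] is an (assumed, externally estimated) posterior
    probability that treated sample i did NOT respond to treatment.
    The rule selects the largest set of samples, taken in order of
    increasing null probability, whose average null probability stays
    at or below alpha. That average is the expected fraction of false
    discoveries in the selected set under the posterior.
    """
    # Rank samples from most to least likely responder.
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])

    selected = []
    running_sum = 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += null_probs[i]
        # The running mean is non-decreasing because the values are
        # sorted, so the first violation is the stopping point.
        if running_sum / rank > alpha:
            break
        selected.append(i)
    return selected


if __name__ == "__main__":
    # Hypothetical posterior null probabilities for five treated samples.
    probs = [0.01, 0.9, 0.05, 0.3, 0.02]
    print(select_responders(probs, alpha=0.1))
```

Note that the rule bounds the average posterior null probability of the selected set, so an individual sample with a fairly high null probability can still be selected when the remaining selections are very confident.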
