
Who Are We Missing? A Principled Approach to Characterizing the Underrepresented Population

Harsh Parikh, Rachael Ross, Elizabeth Stuart, Kara Rudolph

TL;DR

This work tackles the problem of extending causal inferences from randomized trials to target populations when underrepresented subgroups induce high variance and have limited data support. It introduces ROOT, a nonparametric, tree-based framework that learns a binary weight function w(X) to minimize the variance of the weighted target average treatment effect (WTATE) estimate, while providing interpretable characterizations of the underrepresented population. The method is demonstrated through synthetic experiments and a case study of medication for opioid use disorder (MOUD), in which the START trial is transported to the TEDS-A population, showing improved precision and interpretable descriptions of which subgroups are underrepresented. The paper also discusses a two-stage design-analysis paradigm, data-adaptive estimands, positivity considerations, and the potential asymptotic optimality of ROOT, emphasizing its practical value for refining target populations and guiding future trials in diverse populations.
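To make the objective concrete, here is a minimal sketch (not the authors' implementation) of how a weighted TATE estimate and its variance depend on a binary weight vector w. It rests on assumptions that are ours, not the paper's: the trial randomizes with a known probability `p_treat`; ℓ(x) denotes the odds P(S=1 | X=x)/P(S=0 | X=x), and the normalized odds ℓ(x)/ℓ are already known for each trial unit; trial units are transported to the target by inverse-odds weights; and the variance is a plug-in linearization for the resulting ratio estimator. The function name `wtate_and_variance` is hypothetical.

```python
import numpy as np

def wtate_and_variance(y, t, odds, w, p_treat=0.5):
    """Hypothetical helper: weighted TATE estimate and its plug-in variance.

    y, t    : outcomes and binary treatment indicators for trial units (S=1)
    odds    : normalized selection odds l(x)/l per trial unit (assumed known)
    w       : binary weights w(x) in {0, 1}; w=0 removes a unit from the estimand
    p_treat : known randomization probability P(T=1) in the trial (assumption)
    """
    y, t, odds, w = map(np.asarray, (y, t, odds, w))
    # Horvitz-Thompson pseudo-outcome: its conditional mean is the CATE
    # when treatment is randomized with probability p_treat.
    psi = t * y / p_treat - (1 - t) * y / (1 - p_treat)
    # Inverse selection odds shift trial units toward the target population;
    # multiplying by w drops the units characterized as underrepresented.
    v = w / odds
    total = v.sum()
    if total == 0:
        raise ValueError("w excludes every unit")
    tau = np.sum(v * psi) / total
    # Linearization (sandwich) variance of the ratio estimator sum(v*psi)/sum(v),
    # treating the weights v as fixed.
    var = np.sum(v**2 * (psi - tau) ** 2) / total**2
    return tau, var
```

For instance, with outcomes `[1, 2, 3, 4]`, treatments `[1, 0, 1, 0]`, all odds equal to 1, and all weights equal to 1, the estimate is -1 with plug-in variance 7.25; setting `w = [1, 1, 0, 0]` keeps the estimate at -1 and lowers the variance to 4.5. Because the estimator is a ratio, rescaling all odds by a constant leaves both outputs unchanged, so only relative selection odds matter.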

Abstract

Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending inferences to target populations presents challenges due to effect heterogeneity and underrepresentation. Our paper addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, yielding more precise treatment effect estimates. Notably, ROOT produces interpretable characterizations of the underrepresented population, helping researchers communicate their findings effectively. In synthetic data experiments, our approach shows improved precision and interpretability compared to alternatives. We apply our methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial, which investigates the effectiveness of medication for opioid use disorder, to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations using ROOT, our framework offers a systematic approach to enhance decision-making accuracy and inform future trials in diverse populations.
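ROOT's name refers to a Rashomon set: the collection of near-optimal models rather than a single optimum. As a toy illustration only (the paper's own tree search is not reproduced here), the sketch below enumerates depth-1 trees (stumps) over the covariates, where each stump sets w=0 on one side of a split, scores each by the plug-in variance of an inverse-odds-weighted mean of pseudo-outcomes `psi`, and keeps every stump whose variance is within a factor (1 + `eps`) of the best. The objective, the names `weighted_variance` and `stump_rashomon_set`, and the parameters `eps` and `min_keep` are all hypothetical assumptions for illustration.

```python
import numpy as np

def weighted_variance(psi, odds, w):
    # Plug-in variance of the inverse-odds-weighted mean of pseudo-outcomes psi,
    # restricted to units with w == 1 (an assumed, simplified objective).
    v = w / odds
    total = v.sum()
    if total == 0:
        return np.inf
    tau = np.sum(v * psi) / total
    return np.sum(v**2 * (psi - tau) ** 2) / total**2

def stump_rashomon_set(X, psi, odds, eps=0.05, min_keep=0.5):
    """Enumerate stumps of the form 'exclude units with X[:, j] <= c' or
    'exclude units with X[:, j] > c' and return every rule whose variance is
    within (1 + eps) of the best, sorted by variance.

    min_keep: hypothetical guard on the minimum fraction of retained units,
    so the search cannot shrink the population to a handful of units.
    """
    X = np.asarray(X, float)
    psi = np.asarray(psi, float)
    odds = np.asarray(odds, float)
    n, p = X.shape
    candidates = []
    for j in range(p):
        for c in np.unique(X[:, j]):
            for side in ("<=", ">"):
                excluded = X[:, j] <= c if side == "<=" else X[:, j] > c
                w = (~excluded).astype(float)
                if w.mean() < min_keep:
                    continue
                candidates.append(((j, side, float(c)), weighted_variance(psi, odds, w)))
    # The single-leaf tree that keeps everyone is also a valid candidate.
    candidates.append((None, weighted_variance(psi, odds, np.ones(n))))
    best = min(var for _, var in candidates)
    near_optimal = [(rule, var) for rule, var in candidates if var <= (1 + eps) * best]
    return sorted(near_optimal, key=lambda item: item[1])
```

For example, with one covariate `[0, 0, 0, 0, 1, 1]`, pseudo-outcomes `[1, -1, 1, -1, 10, -10]`, and normalized odds `[1, 1, 1, 1, 0.1, 0.1]`, the only rule in the returned set is `(0, ">", 0.0)`, that is, excluding the two low-odds units, with variance 0.25. Returning a set of rules rather than one winner mirrors the Rashomon idea: several differently worded characterizations may be almost equally good.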

Paper Structure

This paper contains 47 sections, 34 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Swarm plot and violin plot showing the distribution of the normalized selection odds $\ell(\mathbf{x})/\ell$ for units in the trial sample ($S=1$) characterized as $w(\mathbf{x})=0$ (i.e., "underrepresented") or $w(\mathbf{x})=1$, using (a) selection odds with a predefined threshold of $0.87$ on $\ell(\mathbf{x})/\ell$, (b) selection odds with the optimal threshold on $\ell(\mathbf{x})/\ell$, and (c) ROOT.
  • Figure 2: Decision Tree characterizing the underrepresented population. The orange nodes and leaves indicate underrepresented subgroups that can be excluded from the TATE analysis for better precision.
  • Figure 3: (a) Distribution of conditional average treatment effects for units with $w(\mathbf{X}_i)=1$ (well represented) and $w(\mathbf{X}_i)=0$ (characterized as underrepresented) via ROOT. (b) Tree explaining the distribution of conditional average treatment effects across subgroups. The yellow leaves indicate subpopulations in which methadone is more effective than buprenorphine, while the green leaves indicate subpopulations in which buprenorphine is more effective than methadone.
  • Figure 4: Scatter plot of the relative feature importance for sample selection and for the treatment effect under the Clone DGP.
  • Figure 5: Trees characterizing the study population (blue) and the underrepresented subgroups (orange) for each of the four DGPs.

Theorems & Definitions (2)

  • Remark 1
  • Remark 2