Extrapolating Single-Treatment Effects Out of Factorial Experiments

Guilherme Duarte

Abstract

Despite their cost, randomized controlled trials (RCTs) are widely regarded as gold-standard evidence in disciplines ranging from social science to medicine. In recent decades, researchers have increasingly sought to reduce the resource burden of repeated RCTs with factorial designs that simultaneously test multiple hypotheses, e.g. experiments that evaluate the effects of many medications or products simultaneously. Here I show that when multiple interventions are randomized in experiments, the effect any single intervention would have outside the experimental setting is not identified absent heroic assumptions, even if otherwise perfectly realistic conditions are achieved. This happens because single-treatment effects involve a counterfactual world with a single focal intervention, allowing other variables to take their natural values (which may be confounded or modified by the focal intervention). In contrast, observational studies and factorial experiments provide information about potential-outcome distributions with zero and multiple interventions, respectively. In this paper, I formalize sufficient conditions for the identifiability of those isolated quantities. I show that researchers who rely on this type of design have to justify either linearity of functional forms or -- in the nonparametric case -- specify with Directed Acyclic Graphs how variables are related in the real world. Finally, I develop nonparametric sharp bounds -- i.e., maximally informative best-/worst-case estimates consistent with limited RCT data -- that show when extrapolations about effect signs are empirically justified. These new results are illustrated with simulated data.

Paper Structure (16 sections, 14 theorems, 35 equations, 6 figures, 5 tables)

Key Result

Theorem 2.1

Consider treatments $A$ and $B$, along with an outcome $Y$. Assume that both $A$ and $B$ independently cause $Y$, without causing each other. Additionally, suppose the effects of both $A$ and $B$ on $Y$ are monotonic, meaning $\Pr(Y_{a_0}=y_1, Y_{a_1}=y_0) = \Pr(Y_{b_0}=y_1, Y_{b_1}=y_0) = 0$. Under

Figures (6)

  • Figure 1: Study of individuals affected by COVID-19 in a hospital. The two binary treatments $A$ and $B$ represent, respectively, drug A and admission to the Intensive Care Unit (ICU). All possible combinations of treatment and control (interventions) are enumerated. In column (a), a 2x2 factorial experiment, there are four possible joint interventions on $A$ and $B$. In column (b), there are four cases of single-treatment interventions on $A$ or $B$, which constitute a single-armed RCT. Finally, in column (c), there is only one case, with no active intervention. This is the case of observational data.
  • Figure 2: Graphs representing experimental/observational settings of a study trying to assess the effect of drug A ($A$) on COVID-19 survival ($Y$). $B$ denotes admission or not to the ICU. In the graphs, directed arrows denote the direction of causality. $U_{AY}$ and $U_{BY}$ are potential unobserved variables, and for that reason they are drawn with dashed arrows. Any direct effect of $A$ on $B$ or vice versa is not considered here (no mediation). Variables in boxes denote interventions, for example, fixing the variable $A$ to the value $a$. When an intervention happens, any arrows pointing into the intervened variables are removed. I consider three cases here. In case (a), representing a factorial experiment, there are two interventions, on $A$ and $B$, so confounding between $A$ and $Y$ and between $B$ and $Y$ is removed. In case (b), a single-treatment RCT, there is only one intervention, setting $A$ to $a$. Finally, case (c) represents an observational setting, and both $A$ and $B$ are confounded with $Y$. Without major assumptions, extrapolations using data from one case to answer questions about other cases are not supported.
  • Figure 3: A nonparametric structural graph illustrating a scenario where both $A$ and $B$ cause $Y$. All three variables, $A$, $B$, and $Y$, are influenced by unobserved confounders simultaneously. With the exception of cases where structural assumptions are explicitly relaxed, all the results presented in the paper assume this graph.
  • Figure 4: Cases where $\Pr(Y_a)$ is point identifiable given $\Pr(Y_{a,b})$ and $\Pr(Y,A,B)$.
  • Figure 5: Sensitivity analysis of the ATE, constrained by the proportion of non-interactive units. The details of the model are introduced in the Supplementary Material (Example 2). The dashed line represents the actual ATE of the model (0.58), while the black lines depict strict bounds corresponding to varying constraints on the maximum proportion of non-interactive units (x-axis). A maximum constraint of 0 implies no interaction, resulting in exact point identification of the ATE. As this maximum constraint increases, the bounds extend to the maximum limits reported in Theorems \ref{theorem:boundsmain} and \ref{theorem:boundsonlyfact}.
  • ...and 1 more figure
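
The three settings contrasted in Figures 1 and 2 can be made concrete with a small exact calculation. The sketch below is not the paper's simulation. It assumes a hypothetical toy model in which a single binary unobserved variable $U$ (say, disease severity) influences the natural values of $A$ and $B$ as well as $Y$, $A$ does not cause $B$, and the drug only helps severe patients who are in the ICU. All probabilities and function names are illustrative assumptions. It computes the single-treatment effect $\Pr(Y_{a_1}=1) - \Pr(Y_{a_0}=1)$, a naive extrapolation that averages the factorial quantities $\Pr(Y_{a,b}=1)$ over the marginal distribution of $B$, and the observational contrast $\Pr(Y=1 \mid A=1) - \Pr(Y=1 \mid A=0)$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical toy model: every number below is an illustrative assumption.
P_U = {0: F(1, 2), 1: F(1, 2)}  # unobserved severity U


def p_a_given_u(a, u):
    """Natural (non-randomized) probability that A=a given U=u."""
    p1 = F(4, 5) if u == 1 else F(1, 5)
    return p1 if a == 1 else 1 - p1


def p_b_given_u(b, u):
    """Natural (non-randomized) probability that B=b given U=u."""
    p1 = F(4, 5) if u == 1 else F(1, 5)
    return p1 if b == 1 else 1 - p1


def p_y(a, b, u):
    """Pr(Y=1) when A=a and B=b for a unit with U=u (A and B interact)."""
    return F(3, 5) - F(2, 5) * u + F(3, 10) * a * b * u + F(1, 10) * b * u


def factorial(a, b):
    """Pr(Y_{a,b}=1): both treatments are set by the experimenter."""
    return sum(P_U[u] * p_y(a, b, u) for u in P_U)


def single(a):
    """Pr(Y_a=1): only A is set; B keeps its natural value."""
    return sum(P_U[u] * p_b_given_u(b, u) * p_y(a, b, u)
               for u, b in product(P_U, (0, 1)))


def observational(a):
    """Pr(Y=1 | A=a): nothing is set; A and B take natural values."""
    num = sum(P_U[u] * p_a_given_u(a, u) * p_b_given_u(b, u) * p_y(a, b, u)
              for u, b in product(P_U, (0, 1)))
    den = sum(P_U[u] * p_a_given_u(a, u) for u in P_U)
    return num / den


def p_b(b):
    """Marginal natural distribution of B."""
    return sum(P_U[u] * p_b_given_u(b, u) for u in P_U)


def naive(a):
    """Naive extrapolation: average Pr(Y_{a,b}=1) over the marginal of B."""
    return sum(p_b(b) * factorial(a, b) for b in (0, 1))


ate_single = single(1) - single(0)
ate_naive = naive(1) - naive(0)
assoc_obs = observational(1) - observational(0)

print("single-treatment ATE:", ate_single)    # 3/25
print("naive factorial average:", ate_naive)  # 3/40
print("observational contrast:", assoc_obs)   # 0
```

In this toy model the three numbers disagree (0.12, 0.075, and 0), because the naive average ignores that the natural value of $B$ is correlated with $U$, which also modifies the effect of $A$. Exact fractions are used instead of random draws so that the comparison does not depend on sampling noise.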

Theorems & Definitions (24)

  • Theorem 2.1
  • proof
  • Corollary 2.1.1: F-bias
  • Theorem 3.1
  • Theorem 3.2
  • Theorem B.1
  • proof
  • Theorem B.2
  • proof
  • Theorem B.3
  • ...and 14 more