Sample size and power calculations for causal inference of observational studies

Bo Liu, Chengxin Yang, Fan Li

Abstract

This paper investigates the theoretical foundation and develops analytical formulas for sample size and power calculations for causal inference with observational data. By analyzing the variance of an inverse probability weighting estimator of the average treatment effect, we decompose the power calculation into three components: propensity score distribution, potential outcome distribution, and their correlation. We show that to determine the minimal sample size of an observational study, in addition to the standard inputs in the power calculation of randomized trials, it is sufficient to have two parameters, which quantify the strength of the confounder-treatment and the confounder-outcome association, respectively. For the former, we propose using the Bhattacharyya coefficient, which measures the covariate overlap and, together with the treatment proportion, leads to a uniquely identifiable and easily computable propensity score distribution. For the latter, we propose a sensitivity parameter bounded by the R-squared statistic of the regression of the outcome on covariates. Our procedure relies on a parametric propensity score model and a semiparametric restricted mean outcome model, but does not require distributional assumptions on the multivariate covariates. We develop an associated R package PSpower.
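As an illustration of the kind of estimator the abstract refers to, the following is a minimal sketch of a normalized (Hájek-type) inverse probability weighting estimator of the average treatment effect. It is not taken from the paper or from the PSpower package: the function names `hajek_ate` and `simulate`, the logistic propensity model with coefficient 0.5, and the linear outcome model are all assumptions made for this example, and the true propensity scores are treated as known.

```python
import math
import random


def hajek_ate(y, z, e):
    """Normalized IPW (Hajek-type) estimate of the average treatment effect.

    y: outcomes, z: binary treatment indicators (0/1), e: propensity scores.
    """
    w1 = [zi / ei for zi, ei in zip(z, e)]              # weights for treated units
    w0 = [(1 - zi) / (1 - ei) for zi, ei in zip(z, e)]  # weights for control units
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0


def simulate(n, tau, seed=0):
    """Hypothetical data: one confounder x drives both treatment and outcome."""
    rng = random.Random(seed)
    y, z, e = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        ps = 1.0 / (1.0 + math.exp(-0.5 * x))    # assumed logistic propensity model
        zi = 1 if rng.random() < ps else 0
        yi = tau * zi + x + rng.gauss(0.0, 1.0)  # assumed linear outcome model
        y.append(yi)
        z.append(zi)
        e.append(ps)
    return y, z, e


if __name__ == "__main__":
    y, z, e = simulate(n=20000, tau=1.0, seed=1)
    print(round(hajek_ate(y, z, e), 3))
```

In this sketch the confounder raises both the treatment probability and the outcome, so an unweighted difference in means would be biased upward, while the weighted estimate should fall near the assumed effect `tau`.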

Paper Structure (32 sections, 6 theorems, 75 equations, 6 figures, 5 tables)

This paper contains 32 sections, 6 theorems, 75 equations, 6 figures, and 5 tables.

Key Result

Theorem 1

Assume $X_1, X_2, \dots$ is a sequence of independent random variables with mean 0 and variance 1. Let $\beta_1, \beta_2, \dots$ be a sequence of real numbers. If the following two conditions hold: (i) there exists a positive number $B$ such that $\mathbb{E}(X_j^4) < B$, and (ii) $\max_{1 \leq j \le

Figures (6)

  • Figure 1: Distributions of $e(X)$ of the treatment and control groups corresponding to different combinations of $(r, \phi)$, where $e(X)$ follows a $\mathsf{Beta}(a, b)$ distribution with $a$ and $b$ determined by $(r, \phi)$.
  • Figure 2: Flowchart of decomposing and computing the variance of the Hájek estimator from summary inputs.
  • Figure 3: Distribution of the propensity scores in the two treatment arms with fixed $r=0.5$ and overlap coefficients $\phi=1.00, 0.98, 0.93, 0.87, 0.84, 0.81$, respectively, in the simulation study.
  • Figure 4: Power and sample size curves of an emulated Right Heart Catheterization study with an effect size $\widetilde{\tau}=0.14$ under different values of the overlap coefficient $\phi$ (left) and of the correlation $\rho=\rho_1=\rho_0$ (right).
  • Figure 5: Density and Q-Q plots of the fitted logit propensity scores, a linear combination of covariates. The proximity of points to the reference line in the Q-Q plot shows the closeness of the distribution to a normal distribution.
  • ...and 1 more figure
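The caption of Figure 1 states that the $\mathsf{Beta}(a, b)$ parameters are determined by $(r, \phi)$. The sketch below shows one way such a mapping could be computed; it is not the PSpower implementation. It assumes that $r = \mathbb{E}\{e(X)\} = a/(a+b)$ and that $\phi$ is the Bhattacharyya coefficient between the covariate densities of the two arms, which under these assumptions reduces to $\phi = \mathbb{E}\{\sqrt{e(X)(1-e(X))}\}/\sqrt{r(1-r)}$. The function names and the bisection search over $k = a + b$ are choices made for this example.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def overlap_phi(a, b):
    """Overlap coefficient implied by e(X) ~ Beta(a, b) under the stated assumptions.

    Uses E[sqrt(e(1-e))] = B(a + 1/2, b + 1/2) / B(a, b).
    """
    r = a / (a + b)
    mean_sqrt = math.exp(log_beta(a + 0.5, b + 0.5) - log_beta(a, b))
    return mean_sqrt / math.sqrt(r * (1.0 - r))


def beta_params(r, phi, lo=1e-6, hi=1e6, iters=200):
    """Find (a, b) with a/(a+b) = r and overlap_phi(a, b) = phi, for 0 < phi < 1.

    Bisection on k = a + b in log scale; a larger k concentrates e(X)
    around r and therefore increases the overlap.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if overlap_phi(r * mid, (1.0 - r) * mid) < phi:
            lo = mid
        else:
            hi = mid
    k = math.sqrt(lo * hi)
    return r * k, (1.0 - r) * k


if __name__ == "__main__":
    a, b = beta_params(r=0.5, phi=0.9)
    print(a, b, overlap_phi(a, b))
```

As a sanity check on the formula, a uniform propensity score distribution ($a = b = 1$) gives $\phi = \pi/4$ under these assumptions.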

Theorems & Definitions (6)

  • Theorem 1
  • Proposition 1
  • Theorem 2
  • Corollary 1
  • Theorem 3: Lyapunov CLT
  • Lemma 4