Covariate Adjustment in Randomized Experiments Motivated by Higher-Order Influence Functions

Sihui Zhao, Xinbo Wang, Lin Liu, Xin Zhang

TL;DR

The paper addresses covariate adjustment in randomized trials with high-dimensional baseline data by applying Higher-Order Influence Functions (HOIF) to yield order-optimal, design-based estimators for treatment-specific means and the ATE. It shows that HOIF-motivated estimators, notably $\widehat{\tau}_{\mathsf{adj}, 2}$ and its debiased relatives, can outperform the unadjusted estimator and linear-model-based adjustments when $p$ grows with $n$, with formal bias and variance characterizations. Moreover, it demonstrates that several state-of-the-art adjusted estimators are special cases within the HOIF framework, unifying diverse approaches under a common theoretical lens and offering practical, unbiased variance estimators and an R package for implementation. The work includes simulations and a real-data application (NIDA-CTN-0030) that corroborate the theoretical results and illustrate the framework's relevance for improving efficiency in randomized experiments with high-dimensional covariates.

Abstract

Higher-Order Influence Functions (HOIF), developed in a series of papers over the past twenty years, are a fundamental theoretical device for constructing rate-optimal causal-effect estimators from observational studies. However, the value of HOIF for analyzing well-conducted randomized controlled trials (RCT) has not been explicitly explored. In the recent U.S. Food and Drug Administration and European Medicines Agency guidelines on the practice of covariate adjustment in analyzing RCT, in addition to the simple, unadjusted difference-in-mean estimator, it was also recommended to report the estimator adjusting for baseline covariates via a simple parametric working model, such as a linear model. However, when the number of baseline covariates $p$ is large, the recommendation is somewhat murky. In this paper, we show that HOIF-motivated estimators for the treatment-specific mean have significantly improved statistical properties compared to popular adjusted estimators in practice when $p$ is relatively large relative to the sample size $n$. We also characterize the conditions under which the HOIF-motivated estimator improves upon the unadjusted one. More importantly, we demonstrate that several state-of-the-art adjusted estimators proposed recently can be interpreted as particular HOIF-motivated estimators, thereby placing these estimators in a more unified framework. Numerical and empirical studies are conducted to corroborate our theoretical findings. An accompanying R package can be found on CRAN.

Paper Structure

This paper contains 23 sections, 13 theorems, 111 equations, 13 figures, 2 tables.

Key Result

Lemma 1

Under the randomization assumption, we have $\mathrm{var} (\widehat{\tau}_{\mathsf{unadj}}) = \dfrac{1}{n} \mathrm{var} \left\{ \dfrac{t}{\pi_{1}} y \right\}$. Moreover, if $(\mathbf{x}_{i} - \bm{\mu})^{\top} \bm{\beta}$ in $\widetilde{\tau}_{\mathsf{adj}}$ is replaced by the true outcome regression function $\mathrm{b} (\cdot)$, the variance of $\widetilde{\tau}_{\mathsf{adj}}$ attains the SVB.
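
The variance formula for the unadjusted estimator can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated on this page: it takes $\widehat{\tau}_{\mathsf{unadj}} = \frac{1}{n} \sum_{i} t_{i} y_{i} / \pi_{1}$ (the form suggested by the variance expression), i.i.d. Bernoulli($\pi_{1}$) treatment assignment, and a hypothetical outcome $y \sim N(1, 1)$. The function names `tau_unadj` and `simulate` are made up for this example and are not part of the accompanying R package.

```python
import numpy as np


def tau_unadj(t, y, pi1):
    """Unadjusted estimate of the treated-arm mean: mean of t * y / pi1.

    Assumed form, inferred from the variance expression in Lemma 1.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(t * y / pi1))


def simulate(n=200, pi1=0.5, reps=4000, seed=0):
    """Compare the Monte Carlo variance of tau_unadj with var(t*y/pi1) / n.

    Assumptions (hypothetical, for illustration): Bernoulli(pi1) assignment
    and y ~ N(1, 1), drawn independently of the assignment.
    """
    rng = np.random.default_rng(seed)
    t = rng.binomial(1, pi1, size=(reps, n))
    y = 1.0 + rng.normal(size=(reps, n))
    estimates = (t * y / pi1).mean(axis=1)  # one estimate per replication

    # Closed form: E[(t*y/pi1)^2] = E[y^2] / pi1 and E[t*y/pi1] = E[y],
    # with E[y] = 1 and E[y^2] = 2 under y ~ N(1, 1).
    mean_y, second_moment_y = 1.0, 2.0
    theory = (second_moment_y / pi1 - mean_y**2) / n
    return float(estimates.var(ddof=1)), theory


if __name__ == "__main__":
    mc_var, theory_var = simulate()
    print(f"Monte Carlo variance: {mc_var:.5f}")
    print(f"(1/n) var(t*y/pi1):   {theory_var:.5f}")
```

Under these assumptions the two printed numbers should agree up to Monte Carlo error. The gap between this variance and the SVB is what covariate adjustment aims to close.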

Figures (13)

  • Figure 1: Relative efficiencies of $\widehat{\tau}_{\mathsf{adj}, 2}^{\dag}$ (or equivalently $\widehat{\tau}_{\mathsf{db}}$), $\widehat{\tau}_{\mathsf{adj}, 2}$, $\widehat{\tau}_{\mathsf{adj}, 3}$ based on exact and approximate formula.
  • Figure 2: CoV$^{2}$ vs. $\mathrm{var}^{\mathsf{d}} (\widehat{\tau}_{\mathsf{adj}, 2}) / \mathrm{var}^{\mathsf{d}} (\widehat{\tau}_{\mathsf{adj}, 2}^{\dag})$. Each point represents a particular simulation setting in Figure \ref{fig:oracle_var_main}; only the settings with CoV$^{2} \leq 100$ are shown here.
  • Figure 3: Relative efficiency of $\widehat{\tau}_{\mathsf{adj}, 2}^{\dag}$ (or equivalently $\widehat{\tau}_{\mathsf{db}}$), $\widehat{\tau}_{\mathsf{adj}, 2}$, $\widehat{\tau}_{\mathsf{adj}, 3}$ based on exact and approximate formula.
  • Figure 4: Empirical relative absolute bias for different $\pi_1$, $\gamma$, $\alpha$ and $n$ under the independent t error and the worst-case error.
  • Figure 5: RMSE for different $\pi_1$, $\gamma$, $\alpha$ and $n$ under the independent t error and the worst-case error.
  • ...and 8 more figures

Theorems & Definitions (34)

  • Lemma 1
  • Remark 1
  • Remark 2
  • Theorem 1
  • Remark 3: Interpreting $\widehat{\tau}_{\mathsf{adj}, 2}$ as a leave-one-out regression adjustment estimator
  • Remark 4
  • Proposition 1
  • Remark 5: On the asymptotic distribution of $\widehat{\tau}_{\mathsf{adj}, 2}$
  • Remark 6: Semiparametric efficiency under the superpopulation framework
  • Proposition 2
  • ...and 24 more