Optimizing precision in stepped-wedge designs via machine learning and quadratic inference functions

Liangbo Lyu; Bingkai Wang

TL;DR

This work tackles precision in stepped-wedge designs by combining cross-fitted, flexible covariate adjustment with adaptive correlation learning via quadratic inference functions (QIF). The proposed estimators are consistent and asymptotically normal under mild $L_2$-convergence conditions on the nuisance estimators and, when mean models are correctly specified, can attain the minimal asymptotic variance. Crucially, QIF never reduces efficiency relative to an independence working correlation and often improves precision by combining multiple correlation structures. Through simulations and two real-world applications, the authors demonstrate substantial finite-sample gains from combining ML-based covariate adjustment with QIF, especially in larger samples, while maintaining valid inference and accommodating treatment-effect heterogeneity across exposure duration and calendar time. The framework provides a principled, implementable alternative to standard mixed-effects approaches, performs robustly across diverse design features and settings, and offers clear paths for extensions to covariate-adaptive randomization and missing-data scenarios.
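Heterogeneity across exposure duration and calendar time is easiest to see in the design's cluster-by-period layout. The sketch below assumes each cluster switches from control to treatment at a known start period and stays treated thereafter (the usual stepped-wedge roll-out); the helper names are hypothetical. Loosely, a duration-specific structure indexes effects by the exposure duration $d$, a period-specific structure by the calendar period $j$, and a saturated structure by each distinct $(j, d)$ pair; the paper's exact parameterization is not reproduced here.

```python
from typing import List, Tuple

def exposure_durations(start_periods: List[int], n_periods: int) -> List[List[int]]:
    """Cluster-by-period matrix of exposure durations (hypothetical helper).

    Periods are numbered 1..n_periods. Entry [i][j-1] is 0 while cluster i is
    still under control in period j, and otherwise counts how many periods the
    cluster has been treated, with the crossover period counted as 1.
    """
    return [
        [period - start + 1 if period >= start else 0
         for period in range(1, n_periods + 1)]
        for start in start_periods
    ]

def treatment_indicators(durations: List[List[int]]) -> List[List[int]]:
    """1 if the cluster-period is treated (duration > 0), else 0."""
    return [[int(d > 0) for d in row] for row in durations]

def saturated_cells(durations: List[List[int]]) -> List[Tuple[int, int]]:
    """Distinct (calendar period, exposure duration) pairs among treated cells."""
    cells = {(j, d) for row in durations for j, d in enumerate(row, start=1) if d > 0}
    return sorted(cells)

# Three sequences crossing over at periods 2, 3 and 4 of a 4-period design.
durations = exposure_durations([2, 3, 4], n_periods=4)
for row in durations:
    print(row)
print(saturated_cells(durations))
```

For this toy design the rows are [0, 1, 2, 3], [0, 0, 1, 2] and [0, 0, 0, 1], so a duration-specific structure has three levels ($d = 1, 2, 3$), a period-specific structure three levels ($j = 2, 3, 4$), and a saturated structure six $(j, d)$ cells.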

Abstract

Stepped-wedge designs are increasingly used in randomized experiments to accommodate logistical and ethical constraints by staggering treatment roll-out over time. Despite their popularity, existing analytical methods largely rely on parametric models with linear covariate adjustment and prespecified correlation structures, which may limit achievable precision in practice. We propose a new class of estimators for the causal average treatment effect in stepped-wedge designs that optimizes precision through flexible, machine-learning-based covariate adjustment to capture complex outcome-covariate relationships, together with quadratic inference functions to adaptively learn the correlation structure. We establish consistency and asymptotic normality under mild conditions requiring only $L_2$ convergence of nuisance estimators, even under model misspecification, and characterize when the estimator attains the minimal asymptotic variance. Moreover, we prove that the proposed estimator never reduces efficiency relative to an independence working correlation. The proposed method further accommodates treatment-effect heterogeneity across both exposure duration and calendar time. Finally, we demonstrate our methods through simulation studies and reanalyses of two empirical studies that differ substantially in research area and key design parameters.
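The idea of learning the correlation structure with QIF can be illustrated with the standard QIF construction: approximate the inverse working correlation by a linear combination of basis matrices (here the identity and the exchangeable off-diagonal matrix), stack one set of estimating equations per basis, and combine them through the inverse of their empirical covariance $C$, minimizing $\bar{g}^\top C^{-1} \bar{g}$ as in generalized method of moments. This is a minimal sketch assuming a linear mean model with period dummies, unit variance, and no covariate adjustment. `qif_linear` and the simulated data are hypothetical, and the update holds $C$ fixed within each step (an iterated-GMM approximation to minimizing the QIF objective), so it is not the authors' estimator.

```python
import numpy as np

def qif_linear(X_list, y_list, bases, n_iter=50, tol=1e-10):
    """Iterated-GMM approximation to a QIF estimator for a linear mean model.

    X_list[i] is the (T x p) design for cluster i and y_list[i] its (T,) outcomes.
    `bases` holds (T x T) matrices M_k whose linear span approximates the inverse
    working correlation; identity link and unit variance are assumed.
    """
    # Extended scores are linear in beta: g_i(beta) = b_i - D_i @ beta.
    b = np.array([np.concatenate([X.T @ M @ y for M in bases])
                  for X, y in zip(X_list, y_list)])          # (n, K*p)
    D = np.array([np.vstack([X.T @ M @ X for M in bases])
                  for X in X_list])                           # (n, K*p, p)
    b_bar, D_bar = b.mean(axis=0), D.mean(axis=0)
    p = X_list[0].shape[1]
    # Start from the first basis block alone (ordinary least squares when M_1 = I).
    beta = np.linalg.solve(D_bar[:p], b_bar[:p])
    for _ in range(n_iter):
        g = b - D @ beta                                      # per-cluster scores
        C = g.T @ g / len(g)                                  # empirical score covariance
        # Pseudo-inverse via eigendecomposition: some extended scores (e.g. for
        # period dummies) are exact linear combinations of others, so C is singular.
        vals, vecs = np.linalg.eigh(C)
        keep = vals > 1e-10 * vals.max()
        W = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T
        beta_new = np.linalg.solve(D_bar.T @ W @ D_bar, D_bar.T @ W @ b_bar)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Hypothetical toy stepped-wedge data: 4 periods, clusters cross over at period 2, 3
# or 4, a shared cluster random effect induces exchangeable correlation, true effect 1.
rng = np.random.default_rng(0)
T, n, tau = 4, 300, 1.0
I = np.eye(T)
X_list, y_list = [], []
for i in range(n):
    z = (np.arange(1, T + 1) >= 2 + i % 3).astype(float)    # treatment indicator by period
    X_list.append(np.column_stack([I, z]))                   # period dummies + treatment
    y_list.append(0.5 * np.arange(T) + tau * z + rng.normal() + rng.normal(size=T))

beta_ind = qif_linear(X_list, y_list, [I])                       # independence only
beta_qif = qif_linear(X_list, y_list, [I, np.ones((T, T)) - I])  # independence + exchangeable
print(beta_ind[-1], beta_qif[-1])
```

With only the identity basis the equations are just-identified and the function returns ordinary least squares; adding the exchangeable basis supplies extra estimating equations that the weighting can exploit. Because the independence equations remain among the combined moments, optimal weighting should not increase the asymptotic variance relative to independence, which is one intuition for the no-loss property stated in the abstract.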

Paper Structure (14 sections, 2 theorems, 9 equations, 2 figures, 3 tables)

Introduction
Motivating examples
Improving caring quality for people with dementia in nursing homes using IPOS
Procedural justice training program
Definition and assumptions
Estimators
Flexible covariate adjustment with machine learning
Flexible correlation modeling with QIF
Asymptotic results
Simulation
Simulation setting
Simulation Results
Data application
Discussion

Key Result

Theorem 1

Under Assumptions 1-3, and assuming $E\left[\left\{\widehat{g}_j^{(m)}(\boldsymbol{X}_{ik})- \underline{g}_j(\boldsymbol{X}_{ik})\right\}^2\right] \rightarrow 0$ for some integrable limit function $\underline{g}_j(\boldsymbol{X}_{ik})$, we have $\left\{\widehat{Var}^*(\widehat{\boldsymbol{\beta}}^*)\right\}$ …
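Theorem 1 only asks each cross-fitted nuisance estimate $\widehat{g}_j^{(m)}$ to converge in $L_2$ to some limit $\underline{g}_j$, not necessarily to the true outcome regression. Reading the superscript $(m)$ as the cross-fitting fold, the sketch below shows cluster-level cross-fitting, assuming folds formed by randomly partitioning clusters and a least-squares cubic polynomial standing in for a machine-learning learner. The function names `cross_fit_predictions` and `fit_cubic` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def cross_fit_predictions(x, y, cluster, n_folds, fit, seed=0):
    """Out-of-fold predictions with folds formed at the cluster level.

    Every row of a cluster lands in the same fold, and its prediction comes
    from a model fitted only on clusters in the other folds.
    `fit(x_train, y_train)` must return a function mapping x -> predictions.
    """
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    # Randomly assign clusters (not rows) to folds of near-equal size.
    fold_of = dict(zip(ids.tolist(), (rng.permutation(len(ids)) % n_folds).tolist()))
    fold = np.array([fold_of[c] for c in cluster.tolist()])
    pred = np.empty(len(y))
    for m in range(n_folds):
        held_out = fold == m
        model = fit(x[~held_out], y[~held_out])   # train on the other folds only
        pred[held_out] = model(x[held_out])
    return pred, fold

def fit_cubic(x_train, y_train):
    """Stand-in learner: least-squares cubic polynomial."""
    coef = np.polyfit(x_train, y_train, deg=3)
    return lambda x_new: np.polyval(coef, x_new)

# Hypothetical data: 40 clusters x 5 observations, nonlinear outcome-covariate relation.
rng = np.random.default_rng(1)
cluster = np.repeat(np.arange(40), 5)
x = rng.uniform(-2, 2, size=cluster.size)
y = np.sin(x) + x**2 + rng.normal(scale=0.3, size=x.size)
pred, fold = cross_fit_predictions(x, y, cluster, n_folds=5, fit=fit_cubic)
print(np.mean((pred - (np.sin(x) + x**2)) ** 2))
```

The cubic learner is deliberately misspecified for $\sin(x) + x^2$: its out-of-fold fits approach the best cubic approximation rather than the truth, which loosely mirrors how Theorem 1 only needs convergence to some limit $\underline{g}_j$.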

Figures (2)

Figure 1: Point estimates with 95% confidence intervals for the IPOS data. For the duration-specific, period-specific, and saturated treatment structures, the averaged treatment effects across estimands are presented.
Figure 2: Point estimates with 95% confidence intervals for the Chicago Police Training Data. For the duration-specific, period-specific, and saturated treatment structures, the averaged treatment effects across estimands are presented.

Theorems & Definitions (3)

Theorem 1
Theorem 2
Remark 1
