Analysis of Stepped-Wedge Randomised Cluster Trial using a generalized pairwise comparison approach : a simulation study

Yohan Bard, Emilie Presles, Marc Buyse, Silvy Laporte, Paul Zufferey, Frederikus A. Klok, Olivier Sanchez, Francis Couturaud, Edouard Ollier

TL;DR

Overall, the findings identify b4 and c2 as the most reliable GPC-based strategies for SW-CRT analysis and provide practical guidance for their application, including for ongoing trials such as ETHER.

Abstract

Stepped-wedge cluster randomised trials (SW-CRTs) increasingly evaluate complex interventions, yet methodological guidance for analysing composite endpoints using generalized pairwise comparisons (GPC) remains limited. This work investigates the performance of several GPC-based estimators in the presence of clustering, temporal trends, and varying correlation structures typical of SW-CRTs. We conducted an extensive simulation study covering a range of intraclass correlations (ICC), cluster autocorrelation coefficients (CAC), time effects, and treatment effect sizes. Eight analytical approaches were compared, including unadjusted estimators, cluster-stratified win odds, mixed-effects models applied to cluster-period win odds, and probabilistic index models (PIMs). Type I error control was strongly compromised for methods ignoring time or clustering; only two approaches consistently maintained nominal error rates: a hierarchical mixed-effects model with sequence- and cluster-level random slopes (b4) and a cluster-restricted PIM (c2). These two methods were further evaluated in terms of statistical power. Method c2 generally showed higher efficiency, particularly under strong clustering, low CAC, or the presence of temporal trends, while both methods converged to similar performance for large treatment effects. Overall, our findings identify b4 and c2 as the most reliable GPC-based strategies for SW-CRT analysis and provide practical guidance for their application, including for ongoing trials such as ETHER.
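
As an illustration of the quantity the compared methods target, the following is a minimal sketch of an unadjusted win odds for a single outcome, assuming that larger values are better and that ties are split equally between the two arms. The function name `win_odds` and the toy data are hypothetical; the sketch ignores clustering and time, so it does not reproduce any of the paper's estimators, and in particular not b4 or c2.

```python
from itertools import product


def win_odds(treated, control):
    """Unadjusted win odds from all treated-vs-control pairs.

    Assumes larger outcome values are better. Ties contribute half a
    win to each arm. Raises ZeroDivisionError if the treated arm wins
    every pair (the odds are then unbounded).
    """
    wins = ties = losses = 0
    for t, c in product(treated, control):
        if t > c:
            wins += 1
        elif t < c:
            losses += 1
        else:
            ties += 1
    n_pairs = wins + ties + losses
    # Probability that a treated subject "wins", counting a tie as 1/2.
    p_win = (wins + 0.5 * ties) / n_pairs
    return p_win / (1.0 - p_win)


# Toy binary outcomes (1 = favourable), purely illustrative.
treated = [1, 1, 0, 1]
control = [0, 1, 0, 0]
print(win_odds(treated, control))
```

A win odds of 1 corresponds to no treatment effect; values above 1 favour the intervention. Applying this pooled calculation directly to SW-CRT data ignores exactly the time and clustering structure that the abstract reports as the source of inflated type I error.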

Paper Structure

This paper contains 19 sections, 29 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Example of stepped-wedge trial with 5 sequences (6 periods)
  • Figure 2: Empirical type I error rates ($\alpha = 0.05$) for all evaluated methods under the null hypothesis of no treatment effect ($\delta=0$), for baseline risk $p_0 = 0.1$. The results are shown as a function of the intraclass correlation (ICC), and stratified by the cluster autocorrelation coefficient (CAC) and the secular time trend ($\beta_t$). Methods a1--a2 (unadjusted), b1--b4 (cluster-level mixed models), and c1--c2 (probabilistic index models) are compared. The dashed red line indicates the nominal 5% level.
  • Figure 3: Statistical power of the two best-performing methods, the hierarchical mixed-effects model with sequence- and cluster-level random slopes (Model b4) and the cluster-restricted probabilistic index model (Model c2), across all non-null treatment effect scenarios ($\delta > 0$) for baseline risk $p_0=0.1$. For each value of the treatment effect ($\delta$), two bars are shown side by side, one per method. Panels are stratified by intraclass correlation (ICC), cluster autocorrelation coefficient (CAC), and the secular time trend ($\beta_t$). The horizontal dashed lines at 80% and 90% indicate commonly used power benchmarks.
  • Figure 4: Statistical power of the different methods used to estimate the win odds across the simulation scenarios. Bars represent the proportion of replications in which the null hypothesis is rejected ($p < 0.05$). Results are stratified by ICC (rows) and by CAC crossed with the temporal effect $\beta_t$ (columns). Pastel colors distinguish the successive outcome sets, while transparency encodes the statistical method (Model c2 or Model b4). The horizontal red dashed line indicates the 80% power threshold.
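
The layout named in the Figure 1 caption (5 sequences, 6 periods) can be sketched as a treatment-indicator matrix. This assumes the conventional stepped-wedge pattern, in which every sequence starts under control and sequence k crosses over to the intervention at period k + 1; the caption does not state the crossover schedule, so that pattern and the function name `stepped_wedge_design` are assumptions.

```python
def stepped_wedge_design(n_sequences, n_periods):
    """Return a sequences-by-periods matrix of 0/1 treatment indicators.

    Assumed schedule (not taken from the paper): sequence k (1-based)
    is under control in periods 1..k and under intervention afterwards.
    """
    return [
        [1 if period > seq else 0 for period in range(1, n_periods + 1)]
        for seq in range(1, n_sequences + 1)
    ]


# 5 sequences and 6 periods, as in the Figure 1 caption.
for row in stepped_wedge_design(5, 6):
    print(" ".join(str(x) for x in row))
```

Under this assumed schedule, treatment status is confounded with calendar time within each cluster, which is why analyses of SW-CRT data need to account for secular trends.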