Assessment of evidence against homogeneity in exhaustive subgroup treatment effect plots

Björn Bornkamp; Jiarui Lu; Frank Bretz

TL;DR

This paper addresses how to formally assess evidence against homogeneity in exhaustive subgroup treatment effect plots, a visualization that shows treatment effects across many subgroups with varying sizes. It introduces a Doubly Robust (DR) learner–based approach to generate pseudo-outcomes and compute divergence-based p-values and gamma-homogeneity regions, enabling a quantitative assessment of compatibility with homogeneous effects. Through a cardiovascular case study and extensive simulations, the authors demonstrate well-calibrated inference and improved performance over simple mean-difference approaches. The method yields interpretable, graded evidence (via p-values and S-values) and can be integrated into interactive workflows to support decision-making in exploratory subgroup analyses.

Abstract

Exhaustive subgroup treatment effect plots are constructed by displaying all subgroup treatment effects of interest against subgroup sample size, providing a useful overview of the observed treatment effect heterogeneity in a clinical trial. As in any exploratory subgroup analysis, however, the observed estimates suffer from small sample sizes and multiplicity issues. To facilitate more interpretable exploratory assessments, this paper introduces a computationally efficient method to generate homogeneity regions within exhaustive subgroup treatment effect plots. Pseudo-outcomes from the Doubly Robust (DR) learner are used to estimate subgroup effects and to derive reference distributions, which quantify how surprising the observed heterogeneity is under a homogeneous effects model. Explicit formulas are derived for the homogeneity region, and different methods for calculating the critical values are compared. The method is illustrated with a cardiovascular trial and evaluated via simulation, showing well-calibrated inference and improved performance over standard approaches that use simple differences of observed group means.
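
To make the idea of DR-learner pseudo-outcomes concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 1:1 randomized trial with known randomization probability `pi = 0.5`, arm-specific linear outcome models fitted without cross-fitting, and a subgroup effect estimated as the mean pseudo-outcome within the subgroup. The function names, the simulated data, and the subgroup definition are all hypothetical.

```python
import numpy as np


def dr_pseudo_outcomes(y, a, x, pi=0.5):
    """Doubly robust pseudo-outcomes for a two-arm trial.

    Assumptions (illustrative only): known randomization probability `pi`,
    arm-specific linear outcome models, no cross-fitting.
    """
    X = np.column_stack([np.ones(len(y)), x])  # design matrix with intercept
    # Fit one outcome model per arm by ordinary least squares.
    beta1, *_ = np.linalg.lstsq(X[a == 1], y[a == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(X[a == 0], y[a == 0], rcond=None)
    mu1, mu0 = X @ beta1, X @ beta0
    # Model-based contrast plus inverse-probability-weighted residual correction.
    return (mu1 - mu0
            + a * (y - mu1) / pi
            - (1 - a) * (y - mu0) / (1 - pi))


def subgroup_effect(phi, mask):
    """Subgroup treatment effect estimate: mean pseudo-outcome in the subgroup."""
    return phi[mask].mean()


# Hypothetical simulated data with a homogeneous treatment effect of 0.3.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
a = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * x[:, 0] + 0.3 * a + rng.normal(size=n)

phi = dr_pseudo_outcomes(y, a, x)
overall = phi.mean()                       # overall treatment effect estimate
sub = subgroup_effect(phi, x[:, 1] > 0)    # effect in one illustrative subgroup
```

Because each subgroup estimate is an average of the same per-patient pseudo-outcomes, estimates for all subgroups can be obtained from one vector `phi` without refitting a model per subgroup, which is one plausible reading of why the approach is described as computationally efficient.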

Paper Structure (14 sections, 7 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Case study
Methodology
Simulation study
Aims
Data-generating mechanisms
Methods
Performance measures
Results
Example revisited
Discussion
Covariate data description for simulated, synthetic data
Calculation of correlation across estimates
Additional plot

Figures (6)

Figure 1: Exhaustive subgroup treatment effect plot, showing 1408 one-level or two-level subgroup treatment effects. The dashed line corresponds to the overall treatment effect estimate.
Figure 2: Empirical distribution function of p-values under no treatment effect heterogeneity. Results are pooled across the four scenarios from Table \ref{tab:sim_models}, giving 8000 simulations in total.
Figure 3: Distribution of p-values under treatment effect heterogeneity, shown as density plots. Horizontal lines indicate the median.
Figure 4: Proportion of p-values less than 0.1 across all scenarios.
Figure 5: Exhaustive subgroup treatment effect plot, showing 1408 one-level or two-level subgroup treatment effects. The dashed line corresponds to the overall treatment effect estimate.
...and 1 more figures
