
Rethinking the Win Ratio: A Causal Framework for Hierarchical Outcome Analysis

Mathieu Even, Julie Josse

TL;DR

The paper tackles the challenge of causal inference with hierarchical, multivariate outcomes by embedding the Win Ratio and Generalized Pairwise Comparisons in a formal potential-outcomes framework. It shows that the estimand depends on how treated-control pairs are formed, and that traditional complete pairings can yield misleading conclusions in heterogeneous populations. To address this, the authors introduce an identifiable, individual-level estimand $\tau_\bigstar$ and establish that Nearest Neighbor pairings consistently estimate it in randomized settings; they also extend the approach to observational data via IPW and a distributional-regression-based approach with a doubly robust variant. Through synthetic experiments and the CRASH-3 trial, they demonstrate that their methods can provide more robust and sometimes drastically different treatment recommendations than traditional approaches, highlighting the importance of targeting the appropriate estimand for valid causal interpretation and real-world application.

Abstract

Quantifying causal effects in the presence of complex and multivariate outcomes is a key challenge in evaluating treatment effects. For hierarchical multivariate outcomes, the FDA recommends the Win Ratio and Generalized Pairwise Comparisons approaches. However, as far as we know, these empirical methods lack the causal or statistical foundations needed to justify their broader use in recent studies. To address this gap, we establish causal foundations for hierarchical comparison methods. We define the related causal effect measures and show that, depending on the methodology used to compute Win Ratios or Net Benefits of treatments, the targeted causal estimand can differ, as proved by our consistency results. Quite dramatically, the causal estimand associated with the historical estimation approach can yield reversed and incorrect treatment recommendations in heterogeneous populations, as we illustrate through striking examples. To remedy this fallacy, we introduce a novel, individual-level yet identifiable causal effect measure that better approximates the ideal, non-identifiable individual-level estimand. We prove that computing Win Ratios or Net Benefits with a Nearest Neighbor pairing between treated and control patients, an approach that can be seen as an extreme form of stratification, estimates this new causal estimand. We extend our methods to observational settings via propensity weighting, via distributional regression to address the curse of dimensionality, and via a doubly robust framework. We prove the consistency of our methods and the double robustness of our augmented estimator. Finally, we validate our approach on synthetic data and on CRASH-3, a major clinical trial assessing the effects of tranexamic acid in patients with traumatic brain injury.
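The central point above, that the targeted estimand depends on how treated-control pairs are formed, can be illustrated with a small sketch. This is a toy example of our own, not the paper's implementation and not the setting of its Figure 1. Assumptions: outcomes are hypothetical (alive, score) pairs compared hierarchically (survival first, score only to break survival ties); the Win Ratio is the proportion of wins divided by the proportion of losses and the Net Benefit is their difference, with ties counting as neither; the Nearest Neighbor rule pairs each control with its closest treated unit in Euclidean distance, whereas the paper's exact matching and tie-breaking rules may differ. All function names are hypothetical.

```python
import numpy as np


def win(y1, y0):
    """Hierarchical win function for (alive, score) outcomes, higher is better.

    Returns +1 if the treated outcome y1 wins, -1 if it loses, 0 on a full tie.
    Survival is compared first; the score only breaks ties on survival.
    """
    for a, b in zip(y1, y0):
        if a > b:
            return 1
        if a < b:
            return -1
    return 0


def summarize(results):
    """Win/loss proportions, Win Ratio and Net Benefit from a list of pair results."""
    results = np.asarray(results)
    p_win = float(np.mean(results == 1))
    p_loss = float(np.mean(results == -1))
    win_ratio = p_win / p_loss if p_loss > 0 else float("inf")
    return {"p_win": p_win, "p_loss": p_loss,
            "win_ratio": win_ratio, "net_benefit": p_win - p_loss}


def complete_pairing(Y1, Y0):
    """Historical approach: compare every treated unit with every control unit."""
    return summarize([win(a, b) for a in Y1 for b in Y0])


def nearest_neighbor_pairing(X1, Y1, X0, Y0):
    """Pair each control with its nearest treated unit in covariate space (Euclidean)."""
    X1 = np.asarray(X1, dtype=float).reshape(len(X1), -1)
    X0 = np.asarray(X0, dtype=float).reshape(len(X0), -1)
    results = []
    for x, y0 in zip(X0, Y0):
        j = int(np.argmin(np.linalg.norm(X1 - x, axis=1)))
        results.append(win(Y1[j], y0))
    return summarize(results)


# Hypothetical toy population: one control and one treated unit per stratum X in {0, 1, 2}.
# Treatment slightly improves the score in strata 0 and 1 but is fatal in stratum 2.
X0, Y0 = [0, 1, 2], [(1, 0.0), (1, 2.0), (1, 4.0)]
X1, Y1 = [0, 1, 2], [(1, 1.0), (1, 3.0), (0, 5.0)]

complete = complete_pairing(Y1, Y0)
nn = nearest_neighbor_pairing(X1, Y1, X0, Y0)
print(f"complete pairings: WR = {complete['win_ratio']:.2f}")  # 0.50, favours control
print(f"NN pairings:       WR = {nn['win_ratio']:.2f}")        # 2.00, favours treatment
```

In this toy population the outcomes are deterministic given X, so the Nearest Neighbor pairs coincide with comparing each stratum's treated and control outcomes, and treatment wins in two of three strata. Complete pairings instead mix strata: the stratum-0 treated unit also loses against the higher-baseline controls of strata 1 and 2, which flips the Win Ratio below 1.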

Paper Structure

This paper contains 44 sections, 8 theorems, 145 equations, 7 figures.

Key Result

Proposition 1

We have: … and …, where $P_{Y^{(X_i)}(1),Y_i(0)}$, $P_{Y_i(1),Y_i(0)}$ and $P_{Y_j(1),Y_i(0)}$ are respectively the joint distributions of $(Y^{(X_i)}(1),Y_i(0))$, $(Y_i(1),Y_i(0))$ and $(Y_j(1),Y_i(0))$, and $d_\mathrm{TV}$ is the total-variation distance between distributions. Furthermore, if the win function $w$ is $1$-Lipschitz, then …, where $\mathcal{W}_1$ is the 1-Wasserstein distance between distributions.
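Bounds of the kind stated in Proposition 1 rest on a standard fact: for any function $w$ with values in $[0,1]$, $|\mathbb{E}_P[w] - \mathbb{E}_Q[w]| \le d_\mathrm{TV}(P,Q)$, with equality for $w = \mathbf{1}\{p > q\}$ (if $w$ ranges over $[-1,1]$ the constant doubles; for $1$-Lipschitz $w$, Kantorovich-Rubinstein duality gives the analogous bound with $\mathcal{W}_1$). The sketch below checks the TV version numerically on finite supports. It assumes the win function takes values in $[0,1]$ and treats $P$ and $Q$ as two joint laws of (treated outcome, control outcome) flattened onto a finite grid; it does not reproduce the paper's exact statement, and the function names are hypothetical.

```python
import numpy as np


def tv_distance(p, q):
    """Total-variation distance between two probability vectors on a finite support."""
    return 0.5 * float(np.abs(p - q).sum())


def max_tv_violation(n_support=6, n_trials=2000, seed=0):
    """Largest value of |E_P[w] - E_Q[w]| - d_TV(P, Q) over random P, Q and w in [0, 1].

    P and Q stand for two joint laws of (treated outcome, control outcome) flattened
    onto a grid of n_support cells; w is an arbitrary win function with values in
    [0, 1] evaluated on that grid. A non-positive result means the bound held.
    """
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(n_trials):
        p = rng.dirichlet(np.ones(n_support))
        q = rng.dirichlet(np.ones(n_support))
        w = rng.uniform(0.0, 1.0, size=n_support)
        gap = abs(float(w @ p) - float(w @ q))
        worst = max(worst, gap - tv_distance(p, q))
    return worst


def tightness_gap(p, q):
    """d_TV(P, Q) minus the gap attained by the indicator w = 1{p > q} (should be ~0)."""
    w = (p > q).astype(float)
    return tv_distance(p, q) - abs(float(w @ p) - float(w @ q))


print(max_tv_violation() <= 1e-12)  # the bound is never exceeded
```

The indicator check shows that the TV bound cannot be improved without further assumptions on $w$, which is why a Lipschitz condition is needed to move to the finer $\mathcal{W}_1$ bound.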

Figures (7)

  • Figure 1: Comparison of the win proportion $p_\mathrm{W}$ computed with complete pairings and Nearest Neighbor pairings, in the setting of \ref{ex:counter_emp}. Boxplots over 100 runs. The two approaches lead to different treatment recommendations (above and below 0.5).
  • Figure 2: Testing for the impact of the dimension, correlated outcomes setting. Boxplots over 100 runs. DRF AIPW WR, DRF WR and NearestNeigh WR respectively correspond to the AIPW method in \ref{eq:estim_DRF_AIPW_WR}, the direct distributional approach in \ref{eq:estim_without_IPW_DRF} and the weighted Nearest Neighbor approach in \ref{eq:IPW_NN}. For the AIPW and distributional regression approaches, Distributional Random Forests (DRFs, [pmlr-v238-benard24a_DRF]) with 1000 trees are used to perform distributional regression. For the weighted Nearest Neighbor and AIPW approaches, propensities are estimated with probability forests from the GRF package, also with 1000 trees. Propensity scores estimated with logistic regression gave comparable results.
  • Figure 3: Same setting as in \ref{fig:dimensions_homogeneous_wo_cheating}, with an added method: Nearest Neighbor with a ‘cheating’ option, which plugs in the true propensity scores instead of estimating them. Boxplots over 100 runs.
  • Figure 4: Testing for the impact of the dimension, uncorrelated outcomes setting. Boxplots over 100 runs. DRF AIPW WR, DRF WR and NearestNeigh WR as in \ref{fig:dimensions_homogeneous_wo_cheating}.
  • Figure 5: Testing for the impact of the dimension, uncorrelated outcomes setting. Boxplots over 100 runs. DRF AIPW WR, DRF WR, and NearestNeigh WR with and without the ‘cheating’ option, as in \ref{fig:dimensions_homogeneous_cheating}.
  • ...and 2 more figures

Theorems & Definitions (24)

  • Definition 1: Win function
  • Example 1
  • Definition 2
  • Remark 1
  • Remark 2
  • Remark 3: On the well-posedness of \ref{def:win_indiv}
  • Example 2
  • Proposition 1
  • Theorem 1: Consistency of Win Ratio
  • Remark 4: Win Ratio and comparisons with strata
  • ...and 14 more