Using Individualized Treatment Effects to Assess Treatment Effect Heterogeneity

Konstantinos Sechidis, Cong Zhang, Sophie Sun, Yao Chen, Asher Spector, Björn Bornkamp

TL;DR

This work advances TEH assessment by integrating a doubly robust (DR) learner into the WATCH workflow to deliver a three-pronged TEH analysis: a global test for homogeneity, identification of effect modifiers, and estimation of individualized treatment effects. By constructing pseudo-outcomes via a stacked ensemble of nuisance models and applying cross-fitting, the method achieves robustness to misspecification and enhanced precision. Through extensive simulations and a psoriatic arthritis pooled-trial application, the DR-learner demonstrates strong performance across objectives, often outperforming traditional meta-learners and tree-based approaches, and identifies clinically meaningful modifiers such as CRP, age, and BD-2. The framework supports informed, personalized decision-making in drug development and trial design, with potential extensions to time-to-event and other endpoints.

Abstract

Assessing treatment effect heterogeneity (TEH) in clinical trials is crucial, as it provides insights into the variability of treatment responses among patients, influencing important decisions related to drug development. Furthermore, it can lead to personalized medicine by tailoring treatments to individual patient characteristics. This paper introduces novel methodologies for assessing treatment effects using the individual treatment effect as a basis. To estimate this effect, we use a Double Robust (DR) learner to infer a pseudo-outcome that reflects the causal contrast. This pseudo-outcome is then used to perform three objectives: (1) a global test for heterogeneity, (2) ranking covariates based on their influence on effect modification, and (3) providing estimates of the individualized treatment effect. We compare our DR-learner with various alternatives and competing methods in a simulation study, and also use it to assess heterogeneity in a pooled analysis of five Phase III trials in psoriatic arthritis. By integrating these methods with the recently proposed WATCH workflow (Workflow to Assess Treatment Effect Heterogeneity in Drug Development for Clinical Trial Sponsors), we provide a robust framework for analyzing TEH, offering insights that enable more informed decision-making in this challenging area.
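
To make the pseudo-outcome idea concrete, here is a minimal sketch of a cross-fitted doubly robust pseudo-outcome in a randomized trial. It is an illustration under stated assumptions, not the authors' implementation: the randomization probability `pi` is assumed known, the stacked ensemble of nuisance models is swapped for a simple per-arm linear regression on a single covariate, and the helper names (`ols`, `dr_pseudo_outcomes`) and the simulated data are hypothetical.

```python
import random
from statistics import mean


def ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def dr_pseudo_outcomes(x, a, y, pi=0.5, n_folds=5, seed=0):
    """Cross-fitted DR pseudo-outcomes, assuming a known randomization probability pi."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    phi = [0.0] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Nuisance models: one outcome regression per arm, fitted without the held-out fold.
        fits = {}
        for arm in (0, 1):
            rows = [i for i in train if a[i] == arm]
            fits[arm] = ols([x[i] for i in rows], [y[i] for i in rows])
        for i in held_out:
            m0 = fits[0][0] + fits[0][1] * x[i]
            m1 = fits[1][0] + fits[1][1] * x[i]
            # Plug-in contrast plus inverse-probability-weighted residual correction.
            phi[i] = (m1 - m0) + a[i] * (y[i] - m1) / pi - (1 - a[i]) * (y[i] - m0) / (1 - pi)
    return phi


# Hypothetical simulated trial: the true effect is 1 + beta1 * x, so x is an effect modifier.
rng = random.Random(1)
n, beta1 = 2000, 2.0
x = [rng.gauss(0, 1) for _ in range(n)]
a = [rng.randint(0, 1) for _ in range(n)]
y = [1 + 0.5 * xi + ai * (1 + beta1 * xi) + rng.gauss(0, 1) for xi, ai in zip(x, a)]

phi = dr_pseudo_outcomes(x, a, y)
ate_hat = mean(phi)                     # average pseudo-outcome estimates the overall effect
intercept_hat, slope_hat = ols(x, phi)  # slope estimates how the effect varies with x
print(round(ate_hat, 2), round(slope_hat, 2))
```

Regressing the pseudo-outcome on covariates, as in the last lines, is the common starting point for all three objectives: a slope that differs from zero suggests heterogeneity, the covariates with the largest influence are candidate effect modifiers, and the fitted values serve as individualized effect estimates. The specific test and ranking procedures used in the paper are not reproduced here.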

Paper Structure

This paper contains 33 sections, 12 equations, 18 figures, 4 tables, 1 algorithm.

Figures (18)

  • Figure 1: Overview of WATCH workflow and the four main steps: (1) Analysis Planning, (2) Initial Data Analysis and Analysis Dataset Creation, (3) TEH Exploration, and (4) Multidisciplinary Assessment.
  • Figure 2: Comparison of the two methods for testing heterogeneity presented in Section \ref{sec:DR_learner:objective_1} with respect to Objective 1(i). Data are simulated under the condition of no treatment effect heterogeneity (i.e., $\beta_1 = 0$). We present the ECDF of p-values under the null hypothesis. For uniformly distributed p-values, the ECDF follows a diagonal line.
  • Figure 3: Comparison of the two methods for testing heterogeneity presented in Section \ref{sec:DR_learner:objective_1} with respect to Objective 1(ii). Data are simulated under various degrees of treatment effect heterogeneity (x-axis). We report the average p-value across 500 runs; when there is treatment effect heterogeneity (i.e., $\beta_1 > 0$), a lower p-value indicates a more powerful method.
  • Figure 4: Comparison of two methods for deriving effect modifiers (presented in Sec. \ref{sec:DR_learner:objective_2}) with respect to Objective 2(i). Data are simulated under the condition of no treatment effect heterogeneity (i.e., $\beta_1 = 0$). We report the average probability (across 500 runs) that each biomarker is selected as the most important predictive biomarker. Since there is no treatment effect heterogeneity, an unbiased method should select each biomarker with probability $1/30 \approx 0.03$ (dashed vertical line).
  • Figure 5: Comparison of two methods for deriving effect modifiers (presented in Sec. \ref{sec:DR_learner:objective_2}) with respect to Objective 2(ii). Data are simulated under various degrees of treatment effect heterogeneity (x-axis). We report the average probability (across 500 runs) that the top selected biomarker is truly predictive; the higher this probability, the better the performance.
  • ...and 13 more figures