Integrating RCTs, RWD, AI/ML and Statistics: Next-Generation Evidence Synthesis

Shu Yang, Margaret Gamalo, Haoda Fu

TL;DR

The paper argues that next-generation evidence synthesis should integrate randomized controlled trials (RCTs), real-world data (RWD), and AI/ML with traditional statistics rather than favor any one paradigm. It proposes a causal roadmap for combining these sources, covering transportability of RCT results to broader populations, AI-assisted analyses within RCTs, hybrid controlled trial designs, and linking short-term RCTs with long-term RWD to capture durability and safety. Key contributions include a four-step causal inference framework; the promotion of methods such as target trial emulation, doubly robust estimators, digital twins, and active learning; and practical guidance on regulatory considerations and uncertainty quantification. The work emphasizes trust, data provenance, privacy-preserving analytics, and rigorous uncertainty assessment as essential for regulatory credibility and patient-centered decision-making in modern regulatory science.

Abstract

Randomized controlled trials (RCTs) have been the cornerstone of clinical evidence; however, their cost, duration, and restrictive eligibility criteria limit power and external validity. Studies using real-world data (RWD), historically considered less reliable for establishing causality, are now recognized to be important for generating real-world evidence (RWE). In parallel, artificial intelligence and machine learning (AI/ML) are being increasingly used throughout the drug development process, providing scalability and flexibility but also presenting challenges in interpretability and rigor that traditional statistics do not face. This Perspective argues that the future of evidence generation will not depend on RCTs versus RWD, or statistics versus AI/ML, but on their principled integration. To this end, a causal roadmap is needed to clarify inferential goals, make assumptions explicit, and ensure transparency about tradeoffs. We highlight key objectives of integrative evidence synthesis, including transporting RCT results to broader populations, embedding AI-assisted analyses within RCTs, designing hybrid controlled trials, and extending short-term RCTs with long-term RWD. We also outline future directions in privacy-preserving analytics, uncertainty quantification, and small-sample methods. By uniting statistical rigor with AI/ML innovation, integrative approaches can produce robust, transparent, and policy-relevant evidence, making them a key component of modern regulatory science.

Paper Structure

This paper contains 25 sections, 2 figures.

Figures (2)

  • Figure 1: Spectrum of trial designs and emerging methodologies integrating RWD. This figure illustrates the continuum of clinical evidence generation approaches, ranging from traditional RCTs to pragmatic RCTs and observational studies, with progressively increasing reliance on RWD [concato2022real]. Below, emerging methodological innovations are categorized by their primary function and regulatory readiness: (i) Causal inference techniques that enhance generalizability and subgroup analyses; (ii) Generative AI approaches for data augmentation and scenario simulation; and (iii) Agentic AI tools for adaptive trial planning and modeling patient behavior. Together, these approaches represent a shift toward more adaptive, data-rich, and scalable evidence generation frameworks that bridge randomized and real-world settings.
  • Figure 2: A causal roadmap for integrative evidence synthesis. This figure outlines a structured framework for generating valid causal inferences by integrating RCTs with RWD. The process begins with the definition of the causal estimand (Step 1), followed by specification of the necessary causal assumptions and data (Step 2) to link the estimand to a well-defined statistical parameter. Estimation is then performed using appropriate statistical, artificial intelligence, or machine learning methods (Step 3), and the robustness of results is evaluated through sensitivity analysis (Step 4). The accompanying panels highlight key objectives of evidence synthesis, complementary characteristics of RCT and RWD sources, and the comparative roles of conventional statistical approaches and AI/ML techniques in enhancing causal analysis.
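
The four steps described for Figure 2 can be illustrated with a minimal Python sketch. Nothing in it is taken from the paper: the simulated data, the single binary confounder, the function names (`simulate`, `aipw_ate`, `sensitivity`), and the additive-bias sensitivity check are all assumptions made for illustration. For Step 3 it uses augmented inverse probability weighting (AIPW), one common doubly robust estimator, with nuisance models fitted as simple stratum means instead of the AI/ML learners the paper discusses.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Hypothetical observational data with one binary confounder x.

    Step 1 (estimand): the average treatment effect E[Y(1) - Y(0)],
    which is 2.0 by construction in this simulation.
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)
    # Treatment is more likely when x == 1, so x confounds the comparison.
    a = rng.binomial(1, np.where(x == 1, 0.7, 0.3))
    y = 1.0 + 2.0 * a + 3.0 * x + rng.normal(0.0, 1.0, n)
    return x, a, y


def aipw_ate(x, a, y):
    """Step 3: AIPW estimate of the ATE and its standard error.

    Step 2 (assumptions, stated not tested): no unmeasured confounding
    given x, and every stratum of x contains treated and control units.
    """
    x, a = np.asarray(x), np.asarray(a)
    y = np.asarray(y, dtype=float)
    e = np.empty(len(x))   # propensity score P(A = 1 | X)
    m1 = np.empty(len(x))  # outcome regression E[Y | A = 1, X]
    m0 = np.empty(len(x))  # outcome regression E[Y | A = 0, X]
    for level in (0, 1):
        s = x == level
        e[s] = a[s].mean()
        m1[s] = y[s & (a == 1)].mean()
        m0[s] = y[s & (a == 0)].mean()
    # Per-unit AIPW score: regression contrast plus weighted residuals.
    phi = m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e)
    return phi.mean(), phi.std(ddof=1) / np.sqrt(len(phi))


def sensitivity(est, se, deltas):
    """Step 4: shift the estimate by a hypothesised confounding bias delta
    and flag whether the 95% interval lower bound stays above zero."""
    return [(d, est - d, (est - d) - 1.96 * se > 0) for d in deltas]


x, a, y = simulate()
naive = y[a == 1].mean() - y[a == 0].mean()  # ignores confounding by x
est, se = aipw_ate(x, a, y)
print(f"naive difference in means: {naive:.2f}")
print(f"AIPW estimate: {est:.2f} (SE {se:.3f})")
for delta, shifted, still_positive in sensitivity(est, se, [0.0, 0.5, 1.0, 2.5]):
    print(f"assumed bias {delta:.1f}: shifted estimate {shifted:.2f}, "
          f"lower bound above zero: {still_positive}")
```

In this simulation the naive comparison is biased upward because the confounder raises both the treatment probability and the outcome, while the AIPW estimate adjusts for it. The sensitivity step is deliberately crude: it only asks how large an unmeasured bias would have to be before the conclusion changes, which is the question a formal sensitivity analysis answers more carefully.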