Integrating RCTs, RWD, AI/ML and Statistics: Next-Generation Evidence Synthesis
Shu Yang, Margaret Gamalo, Haoda Fu
TL;DR
The paper argues that next-generation evidence synthesis should integrate randomized controlled trials (RCTs), real-world data (RWD), and AI/ML with traditional statistics rather than favoring any one paradigm. It proposes a causal roadmap for combining these sources, covering transportability of RCT results to broader populations, AI-assisted analyses within RCTs, hybrid controlled trial designs, and the linking of short-term RCTs with long-term RWD to capture durability and safety. Key contributions include a four-step causal inference framework; advocacy for methods such as target trial emulation, doubly robust estimators, digital twins, and active learning; and practical guidance on regulatory considerations and uncertainty quantification. The work emphasizes trust, data provenance, privacy-preserving analytics, and rigorous uncertainty assessment as essential for regulatory credibility and patient-centered decision-making.
Abstract
Randomized controlled trials (RCTs) have been the cornerstone of clinical evidence; however, their cost, duration, and restrictive eligibility criteria limit statistical power and external validity. Studies using real-world data (RWD), historically considered less reliable for establishing causality, are now recognized as important for generating real-world evidence (RWE). In parallel, artificial intelligence and machine learning (AI/ML) are increasingly used throughout the drug development process, providing scalability and flexibility but also presenting challenges in interpretability and rigor that traditional statistics do not face. This Perspective argues that the future of evidence generation will not depend on RCTs versus RWD, or statistics versus AI/ML, but on their principled integration. To this end, a causal roadmap is needed to clarify inferential goals, make assumptions explicit, and ensure transparency about tradeoffs. We highlight key objectives of integrative evidence synthesis, including transporting RCT results to broader populations, embedding AI-assisted analyses within RCTs, designing hybrid controlled trials, and extending short-term RCTs with long-term RWD. We also outline future directions in privacy-preserving analytics, uncertainty quantification, and small-sample methods. By uniting statistical rigor with AI/ML innovation, integrative approaches can produce robust, transparent, and policy-relevant evidence, making them a key component of modern regulatory science.
