How many patients could we save with LLM priors?
Shota Arai, David Selby, Andrew Vargo, Sebastian Vollmer
TL;DR
The paper tackles adverse-event (AE) modeling in multi-center clinical trials by embedding expert clinical knowledge into hierarchical Bayesian priors via large language models (LLMs). It formulates a Poisson-Gamma AE model with hyperpriors on the distribution of site-level rates and assigns LLM-derived priors to the hyperparameters. These priors are elicited with blind and disease-informed prompts from Llama 3.3 and MedGemma across several temperature settings, then evaluated on HRPC control-arm data with rigorous cross-validation and sample-efficiency analyses. Results show that LLM-informed priors consistently outperform traditional meta-analytical priors in predictive accuracy and can substantially reduce required sample sizes, enabling faster, safer, and more cost-efficient trial design. The work highlights practical implications for regulatory safety monitoring and provides a scalable methodology for expert-informed Bayesian priors in complex, multi-site clinical data.
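The Poisson-Gamma structure can be made concrete with a short sketch. It assumes one common parameterization, which the summary above does not spell out: site i has AE count y_i ~ Poisson(lambda_i * e_i) for exposure e_i, site rates lambda_i ~ Gamma(alpha, rate=beta), and alpha and beta receive Gamma hyperpriors whose shape/rate pairs are the quantities an LLM would be asked to supply. All function names and the `hyperprior` tuple layout are hypothetical illustrations, not the authors' code.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log-density of Gamma(shape, rate) at x > 0."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(x) - rate * x


def nb_log_marginal(y, exposure, alpha, beta):
    """log p(y | alpha, beta) for y ~ Poisson(lam * exposure), lam ~ Gamma(alpha, rate=beta).

    Integrating the site rate lam out analytically yields a negative binomial pmf.
    """
    p = beta / (beta + exposure)
    return (math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(p) + y * math.log1p(-p))


def site_rate_posterior(y, exposure, alpha, beta):
    """Conjugate update: lam | y ~ Gamma(alpha + y, rate=beta + exposure)."""
    return alpha + y, beta + exposure


def log_joint(alpha, beta, counts, exposures, hyperprior):
    """Unnormalised log posterior of (alpha, beta) given per-site AE counts.

    hyperprior = ((shape_a, rate_a), (shape_b, rate_b)): assumed Gamma hyperpriors
    on alpha and beta. In an LLM-informed setup these four numbers are what
    would be elicited; here they are hypothetical inputs.
    """
    (sa, ra), (sb, rb) = hyperprior
    log_prior = gamma_logpdf(alpha, sa, ra) + gamma_logpdf(beta, sb, rb)
    log_lik = sum(nb_log_marginal(y, e, alpha, beta) for y, e in zip(counts, exposures))
    return log_prior + log_lik


# Sanity check: the marginal pmf should sum to (almost exactly) 1 and have
# mean exposure * alpha / beta (here 3 * 2 / 1 = 6).
pmf = [math.exp(nb_log_marginal(y, 3.0, 2.0, 1.0)) for y in range(200)]
print(sum(pmf), sum(y * p for y, p in enumerate(pmf)))
```

Marginalizing the site rates keeps the hyperparameter posterior low-dimensional, which is one reason conjugate Poisson-Gamma models are convenient for multi-site safety data.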
Abstract
Imagine a world where clinical trials need far fewer patients to achieve the same statistical power, thanks to the knowledge encoded in large language models (LLMs). We present a novel framework for hierarchical Bayesian modeling of adverse events in multi-center clinical trials, leveraging LLM-informed prior distributions. Unlike data augmentation approaches that generate synthetic data points, our methodology obtains parametric priors directly from the LLM. Our approach systematically elicits informative priors for hyperparameters in hierarchical Bayesian models using a pre-trained LLM, enabling the incorporation of external clinical expertise directly into Bayesian safety modeling. Through comprehensive temperature sensitivity analysis and rigorous cross-validation on real-world clinical trial data, we demonstrate that LLM-derived priors consistently improve predictive performance compared to traditional meta-analytical approaches. This methodology paves the way for more efficient, expert-informed clinical trial design, enabling substantial reductions in the number of patients required for robust safety assessment, with the potential to transform drug safety monitoring and regulatory decision making.
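One way to compare two candidate hyperpriors (for instance an LLM-elicited one and a meta-analytical one) by cross-validated predictive performance is a leave-one-site-out expected log predictive density (ELPD). The sketch below assumes the same Poisson-Gamma parameterization with Gamma hyperpriors, and estimates p(y_k | y_-k) by self-normalised importance sampling with draws from the hyperprior. The estimator choice, the function names, and every number in the usage section (counts, exposures, hyperprior values) are hypothetical illustrations; they are not the paper's data, priors, or evaluation code, and the output says nothing about the paper's results.

```python
import math
import random


def nb_log_marginal(y, exposure, alpha, beta):
    """Negative binomial log-pmf from integrating lam ~ Gamma(alpha, rate=beta)
    out of y ~ Poisson(lam * exposure)."""
    p = beta / (beta + exposure)
    return (math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(p) + y * math.log1p(-p))


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def loo_elpd(counts, exposures, hyperprior, n_draws=4000, seed=0):
    """Leave-one-site-out ELPD under assumed Gamma hyperpriors.

    hyperprior = ((shape_alpha, rate_alpha), (shape_beta, rate_beta)).
    For each held-out site k, draws from the hyperprior are weighted by the
    likelihood of the remaining sites, approximating p(alpha, beta | y_-k).
    """
    rng = random.Random(seed)
    (sa, ra), (sb, rb) = hyperprior
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    draws = [(rng.gammavariate(sa, 1.0 / ra), rng.gammavariate(sb, 1.0 / rb))
             for _ in range(n_draws)]
    # Per-draw, per-site marginal log-likelihood table.
    ll = [[nb_log_marginal(y, e, a, b) for y, e in zip(counts, exposures)]
          for a, b in draws]
    elpd = 0.0
    for k in range(len(counts)):
        log_w = [sum(row) - row[k] for row in ll]  # weights from all sites except k
        numerator = logsumexp([lw + row[k] for lw, row in zip(log_w, ll)])
        elpd += numerator - logsumexp(log_w)
    return elpd


# Hypothetical toy inputs, for illustration only.
counts = [3, 5, 2, 7, 4, 1]            # AE counts per site
exposures = [20, 25, 15, 30, 22, 10]   # patients per site
prior_a = ((4.0, 2.0), (20.0, 2.0))    # a concentrated hyperprior
prior_b = ((1.0, 0.1), (1.0, 0.1))     # a diffuse hyperprior
print(loo_elpd(counts, exposures, prior_a), loo_elpd(counts, exposures, prior_b))
```

Higher (less negative) ELPD indicates better out-of-site prediction. Plain importance sampling from the prior is simple but can be inefficient when the prior and posterior disagree; MCMC or a finer quadrature would be more robust choices for real data.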
