PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models
Shashi Kant Gupta, Aditya Basu, Mauro Nievas, Jerrin Thomas, Nathan Wolfrath, Adhitya Ramamurthi, Bradley Taylor, Anai N. Kothari, Regina Schwind, Therica M. Miller, Sorena Nadaf-Rahrov, Yanshan Wang, Hrituraj Singh
TL;DR
PRISM tackles the real-world clinical trial matching problem with an end-to-end pipeline that interprets unstructured EHR notes and trial criteria to rank eligible trials. It introduces a compositional QA framework with a scoring function $S=f\big(C(a_1,a_2,\ldots,a_j)\big)$ and, for each criterion, a probabilistic decision rule $\text{Criteria Met}=\begin{cases} \text{Yes}, & P(\text{criteria met}\mid\text{data})>0.66 \\ \text{No}, & P(\text{criteria met}\mid\text{data})<0.34 \\ \text{N/A}, & \text{otherwise} \end{cases}$, which allows robust handling of incomplete information. In extensive real-world evaluation, the OncoLLM 14B model achieves competitive criterion-level accuracy (63% overall, 66% after excluding N/As) and superior ranking performance (top-3 hits 65.3% and NDCG 0.68) compared to GPT-3.5-Turbo, while costing far less than GPT-4 (roughly USD 170 versus USD 6,055). The work demonstrates both patient-centric and trial-centric search, supports privacy-preserving deployment on private infrastructure, and discusses practical considerations and future enhancements, such as integrating structured data and improving retrievers, that could further increase reliability and deployment readiness.
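The per-criterion decision rule above can be illustrated with a minimal sketch. Only the thresholds 0.66 and 0.34 come from the text; the function name `criterion_decision`, the constant names, and the input validation are assumptions made for illustration, and how the probability itself is obtained from the model is not shown here.

```python
from typing import Literal

Decision = Literal["Yes", "No", "N/A"]

# Thresholds taken from the decision rule stated in the TL;DR.
YES_THRESHOLD = 0.66
NO_THRESHOLD = 0.34


def criterion_decision(p_met: float) -> Decision:
    """Map P(criteria met | data) to Yes / No / N/A.

    Hypothetical helper: the name and the range check are illustrative
    assumptions, not part of the original pipeline.
    """
    if not 0.0 <= p_met <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {p_met}")
    if p_met > YES_THRESHOLD:
        return "Yes"
    if p_met < NO_THRESHOLD:
        return "No"
    # Anything in [0.34, 0.66], boundaries included, is treated as
    # insufficient evidence rather than forced into Yes or No.
    return "N/A"


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(p, criterion_decision(p))
```

Because both inequalities are strict, a probability of exactly 0.66 or 0.34 falls into the N/A band, which matches the "otherwise" branch of the rule.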
Abstract
Clinical trial matching is the task of identifying trials for which patients may be eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. The process is manual, time-consuming, and difficult to scale, so many patients miss out on potential therapeutic options. Recent advances in Large Language Models (LLMs) have made automating patient-trial matching possible, as shown in multiple concurrent research studies. However, current approaches are confined to constrained, often synthetic datasets that do not adequately mirror the complexities of real-world medical data. In this study, we present the first end-to-end, large-scale empirical evaluation of clinical trial matching using real-world EHRs. Our study showcases the capability of LLMs to accurately match patients with appropriate clinical trials. We perform experiments with proprietary LLMs, including GPT-4 and GPT-3.5, as well as our custom fine-tuned model, OncoLLM, and show that OncoLLM, despite its significantly smaller size, not only outperforms GPT-3.5 but also matches the performance of qualified medical doctors. All experiments were carried out on real-world EHRs that include clinical notes, together with the clinical trials available at a single cancer center in the United States.
