TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models
Ling Yue, Sixue Xing, Jintai Chen, Tianfan Fu
TL;DR
The paper introduces TrialEnroll, a Deep & Cross Network augmented with Large Language Model–generated features to predict clinical trial enrollment success, a previously underexplored AI task. By combining multimodal trial data (drug, disease, eligibility criteria, demographics, geography) with LLM-enhanced representations and a hierarchical attention-based deep component, the approach achieves a PR-AUC of 0.7002, outperforming traditional baselines. The work provides a curated dataset of 31,094 trials and demonstrates interpretability through criterion-level contributions, identifying the inclusion-criteria count and maximum age as key predictors. Practically, this advances proactive trial design and resource planning, enabling more efficient and reliable enrollment strategies, while outlining limitations and avenues for expanding data and features in future work.
Abstract
Clinical trials need to recruit a sufficient number of volunteer patients to achieve the statistical power required to demonstrate that a treatment (e.g., a new drug) is effective against a given disease. Recruitment therefore has a significant impact on trial success, and forecasting whether recruitment will succeed before the trial begins would save substantial time and resources. This paper develops a novel deep & cross network with large language model (LLM)-augmented text features that learns semantic information from trial eligibility criteria and predicts enrollment success. The proposed method supports interpretability by identifying which sentences and words in the eligibility criteria contribute most heavily to the prediction. We also demonstrate the empirical superiority of the proposed method (0.7002 PR-AUC) over a range of well-established machine learning methods. The code and curated dataset are publicly available at https://anonymous.4open.science/r/TrialEnroll-7E12.
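To make the "cross" part of a deep & cross network concrete, the following is a minimal NumPy sketch of a generic cross layer. It assumes the standard vector-weight formulation x_{l+1} = x_0 (x_l · w_l) + b_l + x_l, which may differ from the exact variant used in TrialEnroll. The function names, feature dimension, and random inputs are hypothetical and chosen only for illustration.

```python
import numpy as np


def cross_layer(x0, xl, w, b):
    """One generic cross layer: x_{l+1} = x0 * (xl . w) + b + xl.

    x0, xl: arrays of shape (batch, d); w, b: arrays of shape (d,).
    This is an assumed formulation, not necessarily the paper's exact layer.
    """
    # Project the current layer onto w: one scalar per example.
    s = xl @ w
    # Scale the original input by that scalar, then add the bias and a residual term.
    return x0 * s[:, None] + b + xl


def cross_network(x0, weights, biases):
    """Stack cross layers; every layer interacts with the original input x0."""
    xl = x0
    for w, b in zip(weights, biases):
        xl = cross_layer(x0, xl, w, b)
    return xl


# Hypothetical usage: 4 trials, each with a 6-dimensional feature vector.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 6))
weights = [rng.normal(size=6) for _ in range(2)]
biases = [np.zeros(6) for _ in range(2)]
out = cross_network(x0, weights, biases)
```

Each stacked layer raises the degree of the explicit feature interactions by one while adding only 2d parameters, which is why this kind of component is typically paired with a separate deep component for the text-derived features.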
