TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models

Ling Yue, Sixue Xing, Jintai Chen, Tianfan Fu

TL;DR

The paper introduces TrialEnroll, a Deep & Cross Network augmented with Large Language Model–generated features to predict clinical trial enrollment success, a previously underexplored AI task. By combining multimodal trial data (drug, disease, eligibility criteria, demographics, geography) with LLM-enhanced representations and a hierarchical attention-based deep component, the approach achieves a PR-AUC of 0.7002, outperforming traditional baselines. The work provides a curated dataset of 31,094 trials and demonstrates interpretability through criterion-level contributions, identifying the inclusion-criteria count and maximum age as key predictors. Practically, this advances proactive trial design and resource planning, enabling more efficient and reliable enrollment strategies, while outlining limitations and avenues for expanding data and features in future work.

Abstract

Clinical trials need to recruit a sufficient number of volunteer patients to demonstrate, with adequate statistical power, the efficacy of a treatment (e.g., a new drug) for a given disease. Recruitment therefore has a significant impact on trial success, and forecasting whether it will succeed before the trial is run would save considerable time and resources. This paper develops a novel deep & cross network with large language model (LLM)-augmented text features that learns semantic information from trial eligibility criteria and predicts enrollment success. The proposed method offers interpretability by identifying which sentences or words in the eligibility criteria contribute most to the prediction. We also demonstrate the empirical superiority of the proposed method (0.7002 PR-AUC) over several well-established machine learning methods. The code and curated dataset are publicly available at https://anonymous.4open.science/r/TrialEnroll-7E12.
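
As a rough illustration of the "cross" half of a deep & cross network, the sketch below implements the vector-weight cross layer from the original DCN formulation, x_{l+1} = x_0 (x_l · w_l) + b_l + x_l, in plain Python. This is an assumption about the general architecture, not a reproduction of the authors' code: the text above does not say which DCN variant TrialEnroll uses, and the function names, weights, and inputs here are hypothetical.

```python
from typing import List


def cross_layer(x0: List[float], xl: List[float],
                w: List[float], b: List[float]) -> List[float]:
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl (hypothetical helper)."""
    # Project the current layer output onto the weight vector (a scalar).
    s = sum(xi * wi for xi, wi in zip(xl, w))
    # Scale the original input by that scalar, then add the bias and the
    # residual connection back to the current layer output.
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0: List[float], weights: List[List[float]],
                  biases: List[List[float]]) -> List[float]:
    """Stack cross layers; every layer re-uses the original input x0."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


# Toy two-feature input with made-up weights (illustration only).
x0 = [1.0, 2.0]
out = cross_network(
    x0,
    weights=[[0.5, 0.5], [1.0, 0.0]],
    biases=[[0.0, 0.0], [0.0, 0.0]],
)
print(out)  # [5.0, 10.0]
```

Each stacked layer raises the degree of the explicit feature interactions by one, which is why the cross part is usually paired with an ordinary deep network that learns implicit interactions.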

Paper Structure (43 sections, 9 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: Overview of TrialEnroll. Our model takes multimodal clinical trial features (e.g., drug, disease, eligibility criteria, geographical location of the trial, age, and target gender) as input (detailed in the feature engineering section), augments them with a large language model (LLM feature enhancement section), uses a deep & cross network (DCN) as the neural architecture (DCN feature learning section), and predicts whether trial enrollment will succeed.
  • Figure 2: The Deep & Cross Network.
  • Figure 3: Distribution of geographical locations of clinical trial records (country-level).
  • Figure 4: Feature ablation study. Permutation importance of each feature, measured by PR-AUC.
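
The permutation importance named in the Figure 4 caption can be sketched as follows: score the model once, shuffle one feature column at a time, and record how much PR-AUC drops. This is a minimal plain-Python sketch under stated assumptions, not the authors' evaluation code. PR-AUC is estimated here as average precision, ties in scores are not specially handled, and the `predict` callable, the toy data, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate PR-AUC as average precision over the ranked predictions."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def permutation_importance(predict: Callable[[List[List[float]]], List[float]],
                           X: List[List[float]], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in PR-AUC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = average_precision(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - average_precision(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy "model" that only looks at feature 0 (illustration only).
X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9]]
y = [1, 1, 0, 0]
imp = permutation_importance(lambda rows: [r[0] for r in rows], X, y)
print(imp)  # feature 1 is ignored by the toy model, so its importance is 0.0
```

A feature whose shuffling leaves PR-AUC unchanged contributes nothing to the model's ranking, while a large drop marks a feature the model relies on.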

Theorems & Definitions (5)

  • Definition 1: Drug Set
  • Definition 2: Target Disease Set
  • Definition 3: Trial Eligibility Criteria
  • Definition 4: Clinical Trial Categorical/Numerical Feature
  • Definition 5: Clinical Trial Enrollment Success