Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates

Tien Huu Do, Antoine Masquelier, Nae Eoun Lee, Jonathan Crowther

TL;DR

The paper tackles planning-phase clinical trial enrollment prediction with uncertainty estimates. It introduces a unified deep learning framework that encodes unstructured trial text using a Longformer-based text processor and fuses it with structured features through multi-head attention, augmented by a Gamma-distribution output for uncertainty. It further extends to a Deep Poisson-Gamma model to forecast trial duration under site-level randomness. Empirically, the deterministic and stochastic variants outperform strong baselines on a large multi-source dataset, with reliable interval estimates and substantially faster inference for duration prediction, offering scalable, uncertainty-aware forecasts to improve trial planning and resource allocation.

Abstract

Clinical trials are a systematic endeavor to assess the safety and efficacy of new drugs or treatments. Conducting such trials typically demands significant financial investment and meticulous planning, highlighting the need for accurate predictions of trial outcomes. Accurately predicting patient enrollment, a key factor in trial success, is one of the primary challenges during the planning phase. In this work, we propose a novel deep learning-based method to address this critical challenge. Our method, implemented as a neural network model, leverages pre-trained language models (PLMs) to capture the complexities and nuances of clinical documents, transforming them into expressive representations. These representations are then combined with encoded tabular features via an attention mechanism. To account for uncertainties in enrollment prediction, we enhance the model with a probabilistic layer based on the Gamma distribution, which enables range estimation. We apply the proposed model to predict clinical trial duration, assuming site-level enrollment follows a Poisson-Gamma process. We carry out extensive experiments on real-world clinical trial data, and show that the proposed method can effectively predict the number of patients enrolled at a number of sites for a given clinical trial, outperforming established baseline models.
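The Poisson-Gamma assumption mentioned in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' Deep Poisson-Gamma model: the function names, the fixed `alpha` and `beta` values, and the simplification that every site opens at time zero with a constant rate are all assumptions made for illustration. Each site's enrollment rate is drawn from a Gamma distribution, enrollment at each site is treated as a Poisson process, and the time to reach a target enrollment is sampled from the pooled process.

```python
import random


def simulate_duration(target, n_sites, alpha, beta, rng):
    """Draw one sample of the time needed to enroll `target` patients.

    Assumptions (illustrative, not taken from the paper): all sites start
    at time zero and each site's rate stays constant over the trial.
    """
    # Site-level rates (patients per unit time) ~ Gamma(shape=alpha, rate=beta).
    # random.gammavariate expects a scale parameter, hence 1 / beta.
    rates = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_sites)]
    total_rate = sum(rates)
    # Independent Poisson processes pool into one Poisson process whose rate
    # is the sum of the site rates. The waiting time until the target-th
    # arrival of that process is Gamma(shape=target, scale=1 / total_rate).
    return rng.gammavariate(target, 1.0 / total_rate)


def duration_summary(target, n_sites, alpha, beta, n_draws=2000, seed=0):
    """Return the mean and an empirical 90% interval of simulated durations."""
    rng = random.Random(seed)
    draws = sorted(
        simulate_duration(target, n_sites, alpha, beta, rng)
        for _ in range(n_draws)
    )
    mean = sum(draws) / n_draws
    lower = draws[int(0.05 * n_draws)]
    upper = draws[int(0.95 * n_draws) - 1]
    return mean, lower, upper


if __name__ == "__main__":
    # Hypothetical trial: 100 patients, 20 sites, mean site rate alpha / beta = 2.
    mean, lower, upper = duration_summary(100, 20, alpha=2.0, beta=1.0)
    print(f"mean duration {mean:.2f}, 90% interval [{lower:.2f}, {upper:.2f}]")
```

Sampling the rates rather than fixing them at their mean is what lets the simulated duration reflect site-level randomness: trials with the same number of sites can still finish at noticeably different times.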

Paper Structure

This paper contains 18 sections, 18 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Deterministic model architecture for enrollment prediction. The input attributes form two sets, Key and Context; the Context attributes are encoded into text embeddings using Clinical Longformer. $\mathcal{F}_{emb}$, $\mathcal{F}_{cat}$, and $\mathcal{F}_{num}$ are fully connected layers.
  • Figure 2: Example outputs of the stochastic model for two different studies. The dashed line marks the mean of the output distribution and the orange line marks the true total number of patients enrolled. The horizontal axis shows the number of patients enrolled and the vertical axis shows the probability density.
  • Figure 3: Stochastic model architecture for patient enrollment prediction. This model shares its architecture with the deterministic model except for the last layer, which predicts the two parameters of the Gamma distribution.
  • Figure 4: Relationships between interval width, confidence level, and interval accuracy of the stochastic model described in Section \ref{sec:method:stochastic}. Interval accuracy is the percentage of predicted intervals that contain the actual number of patients enrolled.
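
The interval accuracy defined in the Figure 4 caption can be computed as in the minimal sketch below. It assumes the stochastic model outputs a Gamma shape and rate for each trial; the shape/rate parameterization, the sampling-based quantiles, and all function names are assumptions for illustration, not details confirmed by the source.

```python
import random


def central_interval(alpha, beta, level, n_samples=5000, seed=0):
    """Approximate a central interval of Gamma(shape=alpha, rate=beta).

    Quantiles are estimated from samples so that only the standard
    library is needed; a closed-form quantile function would also work.
    """
    rng = random.Random(seed)
    # random.gammavariate expects a scale parameter, hence 1 / beta.
    samples = sorted(rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_samples))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * n_samples)]
    upper = samples[min(int((1.0 - tail) * n_samples), n_samples - 1)]
    return lower, upper


def interval_accuracy(intervals, actuals):
    """Percentage of intervals that contain the corresponding actual value."""
    hits = sum(1 for (lower, upper), y in zip(intervals, actuals) if lower <= y <= upper)
    return 100.0 * hits / len(actuals)


if __name__ == "__main__":
    # Hypothetical predicted (shape, rate) pairs and observed enrollments.
    predictions = [(4.0, 0.04), (9.0, 0.03), (2.5, 0.05)]
    observed = [120, 410, 35]
    intervals = [central_interval(a, b, level=0.9) for a, b in predictions]
    print(interval_accuracy(intervals, observed))
```

Raising `level` widens every interval, so interval accuracy can only stay the same or increase; that trade-off between width and coverage is the relationship the figure plots.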