Panacea: A foundation model for clinical trial search, summarization, design, and recruitment

Jiacheng Lin, Hanwen Xu, Zifeng Wang, Sheng Wang, Jimeng Sun

TL;DR

The approach demonstrates the effectiveness of Panacea in clinical trials and establishes a comprehensive resource, including training data, model, and benchmark, for developing clinical trial foundation models, paving the path for AI-based clinical trial development.

Abstract

Clinical trials are fundamental in developing new drugs, medical devices, and treatments. However, they are often time-consuming and have low success rates. Although there have been initial attempts to create large language models (LLMs) for clinical trial design and patient-trial matching, these models remain task-specific and are not adaptable to diverse clinical trial tasks. To address this challenge, we propose a clinical trial foundation model named Panacea, designed to handle multiple tasks, including trial search, trial summarization, trial design, and patient-trial matching. We also assemble a large-scale dataset, named TrialAlign, of 793,279 trial documents and 1,113,207 trial-related scientific papers, to infuse clinical knowledge into the model through pre-training. We further curate TrialInstruct, which contains 200,866 instruction data points for fine-tuning. These resources enable Panacea to be widely applicable to a range of clinical trial tasks based on user requirements. We evaluated Panacea on a new benchmark, named TrialPanorama, which covers eight clinical trial tasks. Our method performed the best on seven of the eight tasks compared to six cutting-edge generic or medicine-specific LLMs. Specifically, Panacea showed great potential to collaborate with human experts in crafting the design of eligibility criteria, study arms, and outcome measures in multi-round conversations. In addition, Panacea achieved a 14.42% improvement in patient-trial matching and a 41.78% to 52.02% improvement in trial search, and consistently ranked at the top for five aspects of trial summarization. Our approach demonstrates the effectiveness of Panacea in clinical trials and establishes a comprehensive resource, including training data, model, and benchmark, for developing clinical trial foundation models, paving the path for AI-based clinical trial development.

Paper Structure (18 sections, 1 equation, 17 figures, 2 tables)

This paper contains 18 sections, 1 equation, 17 figures, 2 tables.

Figures (17)

  • Figure 1: Overview of Panacea. a, Number of de-identified trial documents in each ICD-10 category. The 100 conditions with the largest number of trial documents are shown. b, Bar plot showing the most frequent diseases in clinical trial publications according to the MeSH terms. c, Bar plot showing the number of instruction data points per clinical trial task in TrialInstruct. d, An example of an instruction data point in TrialInstruct. e, Panacea first uses TrialAlign to fine-tune Mistral, then uses TrialInstruct for instruction tuning. We create the TrialPanorama benchmark to evaluate Panacea and other LLMs on trial tasks.
  • Figure 1: Prompt for evaluation metrics on single-trial summarization.
  • Figure 2: Evaluation on trial search. a, Query generation aims to convert free text user input into a structured query that contains five categories: disease, intervention, phase, status, and study type. b, Query expansion aims to expand a set of keywords. Candidate keywords are not provided. c, Comparison of query generation in five specific categories in terms of Jaccard index. d, Comparison of query generation in terms of Jaccard index. e, Comparison of query expansion in terms of Jaccard index.
  • Figure 2: Prompt for evaluation metrics on multi-trial summarization.
  • Figure 3: Evaluating Panacea on trial summarization. a,b, Trial summarization aims to provide a concise summary, including trial goal and conclusion, for a single trial (a) or multiple trials (b). c,d, Evaluation on single-trial summarization (c) and multiple-trial summarization (d) using a Claude-based metric and trial search-based metrics. e, Comparison on trial search in terms of ROUGE. f, A case study illustrating how Panacea successfully summarizes multiple studies.
  • ...and 12 more figures
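
The Figure 2 caption reports query generation and query expansion in terms of the Jaccard index. As a rough illustration of that metric only, the sketch below compares a predicted keyword set with a reference set. The function name `jaccard_index`, the lowercasing and whitespace normalization, and the convention of returning 1.0 when both sets are empty are assumptions made for this example; the paper's exact evaluation procedure is not shown on this page and may differ.

```python
def jaccard_index(predicted, reference):
    """Jaccard index |A ∩ B| / |A ∪ B| between two collections of keywords.

    Assumption: terms are compared after stripping whitespace and lowercasing.
    Assumption: two empty sets count as a perfect match (1.0).
    """
    pred = {term.strip().lower() for term in predicted if term.strip()}
    ref = {term.strip().lower() for term in reference if term.strip()}
    if not pred and not ref:
        return 1.0
    return len(pred & ref) / len(pred | ref)


# Hypothetical example: two of three distinct terms are shared.
score = jaccard_index(
    ["breast cancer", "Phase 2"],
    ["breast cancer", "phase 2", "recruiting"],
)
print(round(score, 3))
```

A score of 1 means the predicted and reference sets are identical after normalization, and 0 means they share no terms.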