Using Large Language Models to Generate Clinical Trial Tables and Figures

Yumeng Yang, Peter Krusche, Kristyn Pantoja, Cheng Shi, Ethan Ludmir, Kirk Roberts, Gen Zhu

TL;DR

The paper explores using large language models (LLMs) to automate the generation of clinical trial tables, figures, and listings (TFLs) through prompt engineering and few-shot transfer learning. Using public clinical trial data in ADaM format, it shows that LLMs can generate TFLs from prompt instructions. It also presents the Clinical Trial TFL Generation Agent, a conversational app that matches user queries to predefined prompts, which in turn produce customized programs that generate specific predefined TFLs.

Abstract

Tables, figures, and listings (TFLs) are essential tools for summarizing clinical trial data. Creating TFLs for reporting is a time-consuming task that recurs throughout the execution of clinical trials. This study explored the use of large language models (LLMs) to automate the generation of TFLs through prompt engineering and few-shot transfer learning. Using public clinical trial data in ADaM format, our results demonstrated that LLMs can efficiently generate TFLs from prompt instructions, showcasing their potential in this domain. Furthermore, we developed a conversational agent named Clinical Trial TFL Generation Agent: an app that matches user queries to predefined prompts that produce customized programs to generate specific predefined TFLs.
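
The query-to-prompt matching described in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions: the prompt library, the prompt wording, the ADaM dataset names used in the prompts (ADSL, ADAE, ADTTE), and the token-overlap scoring are all hypothetical and are not taken from the paper, which does not specify how the agent performs the match.

```python
import re

# Hypothetical prompt library: TFL name -> predefined prompt text.
# The entries and dataset names are assumptions made for illustration.
PROMPTS = {
    "demographics table": (
        "Write a program that summarizes age, sex, and race "
        "by treatment arm from the ADSL dataset."
    ),
    "adverse events table": (
        "Write a program that counts adverse events "
        "by treatment arm from the ADAE dataset."
    ),
    "survival figure": (
        "Write a program that plots Kaplan-Meier curves "
        "by treatment arm from the ADTTE dataset."
    ),
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_prompt(query):
    """Return (name, prompt) for the TFL whose name best overlaps the query.

    Scoring is Jaccard similarity between the query tokens and the tokens
    of each TFL name. Returns (None, None) when nothing overlaps.
    """
    query_tokens = tokenize(query)
    best_name, best_score = None, 0.0
    for name in PROMPTS:
        name_tokens = tokenize(name)
        union = query_tokens | name_tokens
        score = len(query_tokens & name_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None, None
    return best_name, PROMPTS[best_name]


if __name__ == "__main__":
    name, prompt = match_prompt("Show me a demographics table")
    print(name)
    print(prompt)
```

In a real agent the selected prompt would then be sent to an LLM, whose returned program generates the TFL; that step is omitted here because it depends on the model and API in use.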

Paper Structure (10 sections, 1 figure, 4 tables)

Figures (1)

  • Figure 1: Table caption