Clinical trial cohort selection using Large Language Models on n2c2 Challenges

Chi-en Amy Tai; Xavier Tannier

TL;DR

This study addresses the automation of clinical trial cohort selection from unstructured clinical notes using large language models (LLMs). It introduces a two-stage workflow: the first stage selects the most promising LLM via iterated few-shot prompting on the n2c2-2018 dataset, and the second fine-tunes the chosen model on the n2c2-2006 and n2c2-2008 datasets. The results show that LLMs can match or exceed baselines on straightforward eligibility criteria but struggle with criteria that require fine-grained reasoning and with imbalanced data; vicuna-13b and mistral-7b-instruct stand out for the stability of their performance across the n2c2-2018 criteria. The work highlights practical considerations for deploying LLM-assisted recruitment and outlines directions for domain-specific training and for generalization to a broader range of datasets.
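The Stage 1 model selection can be pictured as a small evaluation loop. The sketch below assumes each candidate LLM is wrapped as a plain Python function from prompt text to completion text (a hypothetical `LLM` type), uses a single round of few-shot prompting for one criterion, and scores models with F1 on the "met" class. The paper's iterated prompting procedure, its prompt wording, and its exact evaluation metric are not reproduced here; the helper names (`build_few_shot_prompt`, `parse_answer`, `select_best_model`) and the toy models are illustrative only.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: each candidate LLM is wrapped as a function that takes a
# prompt string and returns the raw completion text (no real model API is assumed).
LLM = Callable[[str], str]


def build_few_shot_prompt(criterion: str, examples: Sequence[Tuple[str, bool]], note: str) -> str:
    """Assemble a few-shot prompt asking whether `note` satisfies `criterion`."""
    lines = [f"Criterion: {criterion}", "Answer 'met' or 'not met' for each patient note.", ""]
    for text, label in examples:  # labelled demonstrations
        lines += [f"Note: {text}", f"Answer: {'met' if label else 'not met'}", ""]
    lines += [f"Note: {note}", "Answer:"]  # the note to classify comes last
    return "\n".join(lines)


def parse_answer(completion: str) -> bool:
    """Map a free-text completion to a binary label; anything not starting with 'met' is 'not met'."""
    return completion.strip().lower().startswith("met")


def f1_score(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 on the positive ('met') class."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best_model(
    models: Dict[str, LLM],
    criterion: str,
    examples: Sequence[Tuple[str, bool]],
    test_set: Sequence[Tuple[str, bool]],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the same few-shot task; return the best name and all scores."""
    gold = [label for _, label in test_set]
    scores: Dict[str, float] = {}
    for name, llm in models.items():
        preds = [parse_answer(llm(build_few_shot_prompt(criterion, examples, note))) for note, _ in test_set]
        scores[name] = f1_score(gold, preds)
    best = max(scores, key=scores.__getitem__)
    return best, scores


# Toy stand-ins for real models: a keyword rule and a model that always answers "met".
def keyword_model(prompt: str) -> str:
    target = prompt.rsplit("Note:", 1)[1]  # look only at the note being classified
    return "met" if "insulin" in target.lower() else "not met"


def always_met(prompt: str) -> str:
    return "Met."


examples = [("Takes insulin daily.", True), ("No diabetes history.", False)]
test_set = [("Started insulin last year.", True), ("Diet controlled only.", False), ("On insulin glargine.", True)]
best, scores = select_best_model(
    {"keyword": keyword_model, "always_met": always_met}, "Uses insulin", examples, test_set
)
print(best, scores)
```

Keeping each model behind a string-to-string function lets the same loop score locally hosted checkpoints or remote endpoints alike. In practice the answer parser would likely need to be more forgiving, since free-text completions may not always follow the requested format.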

Abstract

Clinical trials are a critical process in the medical field for introducing new treatments and innovations. However, cohort selection for clinical trials is a time-consuming process that often requires manual review of patient text records for specific keywords. Though there have been studies on standardizing the information across the various platforms, Natural Language Processing (NLP) tools remain crucial for spotting eligibility criteria in textual reports. Recently, pre-trained large language models (LLMs) have gained popularity for various NLP tasks due to their ability to acquire a nuanced understanding of text. In this paper, we study the performance of large language models on clinical trial cohort selection and leverage the n2c2 challenges to benchmark their performance. Our results are promising with regard to the incorporation of LLMs for simple cohort selection tasks, but also highlight the difficulties encountered by these models as soon as fine-grained knowledge and reasoning are required.

Paper Structure

This paper contains 16 sections, 3 figures, and 6 tables.

Introduction
Related Work
n2c2 Challenges
Prompting Techniques
Datasets
n2c2-2006 Challenge Dataset
n2c2-2008 Challenge Dataset
n2c2-2018 Challenge Dataset
Method
Stage 1: Selection of the Best LLM using n2c2-2018 Dataset
Stage 2: Fine-tuning of the Selected Stage 1 LLM using n2c2-2006 and n2c2-2008 Datasets
Results
Discussion
Lessons learned
Limitations
...and 1 more section

Figures (3)

Figure 1: The n2c2-2008 dataset distribution for the fifteen textual criteria.
Figure 2: The n2c2-2008 dataset distribution for the fifteen intuitive criteria.
Figure 3: The n2c2-2018 dataset distribution for the thirteen criteria.
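The figures listed above plot per-criterion label distributions, and the TL;DR notes that imbalanced data hurts model performance. The following is a minimal sketch of how such distributions, and a simple majority-to-minority imbalance ratio, might be computed. The binary "met"/"not met" label set, the function names, and the toy labels are assumptions for illustration and are not data from the paper; a multi-class label set can be passed through the `labels` parameter.

```python
from collections import Counter
from math import inf
from typing import Dict, List, Mapping, Sequence

LABELS = ("met", "not met")  # assumed binary label set per criterion


def label_distribution(labels_by_criterion: Mapping[str, Sequence[str]]) -> Dict[str, Counter]:
    """Count how often each label occurs for every criterion."""
    return {criterion: Counter(labels) for criterion, labels in labels_by_criterion.items()}


def imbalance_ratio(counts: Counter, labels: Sequence[str] = LABELS) -> float:
    """Majority-class count divided by minority-class count; inf if some label never occurs."""
    sizes = [counts.get(label, 0) for label in labels]
    return inf if min(sizes) == 0 else max(sizes) / min(sizes)


def rank_by_imbalance(labels_by_criterion: Mapping[str, Sequence[str]]) -> List[str]:
    """Criteria sorted from most to least imbalanced."""
    dist = label_distribution(labels_by_criterion)
    return sorted(dist, key=lambda c: imbalance_ratio(dist[c]), reverse=True)


# Toy labels for hypothetical criteria (not real n2c2 counts).
toy = {
    "CRITERION_A": ["met"] * 2 + ["not met"] * 8,
    "CRITERION_B": ["met"] * 5 + ["not met"] * 5,
    "CRITERION_C": ["not met"] * 4,
}
for name in rank_by_imbalance(toy):
    print(name, imbalance_ratio(Counter(toy[name])))
```

A criterion with no positive examples at all gets an infinite ratio, which flags it as one where few-shot demonstrations or fine-tuning data for the minority class would be hard to supply.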
