Clinical trial cohort selection using Large Language Models on n2c2 Challenges
Chi-en Amy Tai, Xavier Tannier
TL;DR
This study addresses the automation of clinical trial cohort selection from unstructured clinical notes using large language models (LLMs). It introduces a two-stage workflow that first selects the most promising LLM via iterated few-shot prompting on the n2c2-2018 dataset, then fine-tunes the chosen model on the n2c2-2006 and n2c2-2008 datasets. The results show that LLMs can match or exceed baselines on straightforward eligibility criteria but struggle with fine-grained reasoning and imbalanced data; vicuna-13b and mistral-7b-instruct stand out for their stability across the n2c2-2018 tasks. The work highlights practical considerations for deploying LLM-assisted recruitment and outlines directions for domain-specific training and generalization to broader datasets.
Abstract
Clinical trials are a critical process in the medical field for introducing new treatments and innovations. However, cohort selection for clinical trials is a time-consuming process that often requires manual review of patient text records for specific keywords. Although studies have sought to standardize this information across the various platforms, Natural Language Processing (NLP) tools remain crucial for spotting eligibility criteria in textual reports. Recently, pre-trained large language models (LLMs) have gained popularity for various NLP tasks due to their ability to acquire a nuanced understanding of text. In this paper, we study the performance of LLMs on clinical trial cohort selection, using the n2c2 challenges as benchmarks. Our results are promising for the use of LLMs in simple cohort selection tasks, but they also highlight the difficulties these models encounter as soon as fine-grained knowledge and reasoning are required.
