
Am I eligible? Natural Language Inference for Clinical Trial Patient Recruitment: the Patient's Point of View

Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi

TL;DR

The study tackles patient-initiated recruitment for clinical trials by introducing NLI4PR, a dataset that recasts the matching of patient-language profiles to trial eligibility criteria as a natural language inference task. It derives 7007 instances from TREC-CT 2022 by converting medical-language patient topics into lay-language statements and pairing them with the eligibility criteria of clinical trial reports (CTRs) as premises, then evaluates open-source LLMs in zero-shot settings with two prompting styles (vanilla and persona). Results show that the models outperform majority baselines in both the medical-language and patient-language settings. Medical-language inputs yield higher F1 scores, yet the best models remain effective with patient-language statements, which suggests that patient-driven recruitment is feasible. Limitations include a potential loss of precision in lay terms and the use of a single annotator for rephrasing; the work points to future fine-tuning, explainability studies, and larger, more diverse patient-language data to make patient-centric recruitment scalable and reliable.

Abstract

Recruiting patients to participate in clinical trials can be challenging and time-consuming. Usually, participation in a clinical trial is initiated by a healthcare professional and proposed to the patient. Promoting clinical trials directly to patients via online recruitment might help to reach them more efficiently. In this study, we address the case where a patient initiates their own recruitment process and wants to determine whether they are eligible for a given clinical trial, using their own language to describe their medical profile. To study whether this creates difficulties in the patient-trial matching process, we design a new dataset and task, Natural Language Inference for Patient Recruitment (NLI4PR), in which patient-language profiles must be matched to clinical trials. We create it by adapting the TREC 2022 Clinical Trial Track dataset, which provides patients' medical profiles, and manually rephrasing these profiles in patient language. We also use the associated clinical trial reports for which the patients are either eligible or excluded. We prompt several open-source Large Language Models on our task and obtain F1 scores from 56.5 to 71.8 using patient language, against 64.7 to 73.1 for the same task using medical language. When using patient language, we observe only a small loss in performance for the best model, suggesting that having the patient as a starting point could be adopted to help recruit patients for clinical trials. The corpus and code are freely available on our GitHub and Hugging Face repositories.
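
To make the task setup concrete, the following is a minimal sketch of how one NLI4PR-style instance could be represented, turned into a zero-shot prompt, and scored against a majority baseline. It is an illustration only, not the authors' code: the class name `Instance`, the prompt wording, and the helper names are hypothetical, and the use of macro-averaged F1 is an assumption, since the abstract does not state which F1 variant is reported.

```python
from collections import Counter
from dataclasses import dataclass

# The two outcomes named in the abstract: the patient is eligible or excluded.
LABELS = ("eligible", "excluded")


@dataclass(frozen=True)
class Instance:
    """Hypothetical container for one premise/statement pair."""
    premise: str    # eligibility criteria taken from a clinical trial report
    statement: str  # patient profile, in patient or medical language
    label: str      # gold label, one of LABELS


def build_prompt(inst: Instance) -> str:
    """Assemble a plain zero-shot prompt (wording is illustrative only)."""
    return (
        "Eligibility criteria:\n" + inst.premise + "\n\n"
        "Patient description:\n" + inst.statement + "\n\n"
        "Answer with one word: eligible or excluded."
    )


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1 (assumed metric variant)."""
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(gold):
    """Predict the most frequent gold label for every instance."""
    top = Counter(gold).most_common(1)[0][0]
    return [top] * len(gold)


if __name__ == "__main__":
    example = Instance(
        premise="Inclusion: adults aged 18 or older with type 2 diabetes.",
        statement="I am 54 and my doctor told me I have type 2 diabetes.",
        label="eligible",
    )
    print(build_prompt(example))

    gold = ["eligible", "eligible", "excluded", "eligible"]
    print(macro_f1(gold, majority_baseline(gold)))
```

Comparing model predictions with `majority_baseline` in this way mirrors the comparison described in the TL;DR, where the evaluated models are reported to outperform majority baselines in both language settings.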

Paper Structure

This paper contains 31 sections, 7 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Example of a CTR's eligibility criteria. Taken from NCT04581941, available on clinicaltrials.gov
  • Figure 2: Corpus creation steps
  • Figure 3: Rephrasing of a patient topic, following MIMIC-IV categories and using MedlinePlus.
  • Figure 4: Lay is patient language, Med is medical doctor's language, V stands for vanilla prompt and P stands for persona prompt.
  • Figure 5: Eligibility criteria from trial NCT03160898, used as the premise.
  • ...and 2 more figures