Task as Context Prompting for Accurate Medical Symptom Coding Using Large Language Models

Chengyang He, Wenlong Zhang, Violet Xinying Chen, Yue Ning, Ping Wang

TL;DR

The paper addresses symptom coding in unstructured clinical text: identifying symptom mentions in VAERS reports and mapping them to MedDRA terms. It introduces Task as Context (TACO) Prompting, which combines extraction and linking within a single prompt, and SYMPCODER, a human-annotated benchmark dataset. In a two-stage LINK and MATCH evaluation across multiple LLMs, TACO improves performance, most notably with GPT-4 Turbo and GPT-4o. The work contributes a prompting method for tailored symptom coding, along with a dataset and an evaluation framework for future research.

Abstract

Accurate medical symptom coding from unstructured clinical text, such as vaccine safety reports, is a critical task with applications in pharmacovigilance and safety monitoring. Symptom coding, as tailored in this study, involves identifying and linking nuanced symptom mentions to standardized vocabularies like MedDRA, differentiating it from broader medical coding tasks. Traditional approaches to this task, which treat symptom extraction and linking as independent workflows, often fail to handle the variability and complexity of clinical narratives, especially for rare cases. Recent advancements in Large Language Models (LLMs) offer new opportunities but face challenges in achieving consistent performance. To address these issues, we propose Task as Context (TACO) Prompting, a novel framework that unifies extraction and linking tasks by embedding task-specific context into LLM prompts. Our study also introduces SYMPCODER, a human-annotated dataset derived from Vaccine Adverse Event Reporting System (VAERS) reports, and a two-stage evaluation framework to comprehensively assess both symptom linking and mention fidelity. Our comprehensive evaluation of multiple LLMs, including Llama2-chat, Jackalope-7b, GPT-3.5 Turbo, GPT-4 Turbo, and GPT-4o, demonstrates TACO's effectiveness in improving flexibility and accuracy for tailored tasks like symptom coding, paving the way for more specific coding tasks and advancing clinical text processing methodologies.

Paper Structure

This paper contains 29 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: SYMPCODER data creation and overall workflow of TACO prompting and evaluation. Key components in the framework include: (1) Source Input: Report text and a suggested symptom list; (2) TACO Prompting: Guides LLMs in symptom coding; (3) Model Output Distillation: Refines LLM outputs; (4) SYMPCODER Dataset: Contains human annotations; and (5) Two-Stage Evaluation: LINK for matching extracted symptoms with annotations, and MATCH for assessing the contextual accuracy of symptom mentions.
  • Figure 2: Distributions of the number of symptoms for different datasets in SYMPCODER.
  • Figure 3: The structure of TASI and TACO prompts, detailing clinical input, task instructions, and output format. The provided output examples are for format demonstration purposes only and do not align with the clinical input text, which is taken from real clinical reports.
  • Figure 4: Common Rare Case Analysis with TASI Prompting on SYMPCODER
  • Figure 5: Common Rare Case Analysis with TACO Prompting on SYMPCODER