Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models

Ran Xu; Hejie Cui; Yue Yu; Xuan Kan; Wenqi Shi; Yuchen Zhuang; Wei Jin; Joyce Ho; Carl Yang

TL;DR

This work proposes ClinGen, a resource-efficient approach to synthetic clinical text generation with LLMs for clinical NLP tasks. ClinGen infuses clinical knowledge into the generation process and consistently improves performance across tasks.

Abstract

Clinical natural language processing requires methods that can address domain-specific challenges, such as complex medical terminology and clinical contexts. Recently, large language models (LLMs) have shown promise in this domain. Yet their direct deployment can raise privacy concerns and is constrained by resources. To address this challenge, we study synthetic clinical text generation using LLMs for clinical NLP tasks. We propose a resource-efficient approach, ClinGen, which infuses knowledge into the generation process. Our method involves clinical knowledge extraction and context-informed LLM prompting. Both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Our extensive empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks, aligning the generated data more closely with the distribution of real datasets and significantly enriching the diversity of generated training instances. Our code is available at https://github.com/ritaranx/ClinGen.
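The context-informed prompting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_prompt`, the example topics and styles, and the prompt wording are all assumptions made for illustration. The only idea taken from the abstract is that each prompt is conditioned on a clinical topic and a writing style drawn from previously extracted knowledge.

```python
import random

# Hypothetical example values. In the paper, topics come from a knowledge
# graph or an LLM, and writing styles are suggested by an LLM.
TOPICS = ["hypertension", "type 2 diabetes", "asthma"]
STYLES = ["case report", "clinical trial summary", "patient question"]


def build_prompt(task_description, topics, styles, rng):
    """Compose one generation prompt from a sampled topic and writing style.

    Returns the prompt together with the sampled topic and style so the
    caller can log which knowledge was infused into each request.
    """
    topic = rng.choice(topics)
    style = rng.choice(styles)
    # Illustrative wording only; not the prompt template used in the paper.
    prompt = (
        f"Suppose you need to write text for the following task: {task_description}\n"
        f"Topic: {topic}\n"
        f"Writing style: {style}\n"
        "Write one new example on this topic in this style."
    )
    return prompt, topic, style


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducible sampling
    prompt, topic, style = build_prompt(
        "disease mention recognition", TOPICS, STYLES, rng
    )
    print(prompt)
```

Sampling a fresh topic and style for every request is one simple way to vary the prompts, which is the mechanism the abstract credits for more diverse generated training instances.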

Paper Structure (33 sections, 5 equations, 13 figures, 13 tables)

Introduction
Related Work
Preliminary Study
Problem Setup
Limitations of Existing Methods
Knowledge Infused Data Generation
Clinical knowledge extraction
Clinical Topics Generation
Clinical Writing Styles Suggestion
Knowledge-infused Data Generation
Language Model Fine-tuning
Empirical Evaluation
Experiment Setup
Model Performance with Synthetic Data
Ablation and Parameter Studies
...and 18 more sections

Figures (13)

Figure 1: Preliminary Studies. (c) is from BC5CDR-Disease and is in log scale.
Figure 2: The overview of ClinGen.
Figure 3: Different generators at Base.
Figure 4: Different proportion of data at Base.
Figure 5: Data distribution and diversity measures on ClinGen. (a) is from BC5CDR-Disease and (b) is from MEDIQA-RQE using ClinGen with LLM.
...and 8 more figures
