Towards Conditioning Clinical Text Generation for User Control

Osman Alperen Koraş, Rabi Bahnan, Jens Kleesiek, Amin Dada

TL;DR

This work addresses the reliability and controllability of clinical text generation by conditioning LLM outputs through two strategies: topic-level structured generation and authoring guidelines. It demonstrates that separating content from style, and guiding generation with structured topics and explicit guidelines, yields substantial gains on the BioNLP ACL'24 Discharge Me! benchmark, including state-of-the-art performance with efficient training and up to 34% gains through dataset augmentation. The authors validate their approach with automated metrics and preliminary human evaluations, indicating improvements in relevance, factual consistency, and clinical alignment, while maintaining manageable cognitive workload for clinicians. Overall, the study suggests that clinician-controlled, modular conditioning of LLMs can enhance the usefulness and safety of AI-assisted clinical documentation, with promising implications for real-world adoption and workflow integration.

Abstract

Deploying natural language generation systems in clinical settings remains challenging despite advances in Large Language Models (LLMs), which continue to exhibit hallucinations and factual inconsistencies, necessitating human oversight. This paper explores automated dataset augmentation using LLMs as human proxies to condition LLMs for clinician control without increasing cognitive workload. On the BioNLP ACL'24 Discharge Me! Shared Task, we achieve new state-of-the-art results with simpler methods than prior submissions through more efficient training, yielding a 9% relative improvement without augmented training and up to 34% with dataset augmentation. Preliminary human evaluation further supports the effectiveness of our approach, highlighting the potential of augmenting clinical text generation for control to enhance relevance, accuracy, and factual consistency.

Paper Structure

This paper contains 24 sections, 3 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: An interactive workflow showcasing topic-level generation control. The LLM is prompted once with the respective context to begin structured generation. After each element, generation is paused, enabling users to sequentially refine content by editing LLM-suggested topic headings, questions, and text blocks. The generation resumes with user-verified content.
  • Figure 2: Instruction-tuning pipeline. Dashed lines indicate paths that depend on the training configuration. Models with topic-level control are trained to generate XML-structured text. The extended context is provided only for TT = DI. Abbreviations: Discharge Summary (DS), Radiology Report (RR), Discharge Instructions (DI), Brief Hospital Course (BHC), Target Text (TT).
  • Figure 3: The generic template for $\mathrm{prompt}_i(c,g)$ used for instruction-tuning.
  • Figure 4: Relative improvement of augmented models against the traditionally instruction-tuned BASE model (cf. the main results table).
  • Figure 5: The user prompt used for evaluation.
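
As a rough illustration of the pause-and-resume loop described in the Figure 1 caption, the Python sketch below generates one element at a time, hands each draft to a reviewer, and feeds only the verified content back as the prefix for the next step. It is a sketch under stated assumptions, not the authors' implementation: the tag names (`topic`, `question`, `text`), the `suggest` and `review` callables, and the fixed `n_topics` count are all hypothetical placeholders, and a real system would replace `suggest` with a call to the instruction-tuned model.

```python
from dataclasses import dataclass
from typing import Callable, List
from xml.sax.saxutils import escape

# Hypothetical element order per topic; the paper's actual XML schema may differ.
ELEMENT_TAGS = ("topic", "question", "text")


@dataclass
class Element:
    tag: str
    content: str


def render(elements: List[Element]) -> str:
    """Serialize verified elements as XML-like text, escaping special characters."""
    return "".join(f"<{e.tag}>{escape(e.content)}</{e.tag}>" for e in elements)


def interactive_generate(
    context: str,
    suggest: Callable[[str, str], str],  # (prefix, tag) -> model draft (assumed interface)
    review: Callable[[str, str], str],   # (tag, draft) -> user-verified content
    n_topics: int,
) -> str:
    """Generate element by element, pausing after each one for user review."""
    verified: List[Element] = []
    for _ in range(n_topics):
        for tag in ELEMENT_TAGS:
            # The prefix contains only content the user has already verified.
            prefix = context + render(verified)
            draft = suggest(prefix, tag)
            # Generation pauses here; the user may accept or edit the draft.
            verified.append(Element(tag, review(tag, draft)))
    return render(verified)


if __name__ == "__main__":
    # Stand-ins for the model and the clinician, for demonstration only.
    def demo_suggest(prefix: str, tag: str) -> str:
        return f"suggested {tag}"

    def demo_review(tag: str, draft: str) -> str:
        return draft  # accept every suggestion unchanged

    print(interactive_generate("CONTEXT ", demo_suggest, demo_review, n_topics=2))
```

Because each call to `suggest` sees only the verified prefix, an edit to an early topic heading or question carries forward into every later suggestion, which is the property the workflow relies on.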