Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM

Trisha Das, Dina Albassam, Jimeng Sun

TL;DR

This work tackles the privacy and data-scarcity challenges in training medical dialogue systems by generating synthetic patient–physician dialogues from public clinical notes. It introduces SynDial, a novel single-LLM pipeline that uses zero-shot prompting and a feedback loop to optimize dialogues for extractiveness and factuality, guided by a weighted combined score and thresholding with up to three refinement iterations. Evaluations on MIMIC-IV and MTS-Dialogue show SynDial outperforms baselines in extractiveness and factuality, maintains competitive diversity, and offers substantial cost savings compared to multi-LLM approaches like NoteChat; extrinsic tests also demonstrate beneficial downstream effects when augmenting training data. Overall, SynDial provides a scalable, privacy-preserving approach to producing high-quality synthetic dialogue data for training MDS, with strong potential to reduce reliance on real patient conversations while enabling broader language coverage and dataset expansion.

Abstract

Medical dialogue systems (MDS) enhance patient-physician communication, improve healthcare accessibility, and reduce costs. However, acquiring suitable data to train these systems poses significant challenges. Privacy concerns prevent the use of real conversations, necessitating synthetic alternatives. Synthetic dialogue generation from publicly available clinical notes offers a promising solution to this issue, providing realistic data while safeguarding privacy. Our approach, SynDial, uses a single LLM iteratively with zero-shot prompting and a feedback loop to generate and refine high-quality synthetic dialogues. The feedback consists of weighted evaluation scores for similarity and extractiveness. The iterative process ensures dialogues meet predefined thresholds, achieving superior extractiveness as a result of the feedback loop. Additionally, evaluation shows that the generated dialogues excel on the factuality metric compared with the baselines and have diversity scores comparable to those of GPT4.
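
The feedback loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two scoring functions are toy token-overlap stand-ins for the paper's similarity and extractiveness metrics, and the weights, the threshold value, the `generate` callable (a placeholder for the single LLM), and the reading of "up to three refinement iterations" as one initial draft plus at most three refinements are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# (dialogue, combined score) pairs from earlier attempts, fed back to the generator.
History = List[Tuple[str, float]]


def extractiveness(note: str, dialogue: str) -> float:
    """Toy stand-in: fraction of dialogue tokens that also occur in the note."""
    note_tokens = set(note.lower().split())
    dialogue_tokens = dialogue.lower().split()
    if not dialogue_tokens:
        return 0.0
    return sum(t in note_tokens for t in dialogue_tokens) / len(dialogue_tokens)


def similarity(note: str, dialogue: str) -> float:
    """Toy stand-in: Jaccard overlap between note and dialogue vocabularies."""
    a, b = set(note.lower().split()), set(dialogue.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_score(note: str, dialogue: str,
                   w_sim: float = 0.5, w_ext: float = 0.5) -> float:
    """Weighted sum of the two metrics; the weights are hypothetical."""
    return w_sim * similarity(note, dialogue) + w_ext * extractiveness(note, dialogue)


def refine_dialogue(note: str,
                    generate: Callable[[str, History], str],
                    threshold: float = 0.6,
                    max_refinements: int = 3) -> Tuple[str, float, int]:
    """Generate a dialogue, then regenerate with score feedback until the
    combined score reaches the threshold or the refinement budget is spent.
    Returns the best dialogue seen, its score, and the number of attempts."""
    history: History = []
    best_dialogue, best_score = "", -1.0
    for _ in range(1 + max_refinements):  # one initial draft plus refinements
        dialogue = generate(note, history)
        score = combined_score(note, dialogue)
        history.append((dialogue, score))
        if score > best_score:
            best_dialogue, best_score = dialogue, score
        if score >= threshold:
            break
    return best_dialogue, best_score, len(history)
```

In practice `generate` would wrap a zero-shot prompt that includes the clinical note plus the earlier dialogues and their scores from `history`; any callable with that signature works here, which keeps the loop testable without an LLM.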

Paper Structure (26 sections, 2 equations, 5 figures, 5 tables)

This paper contains 26 sections, 2 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: SynDial pipeline
  • Figure 2: Cost comparison between NoteChat and SynDial
  • Figure 3: Including vs not including previous visit's dialog and score in prompt.
  • Figure 4: Improvement in Extractiveness Scores on MTS-Dialogue Dataset
  • Figure 5: Improvement in Extractiveness Scores on MTS-Dialogue Dataset