Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators

Zhaocheng Liu; Quan Tu; Wen Ye; Yu Xiao; Zhishou Zhang; Hengfu Cui; Yalun Zhu; Qiang Ju; Shizheng Li; Jian Xie

TL;DR

The paper tackles the underexplored link between patient inquiry and diagnosis in online medical consultation by introducing a patient simulator guided by real doctor-patient dialogue strategies. It builds a data-driven pipeline using MedDialog and Chinese medical records, expands dialogue strategy tags with GPT-4o, and trains a LoRA-tuned Chinese-language simulator via in-context learning and supervised fine-tuning. The simulator achieves substantially lower hallucination rates and higher anthropomorphism than baselines, enabling more realistic evaluation and generation of synthetic data. Experiments demonstrate Liebig's law in the inquiry-diagnosis relationship and categorize inquiries into four types, providing actionable insights for optimizing inquiry allocation within 3–5 rounds of interaction.

Abstract

Recently, large language models have shown great potential to transform online medical consultation. Despite this, most research targets improving diagnostic accuracy with ample information, often overlooking the inquiry phase. Some studies try to evaluate or refine doctor models by using prompt-engineered patient agents. However, prompt engineering alone falls short in accurately simulating real patients, so new paradigms for patient simulation are needed. Furthermore, the relationship between inquiry and diagnosis remains unexplored. This paper extracts dialogue strategies from real doctor-patient conversations to guide the training of a patient simulator. By using dynamic dialogue strategies, our simulator shows higher anthropomorphism and lower hallucination rates. This innovation offers a more accurate evaluation of diagnostic models and generates realistic synthetic data. We conduct extensive experiments on the relationship between inquiry and diagnosis, showing they adhere to Liebig's law: poor inquiry limits diagnosis effectiveness, regardless of diagnostic skill, and vice versa. The experiments also reveal substantial differences in inquiry performance among models. To delve into this phenomenon, the inquiry process is categorized into four distinct types. Analyzing the distribution of inquiries across these types helps explain the performance differences. The weights of our patient simulator are available at https://github.com/PatientSimulator/PatientSimulator.
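
The Liebig's-law claim above can be illustrated with a toy bottleneck model. The sketch below is not the paper's equation or data: the `min` rule, the function name `liebig_accuracy`, the model names, and the scores are all assumptions chosen only to show how the weaker of the two stages caps the final result.

```python
def liebig_accuracy(inquiry_quality: float, diagnosis_skill: float) -> float:
    """Toy bottleneck rule (assumed, not from the paper): the weaker stage caps the outcome."""
    return min(inquiry_quality, diagnosis_skill)


# Hypothetical quality scores in [0, 1]; these are NOT measurements from the paper.
inquiry_models = {"weak_inquirer": 0.4, "strong_inquirer": 0.8}
diagnosis_models = {"weak_diagnoser": 0.5, "strong_diagnoser": 0.9}

# Evaluate every inquiry/diagnosis pairing, mirroring a cross-model grid.
grid = {
    (i_name, d_name): liebig_accuracy(i_q, d_q)
    for i_name, i_q in inquiry_models.items()
    for d_name, d_q in diagnosis_models.items()
}

for (i_name, d_name), acc in grid.items():
    print(f"{i_name:>16} + {d_name:<16} -> {acc:.1f}")
```

Under this assumed rule, pairing a weak inquirer with a strong diagnoser yields the same score as pairing it with a weak one, which is the qualitative pattern the abstract describes; real accuracy curves need not follow a strict minimum.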

Paper Structure (20 sections, 1 equation, 9 figures, 2 tables)

Introduction
Patient Simulator
Methods
Evaluation Results
Relationship Between Inquiry and Diagnosis: Impact on Diagnostic Accuracy
Experimental Setup
Experimental Results
Inquiry Differences Among Models
Four Types of Inquiry
Experimental Results
Related Works
Large Language Models in Medicine
The Evaluation of Language Models in Medicine
Conclusion
Candidate Set of Dialogue Strategy Tags
...and 5 more sections

Figures (9)

Figure 1: Our patient simulator (right) is compared to the baseline simulator (prompt engineering with GPT-4o, left) using identical patient records and doctor model.
Figure 2: Prompts for synthesizing patient simulator training dialogues.
Figure 3: The patient side always uses our patient simulator. Different doctor models first interact with the simulator for a fixed n rounds (x-axis; n = 1, 2, 3, 4, 5) to generate inquiry records. These records are then diagnosed by different doctor models, and the diagnostic accuracy (y-axis) is calculated. Each experiment is run three times, and the average accuracy is reported.
Figure 4: Examples of four types of inquiry with D representing the doctor and P representing the patient in the figure.
Figure 5: Distribution of the four inquiry types for GPT-4o, GPT-4o-mini, and Claude-3-5-sonnet as inquiry models, segmented by inquiry round. The x-axis shows the inquiry models, and the y-axis shows the proportion of each of the four inquiry types.
...and 4 more figures
