Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges

Xiaoming Shi, Zeming Liu, Li Du, Yuxuan Wang, Hongru Wang, Yuhang Guo, Tong Ruan, Jie Xu, Shaoting Zhang

TL;DR

Medical dialogue systems are increasingly powerful but face clinical safety and deployment gaps. The paper provides a technically rigorous survey that separates pre-LLM and LLM-based methods, catalogs evaluation metrics, and enumerates fourteen benchmarks. It identifies grand challenges including hallucination, numerical data handling, and medical specialization, proposing solutions such as retrieval-augmented generation and multi-agent collaboration. The work offers a foundation for researchers to compare methods, reuse resources, and guide future development toward safer, more effective clinical dialogue tools.

Abstract

This paper surveys and organizes research on medical dialogue systems, an important yet challenging task. Although these systems have been surveyed in the medical community from an application perspective, a systematic review from a rigorous technical perspective has so far been absent. As a result, overviews of the categories, methods, and evaluation of medical dialogue systems remain limited and underspecified, hindering further progress in this area. To fill this gap, we investigate an initial pool of 325 papers from well-known computer science and natural language processing conferences and journals and provide an overview. Recently, large language models have shown strong capacity on downstream tasks, which has also reshaped the foundation of medical dialogue systems. Despite their alluring practical value, current medical dialogue systems still suffer from problems. To this end, this paper lists the grand challenges of medical dialogue systems, especially those built on large language models.

Paper Structure (23 sections, 2 figures, 8 tables)

This paper contains 23 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: The main content flow and categorization of this survey.
  • Figure 2: Two dialogues, one patient-doctor and one patient-ChatGPT.