Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation

Zhiyao Ren; Yibing Zhan; Baosheng Yu; Liang Ding; Dacheng Tao

TL;DR

This work introduces Healthcare Copilot, a modular framework that elevates general LLMs for online medical consultation without fine-tuning. It decomposes the system into Dialogue (task classification, safety, and doctor oversight), Memory (Conversation and History), and Processing (report generation), unified by a modular prompting approach. An auto-evaluation protocol using ChatGPT and the MedDialog dataset shows consistent improvements across inquiry capability, conversational fluency, accuracy, and safety, with GPT-4-based backbones delivering the strongest performance and ablations clarifying each module’s impact. The study highlights the potential and challenges of deploying general, non-fine-tuned LLMs in medical settings, including safety, ethics, and the need for clinical validation and open disclosure of technical details.

Abstract

The copilot framework, which aims to enhance and tailor large language models (LLMs) for specific complex tasks without requiring fine-tuning, is gaining increasing attention from the community. In this paper, we introduce the construction of a Healthcare Copilot designed for medical consultation. The proposed Healthcare Copilot comprises three main components: 1) the Dialogue component, responsible for effective and safe patient interactions; 2) the Memory component, storing both current conversation data and historical patient information; and 3) the Processing component, summarizing the entire dialogue and generating reports. To evaluate the proposed Healthcare Copilot, we implement an auto-evaluation scheme using ChatGPT for two roles: as a virtual patient engaging in dialogue with the copilot, and as an evaluator to assess the quality of the dialogue. Extensive results demonstrate that the proposed Healthcare Copilot significantly enhances the capabilities of general LLMs for medical consultations in terms of inquiry capability, conversational fluency, response accuracy, and safety. Furthermore, we conduct ablation studies to highlight the contribution of each individual module in the Healthcare Copilot. Code will be made publicly available on GitHub.
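
The three-component layout described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and method names (`HealthcareCopilotSketch`, `Memory`, `dialogue`, `process`), the prompt wording, and the stub `echo_llm` backbone are all hypothetical, and the safety, function, and doctor modules are omitted. It only shows how a dialogue step, a memory store, and a report-generation step might be wired around a general LLM without fine-tuning.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical backbone type: any function mapping a prompt string to a reply.
LLM = Callable[[str], str]


@dataclass
class Memory:
    """Assumed Memory component: current turns plus past-consultation summaries."""
    conversation: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)


@dataclass
class HealthcareCopilotSketch:
    """Illustrative wiring of Dialogue, Memory, and Processing (names are assumptions)."""
    llm: LLM
    memory: Memory = field(default_factory=Memory)

    def dialogue(self, patient_message: str) -> str:
        # Dialogue: record the patient turn, then prompt the LLM with both memories.
        self.memory.conversation.append(f"Patient: {patient_message}")
        prompt = "\n".join(
            ["Past consultations:"]
            + self.memory.history
            + ["Current conversation:"]
            + self.memory.conversation
            + ["Doctor:"]
        )
        reply = self.llm(prompt)
        self.memory.conversation.append(f"Doctor: {reply}")
        return reply

    def process(self) -> str:
        # Processing: summarize the dialogue into a report and archive it as history.
        prompt = "Summarize this consultation as a report:\n" + "\n".join(
            self.memory.conversation
        )
        report = self.llm(prompt)
        self.memory.history.append(report)
        self.memory.conversation = []
        return report


def echo_llm(prompt: str) -> str:
    # Stub backbone for demonstration only; a real system would call an LLM here.
    return f"[stub reply to {len(prompt.splitlines())} prompt lines]"


copilot = HealthcareCopilotSketch(llm=echo_llm)
first_reply = copilot.dialogue("I have a headache.")
report = copilot.process()
```

In this sketch the backbone is passed in as a plain callable, so swapping one general LLM for another would not require changing the component logic, which mirrors the abstract's claim that the framework works without fine-tuning.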

Paper Structure (71 sections, 51 figures, 3 tables)

Introduction
Related Work
LLMs for Medicine.
LLM Copilots.
Modular Prompting.
Healthcare Copilot
Overview
Dialogue
Function Module.
Safety Module.
Doctor Module.
Memory
Conversation Memory.
History Memory.
Processing
...and 56 more sections

Figures (51)

Figure 1: An illustration of the proposed Healthcare Copilot, which enhances general LLMs for medical consultation in terms of inquiry capability, conversational fluency, response accuracy, and safety.
Figure 2: The Healthcare Copilot framework contains three components: Dialogue, Memory, and Processing.
Figure 3: The influence of the Doctor module.
Figure 4: An example showing the impact of History Memory. For simplicity, we only show a portion of the doctor's dialogue. See the appendix on the History Memory ablation for more details.
Figure 5: An example of using a General LLM and Healthcare Copilot. The yellow parts provide explanations of the different modules in Healthcare Copilot.
...and 46 more figures
