Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation
Zhiyao Ren, Yibing Zhan, Baosheng Yu, Liang Ding, Dacheng Tao
TL;DR
This work introduces Healthcare Copilot, a modular framework that elevates general LLMs for online medical consultation without fine-tuning. It decomposes the system into Dialogue (task classification, safety, and doctor oversight), Memory (conversation and history), and Processing (report generation), unified by a modular prompting approach. An auto-evaluation protocol using ChatGPT and the MedDialog dataset shows consistent improvements in inquiry capability, conversational fluency, accuracy, and safety. GPT-4-based backbones deliver the strongest performance, and ablations clarify each module's impact. The study highlights the potential and challenges of deploying open, non-fine-tuned LLMs in medical settings, including safety, ethics, the need for clinical validation, and open disclosure of technical details.
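To make the three-part decomposition concrete, the following is a minimal sketch of how the Dialogue, Memory, and Processing components might be wired around a single prompt-to-text model. It is an illustration under stated assumptions, not the authors' implementation: the class and method names (`HealthcareCopilotSketch`, `Memory`, `classify_task`, `safety_check`, `doctor_review`, `report`), the task labels, the prompt wording, and the stub model `fake_llm` are all hypothetical. In practice the `llm` argument would wrap a call to a general LLM such as GPT-4.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Any function that maps a prompt string to a completion string (e.g. a GPT-4 call).
LLM = Callable[[str], str]


@dataclass
class Memory:
    """Memory component (hypothetical): current conversation plus prior patient history."""
    conversation: List[Dict[str, str]] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def add_turn(self, role: str, text: str) -> None:
        self.conversation.append({"role": role, "text": text})

    def context(self) -> str:
        past = "\n".join(self.history) or "(none)"
        turns = "\n".join(f"{t['role']}: {t['text']}" for t in self.conversation)
        return f"Patient history:\n{past}\nConversation:\n{turns}"


@dataclass
class HealthcareCopilotSketch:
    """Hypothetical wiring of the Dialogue, Memory, and Processing components."""
    llm: LLM
    memory: Memory = field(default_factory=Memory)
    # Hypothetical doctor-oversight hook: receives the reply and may edit it.
    doctor_review: Callable[[str], str] = lambda reply: reply

    def classify_task(self, message: str) -> str:
        # Dialogue component, step 1: route the request to a task type.
        prompt = ("Classify the request as diagnosis, explanation, or recommendation.\n"
                  f"Request: {message}\nLabel:")
        return self.llm(prompt).strip().lower()

    def safety_check(self, reply: str) -> str:
        # Dialogue component, step 2: ask the model to vet the draft reply.
        verdict = self.llm(f"Is this medical reply safe? Answer SAFE or UNSAFE.\n{reply}")
        if "UNSAFE" in verdict.upper():
            return "I cannot answer that safely; please consult a doctor in person."
        return reply

    def respond(self, message: str) -> str:
        self.memory.add_turn("patient", message)
        task = self.classify_task(message)
        draft = self.llm(f"Task: {task}\n{self.memory.context()}\nDoctor:")
        reply = self.doctor_review(self.safety_check(draft))
        self.memory.add_turn("copilot", reply)
        return reply

    def report(self) -> str:
        # Processing component: summarize the whole dialogue into a report.
        return self.llm("Summarize this consultation as a medical report.\n"
                        + self.memory.context())


def fake_llm(prompt: str) -> str:
    """Stub model so the sketch runs offline; replace with a real LLM call."""
    if prompt.startswith("Classify"):
        return "diagnosis"
    if prompt.startswith("Is this medical reply safe"):
        return "SAFE"
    if prompt.startswith("Summarize"):
        return "Report: patient reports a headache."
    return "How long have you had the headache?"


copilot = HealthcareCopilotSketch(llm=fake_llm)
print(copilot.respond("I have a headache."))  # → How long have you had the headache?
print(copilot.report())  # → Report: patient reports a headache.
```

Keeping each module behind a plain prompt-to-text function is one way to reflect the no-fine-tuning premise: swapping the backbone only changes the `llm` callable, while the module boundaries stay fixed.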
Abstract
The copilot framework, which aims to enhance and tailor large language models (LLMs) for specific complex tasks without requiring fine-tuning, is attracting growing attention from the community. In this paper, we present a Healthcare Copilot designed for medical consultation. The proposed Healthcare Copilot comprises three main components: 1) the Dialogue component, responsible for effective and safe patient interactions; 2) the Memory component, which stores both current conversation data and historical patient information; and 3) the Processing component, which summarizes the entire dialogue and generates reports. To evaluate the proposed Healthcare Copilot, we implement an auto-evaluation scheme in which ChatGPT plays two roles: a virtual patient engaging in dialogue with the copilot, and an evaluator assessing the quality of the dialogue. Extensive experiments demonstrate that the proposed Healthcare Copilot significantly enhances the capabilities of general LLMs for medical consultation in terms of inquiry capability, conversational fluency, response accuracy, and safety. Furthermore, we conduct ablation studies to highlight the contribution of each individual module in the Healthcare Copilot. Code will be made publicly available on GitHub.
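The two-role auto-evaluation can be pictured as a simulation loop followed by a scoring pass. The sketch below is hedged: the function names (`simulate_dialogue`, `evaluate`), the fixed turn budget, the 1-to-10 scale, the "criterion: score" reply format, and seeding the virtual patient with a short case description (for instance, one derived from a MedDialog record) are assumptions made for illustration. The stubs `fake_patient`, `fake_copilot`, and `fake_evaluator` stand in for ChatGPT and copilot calls so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Any function that maps a prompt string to a completion string (e.g. a ChatGPT call).
LLM = Callable[[str], str]
# Hypothetical short names for the four reported criteria.
CRITERIA = ("inquiry", "fluency", "accuracy", "safety")


def simulate_dialogue(copilot: LLM, patient: LLM, case: str,
                      max_turns: int = 3) -> List[Tuple[str, str]]:
    """Role 1: a virtual patient, seeded with a case description, talks to the copilot."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        history = "\n".join(f"{role}: {text}" for role, text in transcript)
        patient_msg = patient(f"You are a patient. Case: {case}\n{history}\npatient:")
        transcript.append(("patient", patient_msg))
        transcript.append(("copilot", copilot(patient_msg)))
    return transcript


def evaluate(evaluator: LLM, transcript: List[Tuple[str, str]]) -> Dict[str, float]:
    """Role 2: an evaluator model scores the copilot's side of the dialogue."""
    dialogue = "\n".join(f"{role}: {text}" for role, text in transcript)
    prompt = ("Rate the copilot from 1 to 10 on " + ", ".join(CRITERIA)
              + ". Answer with one 'criterion: score' per line.\n" + dialogue)
    scores: Dict[str, float] = {}
    for line in evaluator(prompt).splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        if sep and name in CRITERIA:
            try:
                scores[name] = float(value.strip())
            except ValueError:
                continue  # ignore lines whose score is not a number
    return scores


def fake_patient(prompt: str) -> str:
    return "I have had a sore throat for two days."


def fake_copilot(message: str) -> str:
    return "Do you also have a fever?"


def fake_evaluator(prompt: str) -> str:
    return "inquiry: 8\nfluency: 9\naccuracy: 7\nsafety: 10"


transcript = simulate_dialogue(fake_copilot, fake_patient, case="sore throat", max_turns=2)
scores = evaluate(fake_evaluator, transcript)
print(scores)  # → {'inquiry': 8.0, 'fluency': 9.0, 'accuracy': 7.0, 'safety': 10.0}
```

Separating simulation from scoring lets the same transcripts be re-scored by different evaluator prompts, which is convenient when comparing backbones or ablated variants.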
