CliniChat: A Multi-Source Knowledge-Driven Framework for Clinical Interview Dialogue Reconstruction and Evaluation
Jing Chen, Zhihua Wei, Wei Zhang, Yingying Hu, Qiong Zhang
TL;DR
CliniChat tackles the scarcity of high-quality clinical interview data and the lack of standardized evaluation methods by introducing a two-module framework that reconstructs interview dialogues from notes (Clini-Recon) and evaluates them with expert-like metrics (Clini-Eval). It leverages multi-source knowledge, including patient guidelines, LLM capabilities, and physician input, to produce realistic, empathetic interview dialogues, exemplified by MedQA-Dialog (10,263 dialogues) and the specialized CliniChatGLM model. Experimental results show substantial gains in history-taking and overall interview quality, with CliniChatGLM achieving state-of-the-art performance on history-taking benchmarks relative to several baselines. The framework provides an end-to-end, cost-efficient path for advancing LLM-assisted clinical interviews while highlighting privacy considerations and potential risks requiring careful human oversight.
Abstract
Large language models (LLMs) hold great promise for assisting clinical interviews due to their fluent interactive capabilities and extensive medical knowledge. However, the lack of high-quality interview dialogue data and of widely accepted evaluation methods has significantly impeded progress. We therefore propose CliniChat, a framework that integrates multi-source knowledge to enable LLMs to simulate real-world clinical interviews. It consists of two modules, Clini-Recon and Clini-Eval, which reconstruct and evaluate interview dialogues, respectively. By incorporating three sources of knowledge, Clini-Recon transforms clinical notes into systematic, professional, and empathetic interview dialogues. Clini-Eval combines a comprehensive evaluation metric system with a two-phase automatic evaluation approach, enabling LLMs to assess interview performance as experts would. We contribute MedQA-Dialog, a high-quality synthetic interview dialogue dataset, and CliniChatGLM, a model specialized for clinical interviews. Experimental results demonstrate that CliniChatGLM's interview capabilities improve comprehensively, particularly in history-taking, achieving state-of-the-art performance.
