BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT
Yirong Chen, Zhenyu Wang, Xiaofen Xing, Huimin Zheng, Zhipei Xu, Kai Fang, Junhong Wang, Sihang Li, Jieling Wu, Qi Liu, Xiangmin Xu
TL;DR
Health LLMs typically excel at single-turn suggestions but struggle with the proactive, multi-turn questioning, termed chain of questioning (CoQ), that personalized advice requires. The paper introduces BianQue, a health LLM that balances questioning and suggestion, finetuned on the large-scale BianQueCorpus, which was created by cleaning real-world conversations and polishing doctor responses with ChatGPT. The model is built on ChatGLM-6B and uses a structured, multi-turn input format to generate alternating questions and health suggestions. Across four multi-turn health-dialogue benchmarks, BianQue outperforms baselines in both questioning ability and suggestion quality, as measured by BLEU/ROUGE and a Proactive Questioning Ability (PQA) metric. The results highlight its potential for proactive health conversations, while safety and privacy considerations are left for future work.
Abstract
Large language models (LLMs) have performed well in providing general and extensive health suggestions in single-turn conversations, exemplified by systems such as ChatGPT, ChatGLM, ChatDoctor, and DoctorGLM. However, the limited information users provide in a single turn leaves the generated suggestions insufficiently personalized and targeted, so users must pick out the useful parts themselves. This is mainly caused by the models' lack of ability to engage in multi-turn questioning. In real-world medical consultations, doctors usually ask a series of iterative questions to understand the patient's condition thoroughly before giving effective and personalized suggestions; for LLMs, this process can be defined as chain of questioning (CoQ). To improve the CoQ of LLMs, we propose BianQue, a ChatGLM-based LLM finetuned on the self-constructed health conversation dataset BianQueCorpus, which consists of multiple turns of questioning and health suggestions polished by ChatGPT. Experimental results demonstrate that BianQue balances the capabilities of questioning and giving health suggestions, which will help promote the research and application of LLMs in the field of proactive health.
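To make the multi-turn setting concrete, the following is a minimal sketch of how a dialogue history might be flattened into a single model input. The role labels `Patient` and `Doctor`, the function name `build_prompt`, and the newline-joined layout are all assumptions for illustration; the text above does not specify the exact format BianQue uses.

```python
from typing import List, Tuple

# Hypothetical role labels; the exact strings used by BianQue are not given here.
PATIENT = "Patient"
DOCTOR = "Doctor"


def build_prompt(history: List[Tuple[str, str]], user_utterance: str) -> str:
    """Flatten earlier (patient, doctor) turns and the new patient utterance
    into one string that ends with the doctor label, so the model's
    continuation is the next doctor turn (a question or a suggestion)."""
    lines = []
    for patient_turn, doctor_turn in history:
        lines.append(f"{PATIENT}: {patient_turn}")
        lines.append(f"{DOCTOR}: {doctor_turn}")
    lines.append(f"{PATIENT}: {user_utterance}")
    lines.append(f"{DOCTOR}:")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [("I have a headache.", "How long has it lasted?")]
    print(build_prompt(history, "About three days."))
```

Under this layout, every earlier question and answer stays in the context, which is what would let a model keep asking follow-up questions before it commits to a suggestion.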
