LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them
Wenya Xie, Qingying Xiao, Yu Zheng, Xidong Wang, Junying Chen, Ke Ji, Anningzhe Gao, Xiang Wan, Feng Jiang, Benyou Wang
TL;DR
The paper reframes AI in healthcare from autonomous patient consultations to doctor-centered assistance, arguing that clinicians must oversee AI outputs to ensure safety. It develops DoctorFLAN, a Chinese medical dataset of ~92K samples across 22 tasks and 27 specialties, plus DoctorFLAN-test and DotaBench for evaluation, and introduces DotaGPT trained on DoctorFLAN. Through automatic and human evaluations across a suite of baselines, it shows that doctor-oriented training substantially improves performance on clinically relevant tasks and brings performance close to GPT-4 in some settings. The work provides a practical framework and resources for integrating LLMs into clinical workflows while highlighting the need for careful deployment, task prioritization, and domain-specific benchmarks to bridge the gap between patient-facing models and doctor-assistant AI.
Abstract
The recent success of Large Language Models (LLMs) has had a significant impact on the healthcare field, providing patients with medical advice, diagnostic information, and more. However, lacking professional medical knowledge, patients are easily misled by erroneous information generated by LLMs, which may result in serious medical problems. To address this issue, we focus on tuning LLMs to be medical assistants that collaborate with more experienced doctors. We first conduct a two-stage inspiration-feedback survey to gain a broad understanding of doctors' real needs for medical assistants. Based on this, we construct a Chinese medical dataset called DoctorFLAN to support the entire workflow of doctors, which includes 92K Q&A samples covering 22 tasks and 27 specialties. Moreover, we evaluate LLMs in doctor-oriented scenarios by constructing DoctorFLAN-test, containing 550 single-turn Q&A items, and DotaBench, containing 74 multi-turn conversations. The evaluation results indicate that serving as a medical assistant still poses challenges for existing open-source models, but DoctorFLAN helps them significantly. This demonstrates that the doctor-oriented dataset and benchmarks we construct can complement existing patient-oriented work and better promote medical LLM research.
