
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation

Guangyu Wang, Guoxing Yang, Zongxin Du, Longjun Fan, Xiaohu Li

TL;DR

The paper presents ClinicalGPT, a large language model developed specifically for clinical applications by training on diverse real-world medical data (including electronic health records (EHRs), medical QA, and multi-turn dialogues) and knowledge graphs. It combines supervised instruction tuning, a knowledge-graph-driven data schema, and reinforcement learning from human feedback (RLHF) using PPO with KL regularization, with LoRA used for parameter-efficient training. Across medical conversation, medical examinations, diagnosis, and question answering, ClinicalGPT outperforms several baseline models, achieving higher accuracy and producing outputs more closely aligned with clinical practice. The work highlights the value of domain-specific data, knowledge grounding, and RLHF for building reliable AI tools for healthcare, and it suggests a viable path toward more accurate, explainable, and safer clinical AI assistants.
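The summary names two generic techniques without detailing how ClinicalGPT applies them. The sketch below illustrates their standard textbook formulations under our own assumptions; it is not the authors' code. The first function computes a KL-penalized reward of the kind commonly fed to PPO in RLHF: the reward-model score minus `beta` times a sampled-token estimate of KL(policy || reference). The second function computes a LoRA forward pass, h = Wx + (alpha / r) B A x, where the pretrained W stays frozen and only the low-rank factors A and B are trained. The function names, the `beta` default, and the toy matrices are hypothetical.

```python
def kl_penalized_reward(reward_model_score, policy_logprobs, ref_logprobs, beta=0.1):
    """Reward-model score minus beta * estimated KL(policy || reference).

    The KL term is the usual per-token estimate: the sum over sampled tokens of
    log pi(token) - log pi_ref(token). `beta` = 0.1 is an illustrative default,
    not a value taken from the paper.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    kl_estimate = sum(p - q for p, q in zip(policy_logprobs, ref_logprobs))
    return reward_model_score - beta * kl_estimate


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """LoRA forward pass: h = W x + (alpha / rank) * B (A x).

    Shapes: W is d_out x d_in (frozen), A is rank x d_in, B is d_out x rank.
    Only A and B would receive gradient updates during fine-tuning.
    """
    base = matvec(W, x)                 # output of the frozen pretrained weight
    low_rank = matvec(B, matvec(A, x))  # low-rank update, computed as B(Ax)
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, low_rank)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 "pretrained" weight
    A = [[1.0, 1.0]]              # rank-1 down-projection
    B = [[0.5], [0.0]]            # rank-1 up-projection
    print(lora_forward(W, A, B, [2.0, 3.0], alpha=2.0, rank=1))  # -> [7.0, 3.0]
    print(kl_penalized_reward(1.0, [-0.5, -1.0], [-0.7, -1.2], beta=0.1))
```

In the usual LoRA setup, B is initialized to zero, so the adapted layer starts out identical to the pretrained one. Likewise, the KL penalty is zero when the policy matches the reference model, which discourages RLHF from drifting far from the instruction-tuned model.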

Abstract

Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks, leveraging techniques such as pre-training and instruction fine-tuning. Despite these advances, their effectiveness in medical applications is limited by challenges such as factual inaccuracies, limited reasoning abilities, and a lack of grounding in real-world experience. In this study, we present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios. By incorporating extensive and diverse real-world data, such as medical records, domain-specific knowledge, and multi-round dialogue consultations, into the training process, ClinicalGPT is better prepared to handle multiple clinical tasks. Furthermore, we introduce a comprehensive evaluation framework that includes medical knowledge question answering, medical exams, patient consultations, and diagnostic analysis of medical records. Our results demonstrate that ClinicalGPT significantly outperforms other models on these tasks, highlighting the effectiveness of our approach in adapting large language models to the critical domain of healthcare.

Paper Structure (15 sections, 1 figure, 8 tables)