ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang

TL;DR

ChiMed-GPT presents a full training regime (pre-training, supervised fine-tuning, RLHF) for a Chinese medical LLM, built on Ziya-13B-v2 with a $4{,}096$-token context. Trained on the Chinese Medical Dataset (CMD) plus diagnostic dialogue data and safety prompts, it demonstrates superior performance on information extraction, question answering, and multi-turn dialogues compared with general- and medical-domain baselines. A bias analysis using CAMI and MICA scales shows reduced bias, highlighting safer content generation in medical contexts. The work provides an open-source, domain-aligned LLM with practical implications for medical online platforms and responsible AI in healthcare.

Abstract

Recently, the increasing demand for superior medical services has highlighted the discrepancies in the medical infrastructure. With big data, especially texts, forming the foundation of medical services, there is an exigent need for effective natural language processing (NLP) solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models present promising results in this domain, and current large language models (LLMs) offer an advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), which efficiently enables LLMs to understand and respond to medical instructions but is ineffective at learning domain knowledge and aligning with human preferences. In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for the Chinese medical domain, which undergoes a comprehensive training regime of pre-training, SFT, and reinforcement learning from human feedback (RLHF). Evaluations on tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general domain LLMs. Furthermore, we analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients, so as to contribute to the further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.

Paper Structure

This paper contains 12 sections, 3 figures, and 16 tables.

Figures (3)

  • Figure 1: An illustration of the overall training process of ChiMed-GPT, which consists of three stages: pre-training, supervised fine-tuning, and reinforcement learning from human feedback (RLHF).
  • Figure 2: Average bias scores of different LLMs on the CAMI and MICA scales, where higher scores indicate more severe bias. The score range of each scale is shown below its name.
  • Figure 3: Accuracy curves of the reward model on the validation set, plotted against training steps.