BianCang: A Traditional Chinese Medicine Large Language Model

Sibo Wei, Xueping Peng, Yi-Fei Wang, Tao Shen, Jiasheng Si, Weiyu Zhang, Fa Zhu, Athanasios V. Vasilakos, Wenpeng Lu, Xiaoming Wu, Yinglong Wang

TL;DR

BianCang is a traditional Chinese medicine–focused LLM built via a two-stage training pipeline to address the domain gap between TCM and modern AI models. It first performs continual pre-training on a large, curated corpus of TCM and medical knowledge, then applies supervised fine-tuning with a ChP-TCM–based instruction set to align the model with practical diagnostic tasks. Evaluations across 11 test sets and 4 tasks show that BianCang consistently outperforms open-source baselines of similar scale and rivals larger models on key TCM and medical benchmarks, while also delivering strong subjective assessments of professionalism and safety. By open-sourcing the ChP-TCM data and the model, the work provides a valuable resource for advancing domain-specific LLMs in TCM and offers a practical framework for integrating traditional knowledge with modern AI assistance, with careful attention to ethics and safety.

Abstract

The surge of large language models (LLMs) has driven significant progress in medical applications, including traditional Chinese medicine (TCM). However, current medical LLMs struggle with TCM diagnosis and syndrome differentiation because of substantial differences between TCM and modern medical theory and the scarcity of specialized, high-quality corpora. To this end, we propose BianCang, a TCM-specific LLM trained with a two-stage process that first injects domain-specific knowledge and then aligns it through targeted stimulation to enhance diagnostic and differentiation capabilities. Specifically, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continual pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations across 11 test sets involving 31 models and 4 tasks demonstrate the effectiveness of BianCang, offering valuable insights for future research. Code, datasets, and models are available at https://github.com/QLU-NLP/BianCang.

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Different diagnostic processes of TCM and modern medicine for the same medical record. Modern medicine relies on the chief complaint and on auxiliary examination, here the T-wave on an electrocardiogram, and diagnoses the patient from specific value changes and trends. In contrast, TCM interprets the patient's chief complaint and the information gathered by the four diagnostic methods within a unique Yin-Yang framework, identifying underlying causes and synthesizing the findings to determine the syndrome type. While modern medicine depends on quantifiable data, TCM is more abstract and experience-based.
  • Figure 2: The overall flowchart of constructing BianCang. In the first stage, extensive traditional Chinese medicine and medical knowledge is injected into the foundational model through continual pre-training. In the second stage, supervised fine-tuning is applied to activate and align the internal knowledge of the model.
  • Figure 3: Subjective evaluation results of BianCang-Qwen2.5-7B-Instruct and other baseline models in terms of professionalism, fluency, and safety. The test dataset used is BC-Analytical.
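
The two stages described in the abstract and in Figure 2 differ mainly in which tokens the model is trained to predict. The sketch below illustrates one common way to express that difference as training labels. It is not taken from the BianCang code: the function names, the toy token ids, and the choice to mask prompt tokens during supervised fine-tuning are assumptions made for illustration. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss and is used here only as a conventional "skip this position" marker.

```python
# Hypothetical sketch: label construction for a two-stage training pipeline.
# Nothing here is the authors' implementation; names and masking are assumptions.

IGNORE_INDEX = -100  # positions with this label are excluded from the loss


def pretrain_labels(token_ids):
    """Stage 1 (continual pre-training): every token is a prediction target."""
    return list(token_ids)


def sft_labels(prompt_ids, response_ids):
    """Stage 2 (supervised fine-tuning): only response tokens are targets.

    Prompt positions are masked so the loss covers the answer alone
    (an assumed, commonly used convention).
    """
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)


def count_supervised(labels):
    """Number of positions that would contribute to the loss."""
    return sum(1 for label in labels if label != IGNORE_INDEX)


if __name__ == "__main__":
    corpus_chunk = [11, 12, 13, 14]   # toy ids for a passage of raw TCM text
    prompt = [21, 22, 23]             # toy ids for an instruction or medical record
    response = [31, 32]               # toy ids for the expected diagnosis

    print(pretrain_labels(corpus_chunk))   # [11, 12, 13, 14]
    print(sft_labels(prompt, response))    # [-100, -100, -100, 31, 32]
```

Under these assumptions, stage 1 supervises all four tokens of the raw passage, while stage 2 supervises only the two response tokens, which is one way to read "injecting" knowledge first and "aligning" it afterwards.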