BianCang: A Traditional Chinese Medicine Large Language Model
Sibo Wei, Xueping Peng, Yi-Fei Wang, Tao Shen, Jiasheng Si, Weiyu Zhang, Fa Zhu, Athanasios V. Vasilakos, Wenpeng Lu, Xiaoming Wu, Yinglong Wang
TL;DR
BianCang is a traditional Chinese medicine–focused LLM built via a two-stage training pipeline to address the domain gap between TCM and modern AI models. It first performs continual pre-training on a large, curated corpus of TCM and medical knowledge, then applies supervised fine-tuning with a ChP-TCM–based instruction set to align the model with practical diagnostic tasks. Evaluations across 11 test sets and 4 tasks show that BianCang consistently outperforms open-source baselines of similar scale and rivals larger models on key TCM and medical benchmarks, while also delivering strong subjective assessments of professionalism and safety. By open-sourcing the ChP-TCM data and the model, the work provides a valuable resource for advancing domain-specific LLMs in TCM and offers a practical framework for integrating traditional knowledge with modern AI assistance, with careful attention to ethics and safety.
Abstract
The surge of large language models (LLMs) has driven significant progress in medical applications, including traditional Chinese medicine (TCM). However, current medical LLMs struggle with TCM diagnosis and syndrome differentiation, both because TCM theory differs substantially from modern medical theory and because specialized, high-quality corpora are scarce. To address this, we propose BianCang, a TCM-specific LLM trained in two stages: the first injects domain-specific knowledge, and the second aligns the model through targeted stimulation to enhance its diagnostic and syndrome differentiation capabilities. Specifically, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continual pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations across 11 test sets involving 31 models and 4 tasks demonstrate the effectiveness of BianCang, offering valuable insights for future research. Code, datasets, and models are available at https://github.com/QLU-NLP/BianCang.
