Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

Haochun Wang; Sendong Zhao; Zewen Qiang; Zijian Li; Nuwa Xi; Yanrui Du; MuZhen Cai; Haoqiang Guo; Yuhan Chen; Haoming Xu; Bing Qin; Ting Liu

TL;DR

This work tackles medical hallucinations in Chinese LLMs by grounding responses in structured medical knowledge bases through a retrieval-guided knowledge-tuning framework. It builds cMedKnowQA, a Chinese medical knowledge QA dataset constructed from KBs, and shows that grounding via entity/attribute prediction, knowledge retrieval, and knowledge-consistent generation yields higher accuracy and reliability than vanilla instruction-tuning, approaching ChatGPT performance when the correct knowledge is retrieved. The approach trains a retrieval-augmented, end-to-end pipeline with losses $L_e$, $L_{attr}$, $L_r$, and $L_{it}$, and demonstrates effectiveness in few-shot settings and generalization to unseen entities. Together, these results point to a practical, scalable path for adapting Chinese LLMs to medical contexts, with explicit knowledge sources and safety considerations.
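The three stages named above (entity/attribute prediction, knowledge retrieval, knowledge-consistent generation), which Figure 2 also lays out, can be illustrated with a minimal Python sketch of the data flow. Everything concrete here is an assumption: the toy knowledge base `KB`, the keyword cues in `ATTRIBUTE_CUES`, and the helpers `predict_params`, `retrieve`, `generate`, and `answer` are hypothetical. Keyword matching and a string template stand in for the steps the knowledge-tuned LLM would perform, so this is not the authors' implementation and does not model the training losses.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy knowledge base: (entity, attribute) -> knowledge.
# Entries are illustrative and not taken from the paper's knowledge bases.
KB = {
    ("influenza", "drug"): "oseltamivir",
    ("influenza", "symptom"): "fever, cough, muscle aches",
    ("hypertension", "drug"): "amlodipine",
}

# Hypothetical keyword cues used to guess the attribute from the question.
ATTRIBUTE_CUES = {
    "drug": ("medicine", "drug", "take"),
    "symptom": ("symptom",),
}


@dataclass
class RetrievalParams:
    entity: Optional[str]
    attribute: Optional[str]


def predict_params(question: str) -> RetrievalParams:
    """Stage 1 stand-in: fill in the retrieval parameters from the question.
    In knowledge-tuning the LLM presumably does this; keyword matching is a placeholder."""
    q = question.lower()
    entity = next((e for e, _ in KB if e in q), None)
    attribute = next(
        (a for a, cues in ATTRIBUTE_CUES.items() if any(c in q for c in cues)), None
    )
    return RetrievalParams(entity, attribute)


def retrieve(params: RetrievalParams) -> Optional[str]:
    """Stage 2: acquire knowledge from the KB with the filled parameters."""
    if params.entity is None or params.attribute is None:
        return None
    return KB.get((params.entity, params.attribute))


def generate(params: RetrievalParams, knowledge: Optional[str]) -> str:
    """Stage 3 stand-in: a template replaces knowledge-conditioned LLM decoding."""
    if knowledge is None:
        return "No matching knowledge found; declining to answer rather than guessing."
    return f"For {params.entity}, the knowledge base lists {params.attribute}: {knowledge}."


def answer(question: str) -> str:
    params = predict_params(question)
    return generate(params, retrieve(params))


print(answer("What medicine should I take for influenza?"))
```

Returning an explicit refusal when retrieval fails is a choice made for this sketch: it keeps every response tied to retrieved knowledge instead of letting the generator guess.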

Abstract

Large Language Models (LLMs) have demonstrated remarkable success in diverse natural language processing (NLP) tasks in general domains. However, because of limited domain knowledge, LLMs sometimes generate responses that hallucinate medical facts. Such shortcomings pose potential risks when LLMs are used in medical contexts. To address this challenge, we propose knowledge-tuning, which leverages structured medical knowledge bases to help LLMs grasp domain knowledge efficiently and generate reliable responses. We also release cMedKnowQA, a Chinese medical knowledge question-answering dataset constructed from medical knowledge bases to assess the medical knowledge proficiency of LLMs. Experimental results show that LLMs knowledge-tuned with cMedKnowQA achieve higher accuracy in response generation than those trained with vanilla instruction-tuning, offering a new, reliable approach to the domain adaptation of LLMs.
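cMedKnowQA is constructed from medical knowledge bases, and Figure 3 shows one generated instance. As a rough illustration only, the sketch here turns hypothetical (entity, attribute, knowledge) entries into QA records that keep their supporting knowledge next to the question and response. The names `TRIPLES`, `TEMPLATES`, and `build_instances`, and the use of fixed question templates, are assumptions; the paper's actual generation procedure is not described in this summary and may differ substantially.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical KB entries as (entity, attribute, knowledge); illustrative only.
TRIPLES: List[Tuple[str, str, str]] = [
    ("influenza", "drug", "oseltamivir"),
    ("hypertension", "drug", "amlodipine"),
    ("influenza", "symptom", "fever, cough, muscle aches"),
]

# Hypothetical question templates keyed by attribute (a stand-in for the
# paper's own knowledge-guided generation procedure).
TEMPLATES: Dict[str, str] = {
    "drug": "What medicine can be used for {entity}?",
    "symptom": "What are the symptoms of {entity}?",
}


def build_instances(triples: List[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """Turn KB entries into QA instances that carry their supporting knowledge."""
    instances = []
    for entity, attribute, knowledge in triples:
        template = TEMPLATES.get(attribute)
        if template is None:
            continue  # no template for this attribute: skip rather than invent a question
        instances.append({
            "question": template.format(entity=entity),
            "entity": entity,        # label for entity prediction
            "attribute": attribute,  # label for attribute prediction
            "knowledge": knowledge,  # evidence the response must agree with
            "response": f"According to the knowledge base, {attribute} for {entity}: {knowledge}.",
        })
    return instances


for inst in build_instances(TRIPLES):
    # ensure_ascii=False keeps Chinese text readable when the KB is in Chinese
    print(json.dumps(inst, ensure_ascii=False))
```

Keeping the entity, attribute, and knowledge fields in each record makes it possible to supervise the prediction and retrieval steps separately from the final response.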

Paper Structure (27 sections, 6 equations, 5 figures, 6 tables, 1 algorithm)

Introduction
Related Works
Large Language Models
LLMs in Biomedical Domain
Tools for LLMs
Methodology
Structured Medical Knowledge Bases
Knowledge-guided Data Generation
Knowledge-tuning
Experiment
Baselines and Implementations
Dataset
Metrics
Evaluation on the Medical Entity and Knowledge
Evaluation on the Response Quality
...and 12 more sections

Figures (5)

Figure 1: Response cases of ChatGPT with an identical question raised in English and Chinese. ChatGPT provides incorrect medicine recommendations in response to the question in Chinese. Generated by ChatGPT on April 13th, 2023.
Figure 2: Process for knowledge-based response generation. Stage 1: Fill in the parameters for the knowledge retrieval based on the query question. Stage 2: Acquire the knowledge with filled parameters. Stage 3: Generate a response with acquired knowledge. Texts in Chinese have been translated into English.
Figure 3: One medical knowledge-guided instance generated for knowledge-tuning. Texts in Chinese have been translated into English.
Figure 4: Entity generation in the few-shot scenarios.
Figure 5: Model generalization with unseen entities. The x-axis indicates the proportion of seen entities in the training set.
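Figure 5 varies how many entities are seen during training. One plausible way to set up such an experiment, sketched here under the assumption that the split is made at the entity level with a fixed random seed, is to partition the entity set and route each QA instance by its entity. The functions `split_entities` and `select` and the toy data are hypothetical; the paper's exact protocol is not given in this summary.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_entities(
    entities: Sequence[str], seen_fraction: float, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Partition entities into (seen, unseen); only seen entities go into training."""
    if not 0.0 <= seen_fraction <= 1.0:
        raise ValueError("seen_fraction must be in [0, 1]")
    pool = sorted(set(entities))  # deduplicate and fix order so the seed is reproducible
    random.Random(seed).shuffle(pool)
    cut = round(len(pool) * seen_fraction)
    return pool[:cut], pool[cut:]


def select(instances: List[Dict[str, str]], entities: List[str]) -> List[Dict[str, str]]:
    """Keep only the instances whose entity is in the given list."""
    keep = set(entities)
    return [inst for inst in instances if inst["entity"] in keep]


entities = [f"entity_{i}" for i in range(10)]
instances = [{"entity": e, "question": f"Q about {e}"} for e in entities]
for frac in (0.2, 0.5, 0.8):
    seen, unseen = split_entities(entities, frac)
    train, test = select(instances, seen), select(instances, unseen)
    print(frac, len(train), len(test))
```

With one instance per toy entity, the loop prints `0.2 2 8`, `0.5 5 5`, and `0.8 8 2`. Splitting by entity rather than by instance ensures that no test entity appears anywhere in the training data.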
