Quantifying Self-diagnostic Atomic Knowledge in Chinese Medical Foundation Model: A Computational Analysis

Yaxin Fan; Feng Jiang; Benyou Wang; Peifeng Li; Haizhou Li

TL;DR

This paper introduces the Self-diagnostic Atomic Knowledge (SdAK) benchmark to quantify how much self-diagnostic medical knowledge is memorized by Chinese medical LLMs. Using thematic analysis of real user queries, it defines 17 atomic knowledge types and constructs 14,048 atomic knowledge items, each posed as a factual/counterfactual pair. Models are evaluated with a contrastive memory test using two automatic metrics, Instruction Following Rate (IFR) and Factual Accuracy (FactAcc), plus an optional manual metric, Accuracy Reliability. Empirically, generic Chinese FMs generally outperform domain-tuned medical LLMs on SdAK, and distilled training data yields the largest gains in memorizing atomic knowledge. Error analysis identifies sycophancy as a major challenge and confirms the value of data distillation over real-world data alone. The work provides a practical evaluation paradigm, with open data and code, to guide the development of Chinese medical LLMs, and points to concrete directions for improving their self-diagnostic knowledge.
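To make the contrastive memory test concrete, here is a minimal sketch of how the two automatic metrics could be computed over factual/counterfactual pairs. It rests on assumptions not stated in this summary: each answer is reduced to a verdict by its opening word, English "yes"/"no" stand in for whatever answer format the actual (Chinese) prompts use, a pair counts as instruction-following only when both answers give a clear verdict, and an item counts as memorized only when the model supports the factual claim and refutes the counterfactual one. The names `PairResponse`, `parse_verdict`, and `score_pairs` are hypothetical; the released code linked in the abstract is the authoritative reference.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PairResponse:
    """Hypothetical container for a model's answers to one atomic knowledge item."""
    factual: str         # answer to the factual claim
    counterfactual: str  # answer to the counterfactual claim


def parse_verdict(response: str) -> Optional[bool]:
    """Return True for "yes", False for "no", None otherwise.

    Assumption: the prompt asks the model to open its answer with yes or no;
    any other opening word counts as not following the instruction.
    """
    match = re.match(r"\s*([a-z]+)", response.lower())
    if match is None:
        return None
    word = match.group(1)
    if word == "yes":
        return True
    if word == "no":
        return False
    return None


def score_pairs(pairs: List[PairResponse]) -> Tuple[float, float]:
    """Return (IFR, FactAcc), both as fractions of all pairs (assumed definitions)."""
    if not pairs:
        return 0.0, 0.0
    followed = 0
    memorized = 0
    for pair in pairs:
        fact = parse_verdict(pair.factual)
        counter = parse_verdict(pair.counterfactual)
        if fact is None or counter is None:
            continue  # at least one answer ignored the yes/no instruction
        followed += 1
        # Counted as memorized only if the factual claim is supported
        # and the counterfactual claim is refuted.
        if fact and not counter:
            memorized += 1
    return followed / len(pairs), memorized / len(pairs)


pairs = [
    PairResponse("Yes, it can.", "No, it cannot."),  # memorized
    PairResponse("Yes.", "Yes."),                    # follows, but agrees with both
    PairResponse("It depends.", "No."),              # instruction not followed
]
ifr, fact_acc = score_pairs(pairs)
print(f"IFR={ifr:.2f} FactAcc={fact_acc:.2f}")  # prints IFR=0.67 FactAcc=0.33
```

Under these assumptions, a model that answers "yes" to everything reaches full IFR but zero FactAcc, which is exactly the behavior a paired factual/counterfactual design is meant to expose.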

Abstract

Foundation Models (FMs) have the potential to revolutionize the way users self-diagnose through search engines by offering direct and efficient suggestions. While recent studies have primarily focused on the quality of FMs as evaluated by GPT-4 or on their ability to pass medical exams, no study has quantified the extent of self-diagnostic atomic knowledge stored in FMs' memory, which is the basis for foundation models to provide factual and reliable suggestions. In this paper, we first constructed a benchmark of Self-diagnostic Atomic Knowledge (SdAK), covering the most common types of atomic knowledge involved in self-diagnostic queries, with 17 atomic types and a total of 14,048 pieces of atomic knowledge. Then, we evaluated both generic and open-source Chinese medical FMs on the benchmark. The experimental results show that generic FMs perform better than medical FMs in terms of self-diagnostic atomic knowledge. Error analysis revealed that both generic and medical FMs are sycophantic, that is, they always cater to users' claims when it comes to knowledge they do not possess. We further explored different types of data commonly adopted for fine-tuning medical FMs, i.e., real-world, semi-distilled, and distilled data, and found that distilled data benefits FMs the most. The code and data are available at https://github.com/FreedomIntelligence/SDAK.
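The sycophancy finding can be made concrete at the level of a single factual/counterfactual pair: a model that affirms both a claim and its counterfactual is plausibly agreeing with the user's framing rather than recalling the fact. The sketch below is an illustrative proxy, not the authors' error-analysis procedure. It assumes each answer has already been reduced to a supports/refutes verdict, and the category names and the functions `categorize` and `error_breakdown` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# True = model supports the claim, False = refutes it, None = no clear yes/no.
Verdict = Optional[bool]


def categorize(fact: Verdict, counter: Verdict) -> str:
    """Assign one factual/counterfactual pair to a hypothetical outcome category."""
    if fact is None or counter is None:
        return "no_instruction_following"
    if fact and not counter:
        return "correct"
    if fact and counter:
        # Affirms whichever claim it is given: the sycophantic pattern.
        return "agrees_with_both"
    if not fact and not counter:
        return "rejects_both"
    return "reversed"  # refutes the fact, supports the counterfactual


def error_breakdown(verdicts: Iterable[Tuple[Verdict, Verdict]]) -> Counter:
    """Count how many pairs fall into each category."""
    return Counter(categorize(fact, counter) for fact, counter in verdicts)


breakdown = error_breakdown([(True, False), (True, True), (True, True), (None, False)])
print(breakdown["agrees_with_both"])  # prints 2
```

Separating "agrees_with_both" from the other error types keeps sycophantic answers distinct from plain factual mistakes, which a single accuracy number would merge.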

Paper Structure

This paper contains 31 sections, 6 figures, and 11 tables.

Introduction
Challenges.
Solutions.
Results.
Findings.
Related Work
Medical Evaluation Methods
Fact-checking
Chinese Medical LLMs
Construction of Self-diagnostic Atomic Knowledge Benchmark
Thematic Analysis of Atomic Types
Construction of Atomic Knowledge Items
Manual Verification
Experiments
General and Medical LLMs for Evaluation
...and 16 more sections

Figures (6)

Figure 1: Widely used medical evaluation methods. Medical tasks mainly measure the ability of LLMs to complete the task, medical examinations explore the ability of LLMs to pass the examination, and clinical diagnosis assesses the diagnostic ability of LLMs using GPT-4 as the judge.
Figure 2: Construction process of self-diagnostic atomic knowledge benchmark.
Figure 3: Process of the fact-checking style evaluation method.
Figure 4: Performance of representative LLMs on various types of atomic knowledge.
Figure 5: Performance of the LLM on the IFR and FactAcc metrics with different types of training data.
...and 1 more figure
