Traceable Cross-Source RAG for Chinese Tibetan Medicine Question Answering

Fengxian Chen; Zhilong Tao; Jiaxuan Li; Yunlong Li; Qingguo Zhou

TL;DR

The paper addresses grounded question answering for Chinese Tibetan medicine over three partitioned, heterogeneous knowledge bases (encyclopedia, classics, clinical papers), where dense encyclopedia entries tend to dominate retrieval and answer provenance matters. It introduces two complementary components: DAKS, which performs KB routing and budgeted retrieval to reduce density-driven bias and balance source authority, and alignment-graph-guided fusion, which improves cross-KB verification and packs evidence under a token budget. The authors formalize the problem, define metrics including CrossEv@5, and evaluate on a 500-query Tibetan medicine QA benchmark with a lightweight generator. They report gains in routing quality and cross-KB evidence coverage while maintaining faithfulness and citation correctness, with the full system achieving the best end-to-end cross-KB evidence coverage.

Abstract

Retrieval-augmented generation (RAG) promises grounded question answering, yet domain settings with multiple heterogeneous knowledge bases (KBs) remain challenging. In Chinese Tibetan medicine, encyclopedia entries are often dense and easy to match, which can dominate retrieval even when classics or clinical papers provide more authoritative evidence. We study a practical setting with three KBs (encyclopedia, classics, and clinical papers) and a 500-query benchmark (cutoff $K{=}5$) covering both single-KB and cross-KB questions. We propose two complementary methods to improve traceability, reduce hallucinations, and enable cross-KB verification. First, DAKS performs KB routing and budgeted retrieval to mitigate density-driven bias and to prioritize authoritative sources when appropriate. Second, we use an alignment graph to guide evidence fusion and coverage-aware packing, improving cross-KB evidence coverage without relying on naive concatenation. All answers are generated by a lightweight generator, \textsc{openPangu-Embedded-7B}. Experiments show consistent gains in routing quality and cross-KB evidence coverage, with the full system achieving the best CrossEv@5 while maintaining strong faithfulness and citation correctness.
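To make the idea of KB routing with budgeted retrieval more concrete, the following is a minimal sketch of how a retrieval budget might be split across KBs so that a dense source does not absorb every slot. It is not the paper's DAKS formulation: the function name `allocate_budget`, the mean-score KB feature, the additive `authority` prior, the softmax with `temperature`, and the largest-remainder rounding are all assumptions made for illustration.

```python
import math


def allocate_budget(probe_scores, authority, total_budget=20, temperature=1.0):
    """Split `total_budget` retrieval slots across KBs (illustrative only).

    probe_scores: dict mapping KB name -> list of probe retrieval scores.
    authority:    dict mapping KB name -> hypothetical additive authority prior.
    """
    # Hypothetical KB-level score: mean probe score plus an authority prior.
    kb_scores = {}
    for kb, scores in probe_scores.items():
        mean_score = sum(scores) / len(scores) if scores else 0.0
        kb_scores[kb] = mean_score + authority.get(kb, 0.0)

    # Softmax over KB scores gives a soft (non-winner-take-all) allocation.
    top = max(kb_scores.values())
    exps = {kb: math.exp((s - top) / temperature) for kb, s in kb_scores.items()}
    norm = sum(exps.values())
    raw = {kb: total_budget * e / norm for kb, e in exps.items()}

    # Largest-remainder rounding so the integer budgets sum to total_budget.
    budget = {kb: math.floor(r) for kb, r in raw.items()}
    leftover = total_budget - sum(budget.values())
    by_remainder = sorted(raw, key=lambda kb: raw[kb] - budget[kb], reverse=True)
    for kb in by_remainder[:leftover]:
        budget[kb] += 1
    return budget
```

Under these assumptions, raising `temperature` flattens the allocation toward an even split, while a larger authority prior shifts slots toward classics or clinical papers even when encyclopedia entries score higher in probe retrieval.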

Paper Structure

This paper contains 28 sections, 13 equations, 3 figures, 3 tables, 2 algorithms.

Introduction
Related Work
Attribution and evaluation for retrieval-augmented generation.
Evidence ordering and long-context sensitivity.
Graph-augmented retrieval and traditional medicine question answering.
Method
Problem Setup and Notation
Method I: DAKS Routing with Budgeted Retrieval
Probe retrieval and KB-level statistics
Authority-aware KB scoring
Soft budget allocation and candidate pool
Candidate Consolidation
Method II: Alignment Graph-Guided Fusion for Cross-KB Verification
Alignment graph construction
Graph-based bridge retrieval (optional)
...and 13 more sections

Figures (3)

Figure 1: Overall system overview.
Figure 2: DAKS. We run lightweight probe retrieval in each KB, summarize score distributions into KB-level features, compute KB scores, and allocate a soft budget to form a balanced candidate pool.
Figure 3: Alignment Graph-Guided Fusion. We build a chunk--entity alignment graph to compute graph support signals, fuse them with semantic relevance for reranking, and pack evidence under a token budget with cross-KB coverage constraints.
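
The Figure 3 caption describes fusing graph support with semantic relevance and then packing evidence under a token budget with cross-KB coverage constraints. The sketch below shows one simple way such packing could work. It is an assumption-laden illustration rather than the paper's algorithm: the `Chunk` fields, the linear fusion weight `alpha`, and the two-pass greedy strategy (cover each KB first, then fill by score) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    kb: str               # source knowledge base
    tokens: int           # length of the chunk in tokens
    semantic: float       # semantic relevance to the query
    graph_support: float  # hypothetical support signal from the alignment graph


def fused_score(chunk, alpha=0.7):
    # Hypothetical linear fusion of semantic relevance and graph support.
    return alpha * chunk.semantic + (1 - alpha) * chunk.graph_support


def pack_evidence(chunks, token_budget, alpha=0.7):
    """Greedy packing: cover each KB once, then fill the rest by fused score."""
    ranked = sorted(chunks, key=lambda c: fused_score(c, alpha), reverse=True)
    packed, packed_ids, covered, used = [], set(), set(), 0

    # Pass 1: take the best-scoring chunk that fits from each uncovered KB.
    for c in ranked:
        if c.kb not in covered and used + c.tokens <= token_budget:
            packed.append(c)
            packed_ids.add(c.chunk_id)
            covered.add(c.kb)
            used += c.tokens

    # Pass 2: spend any remaining budget on the highest-scoring leftovers.
    for c in ranked:
        if c.chunk_id not in packed_ids and used + c.tokens <= token_budget:
            packed.append(c)
            packed_ids.add(c.chunk_id)
            used += c.tokens

    # Return the packed chunks in descending fused-score order.
    return sorted(packed, key=lambda c: fused_score(c, alpha), reverse=True)
```

Compared with taking chunks purely by score, the coverage pass in this sketch can drop a second encyclopedia chunk in favor of a lower-scoring clinical chunk when the budget cannot hold both, which is the kind of trade-off a cross-KB coverage constraint is meant to enforce.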
