MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering

Feiyang Li, Yingjian Chen, Haoran Liu, Rui Yang, Han Yuan, Yuang Jiang, Tianxiao Li, Edison Marrese Taylor, Hossein Rouhizadeh, Yusuke Iwasawa, Douglas Teodoro, Yutaka Matsuo, Irene Li

TL;DR

This work tackles multilingual medical QA by using external medical knowledge graphs to bridge the language gap for English-centric LLMs. It introduces MKG-Rank, a knowledge-graph-augmented retrieval framework that combines a word-level translation pipeline, caching, and multi-angle ranking to fetch and filter relevant medical facts efficiently. Key components include medical entity extraction and translation, external KG retrieval from UMLS, multi-angle ranking with embedding-based similarity and a cross-encoder filter, declarative conversion of triplets, and optional self-information mining with BM25. Empirical results on Japanese, Chinese, Korean, and Swahili datasets show accuracy improvements of up to 35.03% over zero-shot baselines with an average retrieval time of 0.0009 seconds, and ablations validate the contribution of each module.

Abstract

Large Language Models (LLMs) have shown remarkable progress in medical question answering (QA), yet their effectiveness remains predominantly limited to English due to imbalanced multilingual training data and scarce medical resources for low-resource languages. To address this critical language gap in medical QA, we propose Multilingual Knowledge Graph-based Retrieval Ranking (MKG-Rank), a knowledge graph-enhanced framework that enables English-centric LLMs to perform multilingual medical QA. Through a word-level translation mechanism, our framework efficiently integrates comprehensive English-centric medical knowledge graphs into LLM reasoning at a low cost, mitigating cross-lingual semantic distortion and achieving precise medical QA across language barriers. To enhance efficiency, we introduce caching and multi-angle ranking strategies to optimize the retrieval process, significantly reducing response times and prioritizing relevant medical knowledge. Extensive evaluations on multilingual medical QA benchmarks across Chinese, Japanese, Korean, and Swahili demonstrate that MKG-Rank consistently outperforms zero-shot LLMs, achieving a maximum 35.03% increase in accuracy while maintaining an average retrieval time of only 0.0009 seconds.
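
The retrieval flow described in the abstract (word-level translation of medical terms, cached knowledge graph lookup, declarative conversion of triplets, and ranking against the question) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `TERM_TRANSLATIONS`, `TOY_KG`, `retrieve_triplets`, `to_declarative`, `bow_cosine`, and `rank_facts` are hypothetical, the knowledge graph is a hand-written toy dictionary rather than UMLS, and a bag-of-words cosine score stands in for the embedding-based similarity and cross-encoder filter used in the paper.

```python
import math
import re
from collections import Counter
from functools import lru_cache

# Hypothetical word-level dictionary; the paper translates extracted
# medical entities, which this lookup table only imitates.
TERM_TRANSLATIONS = {"糖尿病": "diabetes mellitus", "インスリン": "insulin"}

# Toy triplets standing in for an external medical knowledge graph.
TOY_KG = {
    "diabetes mellitus": [
        ("diabetes mellitus", "may be treated by", "insulin"),
        ("diabetes mellitus", "has finding site", "pancreas"),
        ("diabetes mellitus", "is a", "endocrine disorder"),
    ],
    "insulin": [("insulin", "is a", "hormone")],
}


@lru_cache(maxsize=None)
def retrieve_triplets(entity: str) -> tuple:
    """Look up triplets for one English entity; repeated calls hit the cache."""
    return tuple(TOY_KG.get(entity, ()))


def to_declarative(triplet: tuple) -> str:
    """Turn a (head, relation, tail) triplet into a plain sentence."""
    head, relation, tail = triplet
    return f"{head} {relation} {tail}."


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (assumed stand-in for embeddings)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank_facts(question_en: str, entities: list, top_k: int = 2) -> list:
    """Retrieve, convert, and rank facts by similarity to the question."""
    facts = [to_declarative(t) for e in entities for t in retrieve_triplets(e)]
    ranked = sorted(facts, key=lambda f: bow_cosine(question_en, f), reverse=True)
    return ranked[:top_k]
```

Calling `rank_facts` twice with the same entities serves the second round of lookups from the `lru_cache`, which is one simple way to get the kind of retrieval-time saving the abstract attributes to caching. Translating only the extracted terms, rather than the whole question, keeps the translation step small in this sketch as well.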

Paper Structure

This paper contains 25 sections, 1 equation, 11 figures, 4 tables.

Figures (11)

  • Figure 1: The overall architecture of our proposed MKG-Rank. The English translation of the question and options in the figure is provided in the Appendix.
  • Figure 2: Comparison of the Acc evaluated on Qwen-2.5 72B and GPT-4o-mini across four language datasets with (w/) and without (w/o) declarative conversion.
  • Figure 3: Case Study. More details, along with the English version of the questions and options, are provided in the Appendix.
  • Figure 4: Additional ablation experiments on Llama 70B, Claude-3.5 Haiku, and GPT-4o across four language datasets with (w/) and without (w/o) multi-angle ranking.
  • Figure 5: Prompts for extracting medical entities from question.
  • ...and 6 more figures