MEDMKG: Benchmarking Medical Knowledge Exploitation with Multimodal Knowledge Graph
Xiaochen Wang, Yuan Zhong, Lingwei Zhang, Lisong Dai, Ting Wang, Fenglong Ma
TL;DR
MEDMKG is a Medical Multimodal Knowledge Graph that unifies imaging and clinical text by extending UMLS with MIMIC-CXR radiographs. It is built with a two-stage concept extraction pipeline that combines a rule-based tool (MetaMap) with a large language model (GPT-4o), together with a Neighbor-aware Filtering (NaF) strategy that curates informative images. The graph is benchmarked on link prediction and on two knowledge-augmented tasks, text–image retrieval and visual question answering (VQA), using 24 baselines and 4 backbones over 6 datasets. The results show improved downstream performance and establish a foundation for adaptive multimodal knowledge integration in medical AI. The work also highlights the importance of model-graph alignment, external knowledge integration, and scalable filtering for robust, real-world clinical applications.
Abstract
Medical deep learning models depend heavily on domain-specific knowledge to perform well on knowledge-intensive clinical tasks. Prior work has primarily leveraged unimodal knowledge graphs, such as the Unified Medical Language System (UMLS), to enhance model performance. However, integrating multimodal medical knowledge graphs remains largely underexplored, mainly due to the lack of resources linking imaging data with clinical concepts. To address this gap, we propose MEDMKG, a Medical Multimodal Knowledge Graph that unifies visual and textual medical information through a multi-stage construction pipeline. MEDMKG fuses the rich multimodal data from MIMIC-CXR with the structured clinical knowledge from UMLS, utilizing both rule-based tools and large language models for accurate concept extraction and relationship modeling. To ensure graph quality and compactness, we introduce Neighbor-aware Filtering (NaF), a novel filtering algorithm tailored for multimodal knowledge graphs. We evaluate MEDMKG across three tasks under two experimental settings, benchmarking twenty-four baseline methods and four state-of-the-art vision-language backbones on six datasets. Results show that MEDMKG not only improves performance in downstream medical tasks but also offers a strong foundation for developing adaptive and robust strategies for multimodal knowledge integration in medical artificial intelligence.
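To make the idea of a graph that links images to clinical concepts concrete, the following is a minimal sketch of a multimodal triple store in Python. It is an illustration only, not the authors' implementation: the class name MultimodalKG, the node identifiers (img_001, concept_A, and so on), and the relation names shows_finding and related_to are all hypothetical, and the sketch does not attempt to reproduce concept extraction or Neighbor-aware Filtering.

```python
from collections import defaultdict


class MultimodalKG:
    """Toy triple store that mixes image nodes and concept nodes."""

    def __init__(self):
        self.modality = {}                 # node id -> "image" or "concept"
        self.out_edges = defaultdict(set)  # head -> {(relation, tail)}
        self.in_edges = defaultdict(set)   # tail -> {(relation, head)}

    def add_node(self, node_id, modality):
        if modality not in ("image", "concept"):
            raise ValueError(f"unknown modality: {modality}")
        self.modality[node_id] = modality

    def add_triple(self, head, relation, tail):
        # Both endpoints must be registered so their modality is known.
        for node in (head, tail):
            if node not in self.modality:
                raise KeyError(f"unregistered node: {node}")
        self.out_edges[head].add((relation, tail))
        self.in_edges[tail].add((relation, head))

    def concepts_for_image(self, image_id):
        """Concept nodes directly linked from an image (cross-modal edges)."""
        edges = self.out_edges.get(image_id, ())
        return sorted(t for _, t in edges if self.modality[t] == "concept")

    def images_for_concept(self, concept_id):
        """Image nodes that point at a given concept."""
        edges = self.in_edges.get(concept_id, ())
        return sorted(h for _, h in edges if self.modality[h] == "image")


def build_example():
    """Build a tiny graph with hypothetical identifiers and relations."""
    kg = MultimodalKG()
    for image_id in ("img_001", "img_002"):
        kg.add_node(image_id, "image")
    for concept_id in ("concept_A", "concept_B"):
        kg.add_node(concept_id, "concept")

    # Cross-modal edges: an image is linked to the concepts it depicts.
    kg.add_triple("img_001", "shows_finding", "concept_A")
    kg.add_triple("img_001", "shows_finding", "concept_B")
    kg.add_triple("img_002", "shows_finding", "concept_A")
    # Intra-modal edge: concept-to-concept knowledge from the text graph.
    kg.add_triple("concept_A", "related_to", "concept_B")
    return kg


if __name__ == "__main__":
    kg = build_example()
    print(kg.concepts_for_image("img_001"))
    print(kg.images_for_concept("concept_A"))
```

In this toy layout, a link prediction query corresponds to ranking candidate tails for a (head, relation, ?) pair, while knowledge-augmented tasks would look up the concepts attached to an image and pass them to a downstream model.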
