MEG: Medical Knowledge-Augmented Large Language Models for Question Answering
Laura Cabello, Carmen Martin-Turrero, Uchenna Akujuobi, Anders Søgaard, Carlos Bobed
TL;DR
MEG is a parameter-efficient framework for medical knowledge augmentation of LLMs that injects pretrained knowledge graph embeddings (KGEs) through a lightweight mapping network. A GraphSAGE-based KG encoder produces the KGEs, which are transformed into the LLM embedding space and injected after the embedding layer, guided by a grounding module that links textual mentions to KG nodes. Training occurs in two phases: embedding transfer learning to align KGEs with the LLM, followed by LoRA-based fine-tuning on downstream medical QA tasks. On four multiple-choice question answering (MCQA) datasets, MEG achieves average accuracy gains of +6.7% and +9.9% over the specialized baselines BioMistral-7B and MediTron-7B, respectively. The approach performs robustly across base LLMs (Mistral and LLaMA-3) and shows that KGEs enrich factual grounding without full base-model retraining, suggesting a practical path for domain-specific QA in medicine.
Abstract
Question answering is a natural language understanding task that involves reasoning over both explicit context and unstated but relevant domain knowledge. Despite the high cost of training, large language models (LLMs), the backbone of most modern question-answering systems, still struggle to reliably capture the nuanced relationships between concepts that are crucial for reasoning in specialized fields like medicine. In this work, we present MEG, a parameter-efficient approach for medical knowledge-augmented LLMs. MEG uses a lightweight mapping network to incorporate knowledge graph embeddings into the LLM, enabling it to leverage external knowledge in a cost-effective way. We evaluate our method on four popular medical multiple-choice datasets and show that LLMs i) can effectively interpret knowledge graph embeddings and ii) gain significant advantages from the factual grounding these embeddings provide. MEG attains an average of +6.7% and +9.9% accuracy over specialized models like BioMistral-7B and MediTron-7B, respectively. Finally, we show that MEG's performance remains robust to the choice of graph encoder.
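The data flow summarized above (ground mentions to KG nodes, map their embeddings into the LLM space, inject them alongside the token embeddings) can be illustrated with a minimal, dependency-free sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the two-layer mapping network, the exact-string-match grounding, the choice to prepend mapped vectors to the token sequence, the toy dimensions, and all names (`map_kge`, `inject`, `kg_embeddings`). The weights are random stand-ins for trained parameters.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) with random weights; a stand-in for trained parameters."""
    scale = 1.0 / in_dim ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, layer):
    """Apply one affine layer to a vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def map_kge(kge, layers):
    """Hypothetical mapping network: linear -> ReLU -> linear (KG space to LLM space)."""
    hidden = [max(0.0, h) for h in linear(kge, layers[0])]
    return linear(hidden, layers[1])


def inject(token_embeddings, mapped_entities):
    """Prepend mapped entity vectors to the token embeddings (an assumed layout)."""
    return mapped_entities + token_embeddings


rng = random.Random(0)
KG_DIM, HIDDEN_DIM, LLM_DIM = 4, 8, 6  # toy sizes, not the paper's
layers = [make_linear(KG_DIM, HIDDEN_DIM, rng),
          make_linear(HIDDEN_DIM, LLM_DIM, rng)]

# Toy KG embedding table keyed by node name (hypothetical entries).
kg_embeddings = {
    "aspirin": [rng.gauss(0, 1) for _ in range(KG_DIM)],
    "fever": [rng.gauss(0, 1) for _ in range(KG_DIM)],
}

question_tokens = ["does", "aspirin", "reduce", "fever"]
# Stand-in for the output of the LLM's embedding layer.
token_embeddings = [[rng.gauss(0, 1) for _ in range(LLM_DIM)]
                    for _ in question_tokens]

# Grounding by exact string match; a real grounding module would be more involved.
grounded = [tok for tok in question_tokens if tok in kg_embeddings]
mapped = [map_kge(kg_embeddings[tok], layers) for tok in grounded]
llm_inputs = inject(token_embeddings, mapped)
```

In this sketch only the mapping layers would need to be trained during the alignment phase, which is what keeps the approach lightweight relative to retraining the base model; the subsequent LoRA-based fine-tuning is not shown.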
