Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs
Daniel Steinigen, Roman Teucher, Timm Heine Ruland, Max Rudat, Nicolas Flores-Herr, Peter Fischer, Nikola Milosevic, Christopher Schymura, Angelo Ziletti
TL;DR
Fact Finder tackles LLM hallucinations by integrating a medical knowledge graph with a large language model to boost factual accuracy. It presents an end-to-end pipeline for text-to-Cypher generation, KG retrieval, and LLM-enhanced verbalization, coupled with explainable evidence and a Streamlit UI. On a 69-sample medical dataset, the system achieves a precision of 78% in retrieving correct KG nodes and surpasses a standalone LLM in correctness and completeness, as judged by an LLM-as-a-Judge evaluation. The work demonstrates a practical, fast, and transparent hybrid QA approach for domain-specific questions, with open-source code and data to support replication and extension.
Abstract
Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering natural language queries. However, their effectiveness is hindered by limited domain-specific knowledge, raising concerns about the reliability of their responses. We introduce a hybrid system that augments LLMs with domain-specific knowledge graphs (KGs), thereby aiming to enhance factual correctness using a KG-based retrieval approach. We focus on a medical KG to demonstrate our methodology, which includes (1) pre-processing, (2) Cypher query generation, (3) Cypher query processing, (4) KG retrieval, and (5) LLM-enhanced response generation. We evaluate our system on a curated dataset of 69 samples, achieving a precision of 78% in retrieving correct KG nodes. Our findings indicate that the hybrid system surpasses a standalone LLM in accuracy and completeness, as verified by an LLM-as-a-Judge evaluation method. This positions the system as a promising tool for applications that demand factual correctness and completeness, such as target identification -- a critical process in pinpointing biological entities for disease treatment or crop enhancement. Moreover, its intuitive search interface and ability to provide accurate responses within seconds make it well-suited for time-sensitive, precision-focused research contexts. We publish the source code together with the dataset and the prompt templates used.
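As an illustration of the retrieval metric quoted above, the following minimal sketch shows one common way to compute precision over retrieved KG nodes. It is an assumption, not the authors' evaluation code: the abstract does not state whether precision is averaged per sample or pooled, and the function names (`node_precision`, `mean_precision`) and the toy node labels are hypothetical.

```python
def node_precision(retrieved, expected):
    """Fraction of distinct retrieved nodes that also appear in the expected set.

    Returns 0.0 when nothing was retrieved (an assumed convention).
    """
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0
    return len(retrieved_set & set(expected)) / len(retrieved_set)


def mean_precision(samples):
    """Average per-sample precision over (retrieved, expected) pairs."""
    scores = [node_precision(r, e) for r, e in samples]
    return sum(scores) / len(scores) if scores else 0.0


# Toy, made-up samples: (nodes returned by the query, gold-standard nodes).
samples = [
    (["a", "b"], ["a", "b"]),  # every retrieved node is correct
    (["a", "c"], ["a"]),       # one of two retrieved nodes is correct
]

if __name__ == "__main__":
    print(mean_precision(samples))
```

Per-sample averaging gives every question equal weight regardless of how many nodes its query returns; pooling all nodes before dividing would instead weight questions with large result sets more heavily.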
