Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs

Daniel Steinigen, Roman Teucher, Timm Heine Ruland, Max Rudat, Nicolas Flores-Herr, Peter Fischer, Nikola Milosevic, Christopher Schymura, Angelo Ziletti

TL;DR

FactFinder tackles LLM hallucinations by integrating a medical knowledge graph with a large language model to boost factual accuracy. It presents an end-to-end pipeline for text-to-Cypher generation, KG retrieval, and LLM-enhanced verbalization, coupled with explainable evidence and a Streamlit UI. On a 69-sample medical dataset, the system achieves a precision of 78% in retrieving correct KG nodes and surpasses a standalone LLM in correctness and completeness, as judged by LLM-based evaluation. The work demonstrates a practical, fast, and transparent hybrid QA approach for domain-specific questions, with open-source code and data to support replication and extension.

Abstract

Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering natural language queries. However, their effectiveness is hindered by limited domain-specific knowledge, raising concerns about the reliability of their responses. We introduce a hybrid system that augments LLMs with domain-specific knowledge graphs (KGs), thereby aiming to enhance factual correctness using a KG-based retrieval approach. We focus on a medical KG to demonstrate our methodology, which includes (1) pre-processing, (2) Cypher query generation, (3) Cypher query processing, (4) KG retrieval, and (5) LLM-enhanced response generation. We evaluate our system on a curated dataset of 69 samples, achieving a precision of 78% in retrieving correct KG nodes. Our findings indicate that the hybrid system surpasses a standalone LLM in accuracy and completeness, as verified by an LLM-as-a-Judge evaluation method. This positions the system as a promising tool for applications that demand factual correctness and completeness, such as target identification -- a critical process in pinpointing biological entities for disease treatment or crop enhancement. Moreover, its intuitive search interface and ability to provide accurate responses within seconds make it well-suited for time-sensitive, precision-focused research contexts. We publish the source code together with the dataset and the prompt templates used.
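
The five stages listed in the abstract can be pictured as a chain of small functions. The following is a minimal sketch under stated assumptions, not the authors' implementation: every name in it (`preprocess`, `postprocess_cypher`, `answer_question`, `precision`, `PipelineResult`) is hypothetical, the LLM and the graph database are passed in as plain callables, and the set-based precision is one plausible reading of "precision in retrieving correct KG nodes" that may differ from the paper's exact definition.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PipelineResult:
    cypher: str        # query after stage 3 clean-up
    rows: list         # records returned by the graph database
    answer: str        # verbalized answer from stage 5


def preprocess(question: str) -> str:
    """Stage 1 (placeholder): collapse whitespace in the user question."""
    return " ".join(question.split())


def postprocess_cypher(query: str) -> str:
    """Stage 3 (placeholder): remove a code fence an LLM may wrap around a query."""
    cleaned = query.strip().strip("`").strip()
    if cleaned.lower().startswith("cypher"):
        cleaned = cleaned[len("cypher"):]
    return cleaned.strip()


def answer_question(
    question: str,
    generate_cypher: Callable[[str], str],     # assumed LLM call: question -> Cypher
    run_query: Callable[[str], list],          # assumed KG call: Cypher -> rows
    verbalize: Callable[[str, list], str],     # assumed LLM call: question + rows -> text
) -> PipelineResult:
    """Run the five stages in order and keep the intermediate artifacts."""
    clean_question = preprocess(question)              # (1) pre-processing
    raw_cypher = generate_cypher(clean_question)       # (2) Cypher query generation
    cypher = postprocess_cypher(raw_cypher)            # (3) Cypher query processing
    rows = run_query(cypher)                           # (4) KG retrieval
    answer = verbalize(clean_question, rows)           # (5) LLM-enhanced response
    return PipelineResult(cypher=cypher, rows=rows, answer=answer)


def precision(retrieved: Iterable[str], expected: Iterable[str]) -> float:
    """Share of retrieved node identifiers that are also in the expected set."""
    retrieved_set, expected_set = set(retrieved), set(expected)
    if not retrieved_set:
        return 0.0  # convention chosen here for an empty retrieval
    return len(retrieved_set & expected_set) / len(retrieved_set)
```

Passing the model and database in as callables keeps the sketch runnable without any external service, and returning the intermediate query and rows mirrors the explainable-evidence idea described above: the answer can be shown next to the query and the graph records that produced it.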

Paper Structure

This paper contains 14 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of the FactFinder pipeline using large language models and knowledge graphs to answer scientific questions.
  • Figure 2: Example of the evidence subgraph for the question "Which drugs against epilepsy should not be used by patients with hypertension?"
  • Figure 3: User interface showing the question and the answers of the standalone LLM and our graph-based hybrid system.
  • Figure 4: Answer for exploring drugs used to treat epilepsy.
  • Figure 5: Answer for exploring genes targeted by amobarbital but not lamotrigine.
  • ...and 3 more figures