Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue!

Dean Allemang, Juan Sequeda

TL;DR

This work demonstrates that combining ontology-based validation with LLM-driven repair substantially improves the accuracy of answering natural-language questions over enterprise data. The approach deterministically checks LLM-generated SPARQL queries against the semantics of the ontology (Ontology-based Query Check, OBQC) and iteratively repairs the detected errors with an LLM. It reaches an Average Overall Execution Accuracy (AOEA) of 72.55% and reduces the error rate to 19.44%, surpassing prior knowledge-graph-based implementations. The checks produce interpretable error explanations, and the results underscore the importance of semantic governance in GenAI-enabled data tools. The findings support deploying ontology-anchored QA systems in real-world enterprise contexts through frameworks such as data.world's AI Context Engine.
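The TL;DR reports accuracy and error rate but not the share of questions that end in neither. Assuming each question falls into exactly one of three outcomes (correct, "I don't know", or wrong), that share follows by subtraction. It matches the roughly 8% "I don't know" figure given in the abstract.

```python
# Percentages from the TL;DR. Assumption: correct, "I don't know", and
# wrong answers together account for 100% of the benchmark questions.
aoea = 72.55        # Average Overall Execution Accuracy (correct answers)
error_rate = 19.44  # wrong answers

# Round to two decimals to hide floating-point noise from the subtraction.
unknown = round(100 - aoea - error_rate, 2)
print(f"'I don't know' share: {unknown}%")  # prints 'I don't know' share: 8.01%

# The rounded figures in the abstract obey the same identity.
assert 72 + 8 + 20 == 100
```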

Abstract

There is increasing evidence that question-answering (QA) systems with Large Language Models (LLMs) that employ a knowledge graph/semantic representation of an enterprise SQL database (i.e., Text-to-SPARQL) achieve higher accuracy than systems that answer questions directly on SQL databases (i.e., Text-to-SQL). Our previous benchmark research showed that using a knowledge graph improved accuracy from 16% to 54%. The question remains: how can we further improve accuracy and reduce the error rate? Building on the observation from our previous research that inaccurate LLM-generated SPARQL queries followed incorrect paths, we present an approach that consists of 1) Ontology-based Query Check (OBQC), which detects errors by leveraging the ontology of the knowledge graph to check whether the LLM-generated SPARQL query matches the semantics of the ontology, and 2) LLM Repair, which uses the error explanations with an LLM to repair the SPARQL query. Using the chat with the data benchmark, our primary finding is that our approach increases the overall accuracy to 72%, with an additional 8% of questions returning an "I don't know" (unknown) result. The overall error rate is thus 20%. These results provide further evidence that investing in knowledge graphs, namely the ontology, yields higher accuracy for LLM-powered question answering systems.
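The abstract does not list the individual rules OBQC applies. As an illustration only, the sketch below assumes two RDFS-style rules, a domain rule and a range rule, evaluated over an already-parsed basic graph pattern. It wraps them in a bounded check-and-repair loop. The `in:` prefix, the toy ontology, the `Triple` type, the `obqc` and `check_and_repair` functions, the `max_repairs` limit of 3, and the scripted stand-in for the LLM are all hypothetical. A real system would parse SPARQL text and prompt an actual LLM with the error explanations.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy ontology: property -> (rdfs:domain class, rdfs:range class).
ONTOLOGY = {"in:hasPolicy": ("in:PolicyHolder", "in:Policy")}
TYPE_PREDICATES = ("a", "rdf:type")


@dataclass(frozen=True)
class Triple:
    """One triple pattern from a SPARQL basic graph pattern (already parsed)."""
    s: str
    p: str
    o: str


def obqc(query: list[Triple], ontology: dict[str, tuple[str, str]]) -> list[str]:
    """Deterministic ontology check; returns error explanations (empty = pass)."""
    # Classes asserted for each variable via rdf:type patterns.
    types: dict[str, set[str]] = {}
    for t in query:
        if t.p in TYPE_PREDICATES:
            types.setdefault(t.s, set()).add(t.o)

    errors = []
    for t in query:
        if t.p in TYPE_PREDICATES:
            continue
        if t.p not in ontology:
            errors.append(f"{t.p} is not a property in the ontology.")
            continue
        domain, rng = ontology[t.p]
        # Domain rule: a typed subject must have the property's domain class.
        if t.s in types and domain not in types[t.s]:
            errors.append(f"{t.s} has type {sorted(types[t.s])}, but the domain of {t.p} is {domain}.")
        # Range rule: a typed object must have the property's range class.
        if t.o in types and rng not in types[t.o]:
            errors.append(f"{t.o} has type {sorted(types[t.o])}, but the range of {t.p} is {rng}.")
    return errors


def check_and_repair(
    query: list[Triple],
    check: Callable[[list[Triple]], list[str]],
    repair: Callable[[list[Triple], list[str]], list[Triple]],
    max_repairs: int = 3,
) -> Optional[list[Triple]]:
    """Return a query that passes the check, or None (reported as "I don't know")."""
    errors = check(query)
    for _ in range(max_repairs):
        if not errors:
            return query
        # In a real system this step would prompt an LLM with the explanations.
        query = repair(query, errors)
        errors = check(query)
    return None if errors else query


# A query that walks the hasPolicy path backwards, and its corrected form.
BAD = [Triple("?p", "a", "in:Policy"), Triple("?p", "in:hasPolicy", "?h"),
       Triple("?h", "a", "in:PolicyHolder")]
GOOD = [Triple("?h", "a", "in:PolicyHolder"), Triple("?h", "in:hasPolicy", "?p"),
        Triple("?p", "a", "in:Policy")]

# BAD violates both the domain rule and the range rule; GOOD passes.
bad_errors = obqc(BAD, ONTOLOGY)
```

Usage: `check_and_repair(BAD, lambda q: obqc(q, ONTOLOGY), lambda q, errs: GOOD)` returns `GOOD`. A repair function that never fixes the query makes it return `None` once the repair budget runs out. Keeping the check deterministic makes the explanations handed to the LLM reproducible. Bounding the number of repairs is one assumed way to turn persistent failures into the "I don't know" outcomes the abstract reports.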

Paper Structure

This paper contains 18 sections, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Overview of our Ontology-based Query Checker and LLM Repair approach
  • Figure 2: Average Overall Execution Accuracy (AOEA) of SPARQL and SQL for all questions in the benchmark, from our previous work, compared to OBQC and LLM Repair
  • Figure 3: Average Overall Execution Accuracy (AOEA) of SPARQL and SQL for all questions in each quadrant, from our previous work, compared to OBQC and LLM Repair