RAGONITE: Iterative Retrieval on Induced Databases and Verbalized RDF for Conversational QA over KGs with RAG

Rishiraj Saha Roy, Chris Hinze, Joel Schlotthauer, Farzad Naderi, Viktor Hangya, Andreas Foltyn, Luzian Hahn, Fabian Kuech

TL;DR

The paper addresses ConvQA over RDF knowledge graphs, where translating questions to SPARQL is brittle for complex and conversational intents and poorly suited to more abstract information needs. It introduces RAGONITE, a two-branch retrieval-augmented generation (RAG) system that combines SQL queries over a database induced from the KG with natural-language retrieval over verbalized KG facts. Retrieval can run for several rounds, and a second LLM fuses the results of the two branches. The system supports open and on-premise LLMs and enables heterogeneous QA by incorporating external text sources. In evaluations on a BMW KG, the two-branch iterative approach achieves 28 of 30 correct answers, outperforming SPARQL-only and single-branch baselines and highlighting the value of combining Text2SQL over KG-derived databases with verbalized KG passages. The work demonstrates a practical, flexible ConvQA pipeline, with room for further improvement through reflection and fine-tuning on synthetic data.

Abstract

Conversational question answering (ConvQA) is a convenient means of searching over RDF knowledge graphs (KGs), where a prevalent approach is to translate natural language questions to SPARQL queries. However, SPARQL has certain shortcomings: (i) it is brittle for complex intents and conversational questions, and (ii) it is not suitable for more abstract needs. Instead, we propose a novel two-pronged system where we fuse: (i) SQL-query results over a database automatically derived from the KG, and (ii) text-search results over verbalizations of KG facts. Our pipeline supports iterative retrieval: when the results of any branch are found to be unsatisfactory, the system can automatically opt for further rounds. We put everything together in a retrieval augmented generation (RAG) setup, where an LLM generates a coherent response from accumulated search results. We demonstrate the superiority of our proposed system over several baselines on a knowledge graph of BMW automobiles.
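
To make the two retrieval sources in the abstract concrete, the following is a minimal sketch of inducing a relational database from RDF-style triples and verbalizing the same triples as short passages. It is not the authors' implementation: the toy triples, the one-table-per-`rdf:type` layout, the sentence template, and the function names `induce_database` and `verbalize` are all assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict

# Toy triples (hypothetical; not taken from the paper's BMW KG).
TRIPLES = [
    ("car:X1", "rdf:type", "Car"),
    ("car:X1", "hasName", "BMW X1"),
    ("car:X1", "hasDoors", "5"),
    ("car:i4", "rdf:type", "Car"),
    ("car:i4", "hasName", "BMW i4"),
    ("car:i4", "hasDoors", "4"),
]


def induce_database(triples):
    """Assumed scheme: one table per rdf:type, one TEXT column per predicate."""
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    attributes = defaultdict(dict)
    for s, p, o in triples:
        if p != "rdf:type":
            attributes[s][p] = o

    subjects_by_type = defaultdict(list)
    for subject, type_name in types.items():
        subjects_by_type[type_name].append(subject)

    conn = sqlite3.connect(":memory:")
    for type_name, subjects in subjects_by_type.items():
        cols = sorted({p for s in subjects for p in attributes[s]})
        col_defs = ["id TEXT PRIMARY KEY"] + [f'"{c}" TEXT' for c in cols]
        conn.execute(f'CREATE TABLE "{type_name}" ({", ".join(col_defs)})')
        for s in subjects:
            values = [s] + [attributes[s].get(c) for c in cols]
            placeholders = ", ".join("?" for _ in values)
            conn.execute(f'INSERT INTO "{type_name}" VALUES ({placeholders})', values)
    return conn


def verbalize(triples):
    """Assumed template: render each triple as a plain 'subject predicate object.' sentence."""
    return [f"{s} {p} {o}." for s, p, o in triples]


conn = induce_database(TRIPLES)
# A SQL query of the kind a Text2SQL step might produce for "Which cars have more than 4 doors?"
rows = conn.execute(
    "SELECT hasName FROM Car WHERE CAST(hasDoors AS INTEGER) > 4"
).fetchall()
passages = verbalize(TRIPLES)
```

In this sketch the SQL branch handles filtering and comparison over structured columns, while the verbalized passages can be indexed by any text retriever for less structured questions.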

Paper Structure

This paper contains 11 sections and 4 figures.

Figures (4)

  • Figure 1: The RAGONITE workflow for ConvQA over KGs with retrieval augmented generation.
  • Figure 2: Canonical example illustrating automatic database induction from knowledge graph.
  • Figure 3: Illustrating iterative retrieval with SQL and text search tools via LLM calls in RAGONITE.
  • Figure 4: Screenshot of RAGONITE. Colored boxes are not part of the UI. Text readable on zoom-in.
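
The iterative retrieval that Figure 3 refers to can be sketched as a loop that retrieves, judges the accumulated evidence, and retries with a reformulated query. This is only an illustration under stated assumptions: the function `iterative_retrieve`, the round limit, and the stub callables standing in for the retrieval tool and the LLM calls are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Tuple


def iterative_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_satisfactory: Callable[[str, List[str]], bool],
    reformulate: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,  # assumed limit, not a value from the paper
) -> Tuple[List[str], int]:
    """Run one branch: retrieve, judge the evidence, retry with a new query if needed."""
    evidence: List[str] = []
    query = question
    for round_no in range(1, max_rounds + 1):
        results = retrieve(query)
        evidence.extend(results)
        if is_satisfactory(question, evidence):
            return evidence, round_no
        query = reformulate(question, query, results)
    return evidence, max_rounds


# Stubs standing in for a search tool and for LLM calls (hypothetical behavior).
def demo_retrieve(query: str) -> List[str]:
    # Only the refined query finds anything.
    return ["BMW X1 has 5 doors."] if "doors" in query else []


def demo_judge(question: str, evidence: List[str]) -> bool:
    # A real system would ask an LLM whether the evidence suffices.
    return len(evidence) > 0


def demo_reformulate(question: str, query: str, results: List[str]) -> str:
    # A real system would ask an LLM for a better query.
    return query + " doors"


evidence, rounds = iterative_retrieve(
    "How practical is the X1?", demo_retrieve, demo_judge, demo_reformulate
)
```

With these stubs the first round returns nothing, the query is reformulated, and the second round yields one passage, so `rounds` is 2. In a two-branch setup, the same loop could be run once with a SQL tool and once with a text-search tool, with the accumulated evidence from both passed to the answering LLM.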