Multi-hop Question Answering over Knowledge Graphs using Large Language Models

Abir Chakraborty

TL;DR

This work tackles multi-hop KGQA by evaluating two LLM-based pipelines. IR-LLM reasons over retrieved subgraphs using iterative 1-hop relation filtering and candidate-node narrowing. SP-LLM constructs SPARQL queries from graph-schema prompts built from node and edge descriptions. Both approaches can use retrieval-augmented generation (RAG) to cope with limited context windows. Both are tested on six diverse KGs, which reveals that their effectiveness depends on the dataset. IR-LLM achieves strong gains on WebQSP and MetaQA and is competitive on ComplexWebQuestions. SP-LLM attains state-of-the-art Hits@1 on the LC-QuAD variants and KQAPro, albeit with EM/F1 trade-offs compared to fine-tuned baselines. Overall, the study shows that carefully designed prompts, retrieval strategies, and iterative reasoning enable effective LLM-based KGQA across different data regimes. These findings offer guidance for practical deployment and for future improvements in scale and accuracy.
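The iterative 1-hop loop behind IR-LLM can be illustrated with a small sketch. It is not the paper's implementation, and several pieces are assumptions:

* The toy KG is invented.
* The helper names (`words`, `get_one_hop_candidates`, `select_relations`, `ir_answer`) are hypothetical. The name `get_one_hop_candidates` merely echoes the 'get 1-hop candidates' component in Figure 1.
* The keyword-overlap scorer in `select_relations` stands in for the LLM prompt that filters relations.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy KG: head entity -> list of (relation, tail entity) edges.
KG: Dict[str, List[Tuple[str, str]]] = {
    "Tom Hanks": [("acted_in", "Forrest Gump"), ("acted_in", "Cast Away"),
                  ("born_in", "Concord")],
    "Forrest Gump": [("directed_by", "Robert Zemeckis"), ("release_year", "1994")],
    "Cast Away": [("directed_by", "Robert Zemeckis"), ("release_year", "2000")],
}


def words(text: str) -> Set[str]:
    # Lower-cased alphabetic tokens; underscores in relation names act as separators.
    return set(re.findall(r"[a-z]+", text.lower()))


def get_one_hop_candidates(frontier: Set[str]) -> Set[str]:
    """Collect every relation leaving the current candidate nodes."""
    return {rel for node in frontier for rel, _ in KG.get(node, [])}


def select_relations(question: str, relations: Set[str]) -> Set[str]:
    """Stand-in for the LLM relation filter: keep the relations sharing the
    most words with the question (ties kept; nothing kept if no overlap)."""
    q = words(question)
    scores = {rel: len(q & words(rel)) for rel in relations}
    best = max(scores.values(), default=0)
    return {rel for rel, s in scores.items() if s == best and s > 0}


def ir_answer(question: str, topic_entities: Set[str], max_hops: int = 3) -> Set[str]:
    frontier = set(topic_entities)
    for _ in range(max_hops):
        kept = select_relations(question, get_one_hop_candidates(frontier))
        if not kept:  # no relevant relation left: stop expanding
            break
        # Narrow the candidates to the nodes reached through the kept relations.
        frontier = {tail for node in frontier
                    for rel, tail in KG.get(node, []) if rel in kept}
    return frontier


print(ir_answer("Who directed the films that Tom Hanks acted in?", {"Tom Hanks"}))
# → {'Robert Zemeckis'}
```

On this toy graph the loop runs as follows for the example question:

1. The first hop keeps `acted_in` and discards `born_in`, so the candidates become the two films.
2. The second hop keeps `directed_by` and discards `release_year`.
3. The third hop finds no outgoing edges and stops.

A real pipeline would replace the scorer with an LLM call and would need a more careful stopping rule.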

Abstract

Knowledge graphs (KGs) are large, structured datasets that represent large knowledge bases (KBs): each node represents a key entity, and the relations among entities are typed edges. Answering a natural language query over a KB entails starting from specific nodes and reasoning over multiple edges of the corresponding KG to arrive at the correct set of answer nodes. Traditional approaches to question answering over KGs fall into two families:

(a) semantic parsing (SP), where a logical form (e.g., an S-expression or a SPARQL query) is generated using node and edge embeddings, followed either by reasoning over these representations or by tuning language models to generate the final answer directly; or

(b) information retrieval (IR), which works by extracting entities and relations sequentially.

In this work, we evaluate the capability of large language models (LLMs) to answer multi-hop questions over KGs. Every LLM has a fixed context window, so we show that different approaches are needed to extract the relevant information and feed it to an LLM, depending on the size and nature of the KG. We evaluate our approach on six KGs, with and without the availability of example-specific subgraphs. The results show that LLMs can adopt both IR- and SP-based methods and achieve extremely competitive performance.
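One way to respect a fixed context window is to retrieve only the schema descriptions most relevant to the question before building the prompt. The sketch below is an illustrative assumption, not the paper's retriever:

* Bag-of-words cosine similarity stands in for an embedding model.
* Word count is a crude proxy for token count.
* The relation names, their descriptions, and the `retrieve_schema` helper are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bow(text: str) -> Counter:
    # Bag of lower-cased words (stand-in for a dense embedding).
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_schema(question: str, descriptions: Dict[str, str], budget: int) -> List[str]:
    """Rank schema items by similarity to the question, then greedily keep the
    best-ranked ones whose descriptions still fit in `budget` words."""
    q = bow(question)
    # sorted() is stable, so ties keep the dictionary's insertion order.
    ranked = sorted(descriptions, key=lambda k: cosine(q, bow(descriptions[k])), reverse=True)
    chosen, used = [], 0
    for key in ranked:
        cost = len(descriptions[key].split())
        if used + cost <= budget:  # skip items that would overflow the budget
            chosen.append(key)
            used += cost
    return chosen


# Hypothetical relation descriptions for a small schema.
SCHEMA = {
    "director": "the person who directed the film",
    "cast member": "an actor who performed in the film",
    "place of birth": "the place where a person was born",
    "population": "number of people living in a place",
}
print(retrieve_schema("Who directed the films that Tom Hanks acted in?", SCHEMA, budget=13))
# → ['director', 'cast member']
```

The example works out as follows with a 13-word budget:

* `director` (6 words) and `cast member` (7 words) score highest, and together they exactly fill the budget.
* The two lower-scoring descriptions would overflow it and are left out of the prompt.

A production system would count real tokens with the target model's tokenizer.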

Paper Structure

This paper contains 12 sections, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Different components for question answering over a KB. Left subfigure: the main flow, which invokes the 'get 1-hop candidates' component.
  • Figure 2: The flow for SP-LLM, which generates a SPARQL query from the given question. It uses three skills: (1) entity identification, (2) predicate identification, and (3) SPARQL generation.
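The three SP-LLM skills listed for Figure 2 can be mimicked with a toy pipeline. This is a hedged sketch rather than the paper's method:

* In SP-LLM each skill is carried out by an LLM prompt. Here, skills (1) and (2) are hypothetical lexicon lookups.
* The `ex:` IRIs and all function names are placeholders.
* Hop ordering relies on a crude positional heuristic.

```python
from typing import List, Tuple

# Hypothetical lexicons standing in for LLM skills (1) and (2); "ex:" IRIs are placeholders.
ENTITY_LEXICON = {"tom hanks": "ex:TomHanks"}
# phrase -> (predicate, inverse); inverse=True means the edge points INTO the current node.
PREDICATE_LEXICON = {"acted in": ("ex:castMember", True), "directed": ("ex:director", False)}


def identify_entity(question: str) -> str:
    """Skill 1 (sketch): return the first lexicon entity mentioned in the question."""
    q = question.lower()
    return next(iri for mention, iri in ENTITY_LEXICON.items() if mention in q)


def identify_predicates(question: str) -> List[Tuple[str, bool]]:
    """Skill 2 (sketch): find relation phrases and order them from the topic entity
    outward, approximated by reverse position in the question (a heuristic that
    fits phrasings like 'Who <r2> the films that <entity> <r1>?')."""
    q = question.lower()
    found = [(q.find(p), edge) for p, edge in PREDICATE_LEXICON.items() if p in q]
    return [edge for _, edge in sorted(found, reverse=True)]


def generate_sparql(entity: str, path: List[Tuple[str, bool]]) -> str:
    """Skill 3 (sketch): chain one triple pattern per hop and select the last variable."""
    if not path:
        raise ValueError("need at least one predicate")
    lines, prev = [], entity
    for i, (pred, inverse) in enumerate(path, start=1):
        var = f"?x{i}"
        subj, obj = (var, prev) if inverse else (prev, var)
        lines.append(f"  {subj} {pred} {obj} .")
        prev = var
    body = "\n".join(lines)
    return f"PREFIX ex: <http://example.org/>\nSELECT DISTINCT {prev} WHERE {{\n{body}\n}}"


question = "Who directed the films that Tom Hanks acted in?"
print(generate_sparql(identify_entity(question), identify_predicates(question)))
```

For the example question, the sketch produces a two-hop query:

* `?x1` binds to films whose cast includes `ex:TomHanks`.
* `?x2` binds to their directors.

Keeping generation as a separate step from entity and predicate identification mirrors the three-skill split in the caption. It lets each step be prompted, or checked, independently.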