HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

Pranoy Panda, Ankush Agarwal, Chaitanya Devaguptapu, Manohar Kaul, Prathosh A P

TL;DR

The paper tackles multi-hop question answering (MHQA) over unstructured text by giving the LLM a query-aware hyper-relational KG distilled from the supporting documents, which reduces noise and token count. HOLMES is a training-free, zero-shot pipeline with three components: query-dependent knowledge discovery, a query-aligned knowledge schema used for refinement, and reader prompt construction that verbalizes the distilled facts. It achieves state-of-the-art results on HotpotQA and MuSiQue across multiple LLMs while using fewer input tokens and maintaining high semantic-similarity and human-evaluation scores. The approach improves context grounding for MHQA and offers a scalable, education-friendly solution with potential for broader domains and further efficiency gains. Its limitations are the extra upfront computation needed to create the auxiliary schema and possible incompleteness of the extracted graphs.

Abstract

Given unstructured text, Large Language Models (LLMs) are adept at answering simple (single-hop) questions. However, as the complexity of the questions increases, the performance of LLMs degrades. We believe this is due to the overhead associated with understanding the complex question followed by filtering and aggregating unstructured information in the raw text. Recent methods try to reduce this burden by integrating structured knowledge triples into the raw text, aiming to provide a structured overview that simplifies information processing. However, this simplistic approach is query-agnostic and the extracted facts are ambiguous as they lack context. To address these drawbacks and to enable LLMs to answer complex (multi-hop) questions with ease, we propose to use a knowledge graph (KG) that is context-aware and is distilled to contain query-relevant information. The use of our compressed distilled KG as input to the LLM results in our method utilizing up to $67\%$ fewer tokens to represent the query-relevant information present in the supporting documents, compared to the state-of-the-art (SoTA) method. Our experiments show consistent improvements over the SoTA across several metrics (EM, F1, BERTScore, and Human Eval) on two popular benchmark datasets (HotpotQA and MuSiQue).
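To make the idea of a context-aware, query-distilled KG concrete, here is a minimal Python sketch. It is not the authors' implementation: the `HyperFact` class, the `prune` and `verbalize` helpers, and the hand-written toy facts are all hypothetical, and pruning by exact relation-name match is a deliberate simplification of whatever matching the paper actually uses. It only illustrates the general shape: a fact is a triple plus qualifiers that carry its context, facts outside the query-aligned schema are dropped, and the survivors are rendered as text for the reader LLM.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class HyperFact:
    """A triple plus qualifier (key, value) pairs that carry its context."""
    subject: str
    relation: str
    obj: str
    qualifiers: Tuple[Tuple[str, str], ...] = ()


def prune(facts: Iterable[HyperFact], schema_relations: Set[str]) -> List[HyperFact]:
    """Keep only facts whose relation appears in the query-aligned schema.

    Assumption: exact string match on relation names (a simplification).
    """
    return [f for f in facts if f.relation in schema_relations]


def verbalize(fact: HyperFact) -> str:
    """Render one fact as a plain text fragment for the reader prompt."""
    text = f"{fact.subject} {fact.relation} {fact.obj}"
    if fact.qualifiers:
        details = ", ".join(f"{k}: {v}" for k, v in fact.qualifiers)
        text += f" ({details})"
    return text


# Toy facts written by hand for illustration (not extracted by HOLMES).
facts = [
    HyperFact("Snow Patrol", "has lead vocalist", "Gary Lightbody"),
    HyperFact("Snow Patrol", "was formed in", "Dundee", (("year", "1994"),)),
]

# Hypothetical schema for a question about the band's lead vocalist.
schema_relations = {"has lead vocalist"}

kept = prune(facts, schema_relations)
prompt_facts = [verbalize(f) for f in kept]
print(prompt_facts)
```

Only the fact relevant to the question survives pruning, which is the source of the token savings the abstract describes: the reader LLM sees a few verbalized facts instead of the full supporting documents.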

Paper Structure (32 sections, 2 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: Multi-Dimensional Improvements: Our method (with GPT-4 as reader LLM) achieves SoTA results on several datasets and multiple Multi-hop QA metrics. EM: Exact-Match with the gold answer; Self-Aware EM: Confidence-aware EM; BERTScore (Zhang et al., 2019): Semantic similarity between predicted and gold answer; Query Info Efficiency: Efficiency of representing query-relevant information in the supporting documents - inversely proportional to the input token count for the reader LLM.
  • Figure 2: Multi-Hop QA Case Study: This figure illustrates a bridge-type multi-hop question from the HotpotQA dataset, comparing our method with the baselines. It highlights our method's accurate identification of 'Snow Patrol' as the crucial bridge entity and subsequently finding the lead vocalist, a feat not achieved by baselines.
  • Figure 3: Method Overview: Our method has three key steps - (i) Query-Dependent Structured Knowledge Discovery, (ii) Knowledge Schema Construction for Information Refinement, and (iii) Reader LLM Prompt Construction. Step (i) involves creating an entity-document graph (1.a in the figure) and performing a level-order traversal on it to get a hyper-relational KG (2 in the figure). Next, in step (ii), we create a query-aligned knowledge schema from the question and an auxiliary graph schema (1.b in the figure), and use it to prune the hyper-relational KG (3 in the figure), which forms the input for the LLM.
  • Figure 4: Hop-wise Performance: Comparison on the MuSiQue dataset with varying question complexity. Error bars denote standard error - $\sqrt{\frac{em (1 - em)}{n}}$.
  • Figure 5: Impact of pruning on MHQA performance on the HotpotQA dataset
  • ...and 3 more figures
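
The standard error quoted in the Figure 4 caption is the usual binomial standard error of a proportion. A minimal Python sketch follows, assuming `em` is the exact-match accuracy expressed as a fraction in [0, 1] and `n` is the number of questions in the group being evaluated (the caption does not define either symbol, so both readings are assumptions). The function name `em_standard_error` is hypothetical.

```python
import math


def em_standard_error(em: float, n: int) -> float:
    """Binomial standard error sqrt(em * (1 - em) / n).

    Assumes em is a fraction in [0, 1] and n is the number of
    evaluated questions; neither is defined in the caption itself.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= em <= 1.0:
        raise ValueError("em must be a fraction between 0 and 1")
    return math.sqrt(em * (1.0 - em) / n)


# Illustrative numbers only, not results from the paper.
se = em_standard_error(0.5, 100)
print(f"standard error: {se:.3f}")
```

Because the error shrinks with the square root of `n`, hop-count groups with few questions will show visibly wider bars than larger groups at the same accuracy.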