KiRAG: Knowledge-Driven Iterative Retriever for Enhancing Retrieval-Augmented Generation

Jinyuan Fang, Zaiqiao Meng, Craig Macdonald

TL;DR

KiRAG addresses the limitations of retrieval in iterative retrieval-augmented generation (iRAG) for multi-hop QA by grounding iterative retrieval in knowledge triples and integrating reasoning into the retrieval loop. It decomposes documents into <head; relation; tail> triples and builds a reasoning chain step by step: a Reasoning Chain Aligner selects triples that extend the chain, while a Reasoning Chain Constructor grounds each extension in factual knowledge. A document ranking step then ties the retrieved triples back to their source documents, which a reader uses to generate the answer. Empirically, KiRAG outperforms existing iRAG models across multiple multi-hop QA datasets, with average improvements of 9.40% in R@3 and 5.14% in F1, and it stays practical because triple extraction can be done offline. The approach offers a blueprint for knowledge-grounded iterative retrieval in RAG frameworks.
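The loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the `Triple` class, the `align_score` function (plain token overlap standing in for the trained Reasoning Chain Aligner), the greedy chain extension (standing in for the Reasoning Chain Constructor), and the `max_steps` parameter are all assumptions made for illustration. The point it shows is that the second-hop triple shares no words with the question and only becomes retrievable once the first hop is in the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    doc_id: str  # hypothetical id of the document the triple came from

    def text(self) -> str:
        return f"{self.head} {self.relation} {self.tail}"


def tokens(s: str) -> set:
    """Lowercased words with surrounding punctuation stripped."""
    stripped = (w.strip("?.,;<>").lower() for w in s.split())
    return {w for w in stripped if w}


def align_score(question: str, chain: list, cand: Triple) -> float:
    """Toy stand-in for the aligner: share of the candidate's tokens that
    already appear in the question or in the current reasoning chain."""
    context = tokens(question)
    for t in chain:
        context |= tokens(t.text())
    cand_tokens = tokens(cand.text())
    return len(context & cand_tokens) / max(len(cand_tokens), 1)


def iterative_retrieve(question: str, triples: list, max_steps: int = 2) -> list:
    """Greedily extend the reasoning chain by one triple per step."""
    chain = []
    for _ in range(max_steps):
        scored = [(align_score(question, chain, t), t)
                  for t in triples if t not in chain]
        if not scored:
            break
        best_score, best = max(scored, key=lambda st: st[0])
        if best_score == 0:  # nothing connects to the chain: stop early
            break
        chain.append(best)
    return chain


def rank_documents(chain: list) -> list:
    """Rank source documents by the order their triples entered the chain."""
    ranked = []
    for t in chain:
        if t.doc_id not in ranked:
            ranked.append(t.doc_id)
    return ranked


question = "Which city is the birthplace of the director of Inception?"
triples = [
    Triple("Titanic", "released in", "1997", "d3"),          # distractor
    Triple("Christopher Nolan", "born in", "London", "d2"),  # second hop
    Triple("Inception", "directed by", "Christopher Nolan", "d1"),  # first hop
]
chain = iterative_retrieve(question, triples)
ranked_docs = rank_documents(chain)
```

Scoring each candidate against the question plus the chain, rather than the question alone, is what lets the information need change from step to step; a real system would replace the token-overlap score with a learned model.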

Abstract

Iterative retrieval-augmented generation (iRAG) models offer an effective approach for multi-hop question answering (QA). However, their retrieval process faces two key challenges: (1) it can be disrupted by irrelevant documents or factually inaccurate chains of thought; (2) their retrievers are not designed to dynamically adapt to the evolving information needs in multi-step reasoning, making it difficult to identify and retrieve the missing information required at each iterative step. Therefore, we propose KiRAG, which uses a knowledge-driven iterative retriever model to enhance the retrieval process of iRAG. Specifically, KiRAG decomposes documents into knowledge triples and performs iterative retrieval with these triples to enable a factually reliable retrieval process. Moreover, KiRAG integrates reasoning into the retrieval process to dynamically identify and retrieve knowledge that bridges information gaps, effectively adapting to the evolving information needs. Empirical results show that KiRAG significantly outperforms existing iRAG models, with an average improvement of 9.40% in R@3 and 5.14% in F1 on multi-hop QA.
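For readers unfamiliar with the two reported metrics, the sketch below computes them under their common definitions, which this page does not spell out and which are therefore assumptions here: R@3 as the fraction of relevant documents found in the top 3 retrieved documents, and F1 as token-level overlap between the predicted and gold answers. The function names and example inputs are hypothetical.

```python
from collections import Counter


def recall_at_k(ranked_doc_ids, relevant_doc_ids, k=3):
    """Fraction of relevant documents that appear in the top-k ranking."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    hits = relevant & set(ranked_doc_ids[:k])
    return len(hits) / len(relevant)


def token_f1(prediction, gold):
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Both relevant documents are in the top 3.
full_recall = recall_at_k(["d1", "d7", "d2", "d9"], ["d1", "d2"])
# Only one of the two relevant documents is in the top 3.
half_recall = recall_at_k(["d7", "d1", "d9", "d2"], ["d1", "d2"])
# Prediction contains the gold token plus one extra token.
partial_f1 = token_f1("London England", "London")
```

Published evaluations usually also normalise answers (for example by removing articles and punctuation) before computing F1; that step is omitted here to keep the sketch short.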

Paper Structure

This paper contains 31 sections, 3 equations, 12 figures, 15 tables.

Figures (12)

  • Figure 1: (Top) Example of top-ranked documents at each step, with relevant content marked in blue and distracting content in orange. We compare KiRAG with IRCoT (Trivedi et al., 2023) and its variant IRDoc, where we replace generated thoughts with top-ranked documents. (Bottom) The corresponding retrieval and QA performance on HotPotQA and 2Wiki datasets.
  • Figure 2: (left) Overview of KiRAG. Given a question, it employs a knowledge-driven iterative retrieval process (Step 1) to retrieve relevant knowledge triples, including three iterative steps: knowledge decomposition, candidate knowledge identification and reasoning chain construction. The retrieved triples are used to rank documents (Step 2), which are passed to the reader for answer generation (Step 3). (right) Training strategy for the Reasoning Chain Aligner, designed to optimise the identification of relevant knowledge triples at each step of the retrieval process.
  • Figure 3: Retrieval performance (%) for relevant documents required across different steps, where most questions in 2Wiki have only two relevant documents.
  • Figure 4: The effect of the number of iterative steps $L$.
  • Figure 5: Case study of KiRAG and IRCoT on the HotPotQA test set, where relevant and irrelevant context is marked in blue and orange, respectively.
  • ...and 7 more figures