CuriousLLM: Elevating Multi-Document Question Answering with LLM-Enhanced Knowledge Graph Reasoning

Zukang Yang; Zixuan Zhu; Xuan Zhu

TL;DR

CuriousLLM tackles hallucination and latency in multi-document question answering (MD-QA) by introducing a curiosity-driven LLM agent that generates follow-up questions to steer knowledge-graph (KG) traversal. Central to the approach is the Follow-upQA dataset, which provides ground-truth follow-up questions (or NA signals indicating that the evidence is already sufficient) to train the agent, enabling efficient retrieval with fewer passages. The method builds a KG over passages encoded with an MDR-trained (multi-hop dense retrieval) BERT encoder and uses a Mistral-7B traversal agent, fine-tuned with QLoRA, to perform breadth-first search (BFS) over the KG; follow-up questions guide the search, which terminates early once sufficient evidence has been gathered. Empirical results on MD-QA benchmarks show consistent accuracy improvements over strong baselines, shorter runtimes due to early termination, and robust follow-up-question generation on the Follow-upQA benchmark. This work offers a scalable, resource-efficient path to high-quality MD-QA by integrating curiosity-driven reasoning into KG-based retrieval and prompting.
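To make the traversal loop concrete, here is a minimal Python sketch under stated assumptions. The edge rule (link each passage to its k most cosine-similar passages), the `NA_SIGNAL` string, and all names (`build_knn_graph`, `curious_bfs`, `follow_up`, `rank`, `budget`, `width`) are hypothetical. In the described system the embeddings would come from the MDR-trained encoder and `follow_up` would wrap the fine-tuned Mistral-7B agent; both are stubbed out here as plain inputs and callables.

```python
import math
from collections import deque
from typing import Callable, Dict, List, Sequence

NA_SIGNAL = "NA"  # assumed spelling of the "evidence is sufficient" signal


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense passage embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_knn_graph(embeddings: List[Sequence[float]], k: int = 2) -> Dict[int, List[int]]:
    """Link each passage to its k most similar passages (assumed edge rule)."""
    graph: Dict[int, List[int]] = {}
    for i, emb in enumerate(embeddings):
        sims = [(cosine(emb, other), j) for j, other in enumerate(embeddings) if j != i]
        sims.sort(reverse=True)  # highest similarity first
        graph[i] = [j for _, j in sims[:k]]
    return graph


def curious_bfs(
    graph: Dict[int, List[int]],
    passages: List[str],
    seeds: List[int],
    follow_up: Callable[[List[str]], str],
    rank: Callable[[str, List[int]], List[int]],
    budget: int = 10,
    width: int = 2,
) -> List[int]:
    """BFS over the passage graph, steered by follow-up questions (hypothetical).

    follow_up(evidence_texts) returns a follow-up question, or NA_SIGNAL
    when the collected evidence is judged sufficient.
    rank(question, candidate_ids) returns candidates ordered by relevance.
    """
    evidence: List[int] = []
    visited = set(seeds)
    queue = deque(seeds)
    while queue and len(evidence) < budget:
        node = queue.popleft()
        evidence.append(node)
        question = follow_up([passages[i] for i in evidence])
        if question == NA_SIGNAL:
            break  # early termination: no further hops needed
        candidates = [j for j in graph[node] if j not in visited]
        for j in rank(question, candidates)[:width]:
            visited.add(j)
            queue.append(j)
    return evidence
```

The early `break` is what shortens runtime in this sketch: when the agent judges the evidence sufficient, traversal stops after a few passages instead of exhausting the node budget.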

Abstract

Large Language Models (LLMs) have achieved significant success in open-domain question answering. However, they still face challenges such as hallucinations and knowledge cutoffs. These issues can be mitigated through in-context learning, by providing LLMs with relevant context before they generate answers. Recent literature proposes Knowledge Graph Prompting (KGP), which integrates knowledge graphs with an LLM-based traversal agent to substantially improve document retrieval quality. However, KGP requires costly fine-tuning on large datasets and remains prone to hallucination. In this paper, we propose CuriousLLM, an enhancement that integrates a curiosity-driven reasoning mechanism into an LLM agent. This mechanism enables the agent to generate relevant follow-up questions, thereby guiding the information retrieval process more efficiently. Central to our approach is the new Follow-upQA dataset, which takes questions and supporting evidence as input, with follow-up questions serving as ground truths. These follow-up questions either ask about what is still missing to fully answer the user's query or use special tokens to signify that the retrieved evidence is sufficient. Our experiments show that CuriousLLM significantly boosts LLM performance in multi-document question answering (MD-QA) while avoiding the substantial computational cost and latency of the original KGP framework.
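A single Follow-upQA example can be pictured as a supervised pair: the prompt holds the question and the retrieved evidence, and the target is either a follow-up question or the sufficiency signal. The sketch below assumes hypothetical field names (`question`, `evidence`, `follow_up`), a plain-text prompt template, and the string `"NA"` for the special token; none of these are taken from the released dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NA_TOKEN = "NA"  # hypothetical spelling of the "evidence is sufficient" token


@dataclass
class FollowUpExample:
    """One hypothetical Follow-upQA record."""
    question: str
    evidence: List[str]
    follow_up: Optional[str] = None  # None means the evidence already suffices


def to_training_pair(ex: FollowUpExample) -> Tuple[str, str]:
    """Render a record as (prompt, target) for supervised fine-tuning."""
    lines = [f"Question: {ex.question}"]
    for i, passage in enumerate(ex.evidence, start=1):
        lines.append(f"Evidence {i}: {passage}")
    lines.append("Follow-up question:")
    target = ex.follow_up if ex.follow_up is not None else NA_TOKEN
    return "\n".join(lines), target
```

Mapping the "sufficient" case to a fixed token keeps generation and stopping in one output space: the same decoded string either drives the next retrieval hop or ends it.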

Paper Structure (23 sections, 2 equations, 5 figures, 3 tables)

Introduction
Methodology
Follow-upQA Dataset
Knowledge Graph Construction
Curious LLM Traversal Agent
LLM Response Generation
Experiments
Evaluation Metrics
Performance and Analysis
Follow-upQA Benchmark
Evaluation on Follow-upQA
Conclusion
Related Work
Retrieval-based Models.
Generative Models.
...and 8 more sections

Figures (5)

Figure 1: Two common types of questions in HotpotQA (Yang et al., 2018): (1) comparison questions require parallel reasoning over different documents; (2) bridging questions require sequential reasoning.
Figure 2: Overview of the CuriousLLM workflow and a Follow-upQA example. Given a query, the system obtains seeding passages and then searches for relevant documents; guided by the follow-up question Q1 generated by the LLM agent, the seemingly unrelated passages S1 and S2 form a search path leading to the final answer.
Figure 3: Benchmarking Mistral-7B on Follow-upQA. First row: distributions of ROUGE-1, ROUGE-L, and cosine similarity across hyperparameter settings. Second row: Mistral-7B performance at different training checkpoints.
Figure 4: Comparison of MD-QA performance across LLM agents. Mistral_ET is the Mistral agent with early traversal termination. Accuracy is the fraction of correctly answered questions among those that Mistral_ET terminates early. Iterations can be interpreted as the number of nodes visited. Runtime is the average runtime per question, in seconds.
Figure 5: MD-QA performance on HotpotQA with Mistral agent at different training checkpoints.
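Figure 3 scores generated follow-up questions against the ground truth with ROUGE-1 and ROUGE-L. As a reference point, here is a minimal sketch of both F1 variants. It assumes lowercase whitespace tokenization; standard packages apply their own tokenizer (and optionally stemming), so their scores can differ slightly.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def f1(overlap: int, n_pred: int, n_ref: int) -> float:
    """Harmonic mean of precision and recall given an overlap count."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(pred: str, ref: str) -> float:
    """ROUGE-1: clipped unigram overlap between prediction and reference."""
    p, r = Counter(tokenize(pred)), Counter(tokenize(ref))
    overlap = sum((p & r).values())  # multiset intersection clips repeated words
    return f1(overlap, sum(p.values()), sum(r.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length via a rolling 1-D DP row."""
    row = [0] * (len(b) + 1)
    for x in a:
        diag = 0  # previous row's value at column j-1
        for j in range(1, len(b) + 1):
            above = row[j]
            row[j] = diag + 1 if x == b[j - 1] else max(above, row[j - 1])
            diag = above
    return row[-1]


def rougeL_f1(pred: str, ref: str) -> float:
    """ROUGE-L: F1 based on the longest common subsequence of tokens."""
    a, b = tokenize(pred), tokenize(ref)
    return f1(lcs_length(a, b), len(a), len(b))
```

Word order is what separates the two: for "the cat sat" against "sat the cat", ROUGE-1 is 1.0, while ROUGE-L drops to 2/3 because only "the cat" survives as an in-order subsequence.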