CuriousLLM: Elevating Multi-Document Question Answering with LLM-Enhanced Knowledge Graph Reasoning
Zukang Yang, Zixuan Zhu, Xuan Zhu
TL;DR
CuriousLLM tackles hallucination and latency in multi-document question answering (MD-QA) by introducing a curiosity-driven LLM agent that generates follow-up questions to steer knowledge-graph traversal. Central to the approach is the Follow-upQA dataset, which provides ground-truth follow-up questions (or NA signals) for training the agent, enabling effective retrieval with fewer passages. The method builds a Knowledge Graph (KG) over passages encoded with a BERT encoder trained via Multi-hop Dense Retrieval (MDR). It then uses a Mistral-7B-based traversal agent, fine-tuned with QLoRA, to perform breadth-first search (BFS) over the KG. The traversal is guided by follow-up questions and terminates early once sufficient evidence has been gathered. Empirical results on MD-QA benchmarks show consistent accuracy improvements over strong baselines, shorter runtimes due to early termination, and robust follow-up-question generation on the Follow-upQA benchmark. This work offers a scalable, resource-efficient path to high-quality MD-QA by integrating curiosity-driven reasoning into KG-based retrieval and prompting.
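The traversal loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The graph is a plain adjacency dict of passage ids. `follow_up` is a hypothetical stand-in for the fine-tuned Mistral-7B agent: it returns a follow-up question, or `None` to play the role of the NA signal. `score` is a hypothetical relevance function standing in for the MDR-trained encoder (here, crude word overlap). `beam` and `budget` are illustrative parameters, not values from the paper.

```python
from collections import deque
from typing import Callable, Dict, List, Optional

Graph = Dict[str, List[str]]  # passage id -> ids of neighboring passages


def curious_bfs(
    question: str,
    graph: Graph,
    passages: Dict[str, str],
    seeds: List[str],
    follow_up: Callable[[str, List[str]], Optional[str]],
    score: Callable[[str, str], float],
    beam: int = 2,
    budget: int = 10,
) -> List[str]:
    """Collect evidence passage ids by BFS, steered by follow-up questions."""
    evidence: List[str] = []
    visited = set(seeds)
    frontier = deque(seeds)
    while frontier and len(evidence) < budget:
        pid = frontier.popleft()
        evidence.append(pid)
        # Ask the agent what is still missing; None plays the role of the NA signal.
        query = follow_up(question, [passages[p] for p in evidence])
        if query is None:
            break  # early termination: evidence judged sufficient
        # Rank unvisited neighbors against the follow-up question; enqueue the top `beam`.
        candidates = [n for n in graph.get(pid, []) if n not in visited]
        candidates.sort(key=lambda n: score(query, passages[n]), reverse=True)
        for n in candidates[:beam]:
            visited.add(n)
            frontier.append(n)
    return evidence


def word_overlap(a: str, b: str) -> float:
    """Hypothetical stand-in for encoder similarity: count of shared lowercase tokens."""
    return float(len(set(a.lower().split()) & set(b.lower().split())))


def toy_agent(question: str, evidence: List[str]) -> Optional[str]:
    """Hypothetical rule-based stand-in for the fine-tuned traversal agent."""
    if "born" in " ".join(evidence).lower():
        return None  # NA: the birthplace is already in the evidence
    return "where was jane doe born"


passages = {
    "p1": "Film X was directed by Jane Doe.",
    "p2": "Jane Doe was born in Oslo.",
    "p3": "Film X premiered in 1999.",
    "p4": "Oslo is the capital of Norway.",
}
graph = {"p1": ["p3", "p2"], "p2": ["p1", "p4"], "p3": ["p1"], "p4": ["p2"]}

print(curious_bfs("Where was the director of Film X born?", graph, passages,
                  ["p1"], toy_agent, word_overlap))
# -> ['p1', 'p2']
```

In this toy run the agent signals sufficiency as soon as the birthplace passage is collected, so the search stops before the unrelated passage `p3` is added to the evidence. This mirrors the early-termination behavior the summary credits for shorter runtimes.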
Abstract
Large Language Models (LLMs) have achieved significant success in open-domain question answering. However, they continue to face challenges such as hallucinations and knowledge cutoffs. These issues can be mitigated through in-context learning, by providing LLMs with relevant context before they generate answers. Recent literature proposes Knowledge Graph Prompting (KGP), which integrates knowledge graphs with an LLM-based traversal agent to substantially enhance document retrieval quality. However, KGP requires costly fine-tuning on large datasets and remains prone to hallucination. In this paper, we propose CuriousLLM, an enhancement that integrates a curiosity-driven reasoning mechanism into an LLM agent. This mechanism enables the agent to generate relevant follow-up questions, thereby guiding the information retrieval process more efficiently. Central to our approach is the new Follow-upQA dataset, which includes questions and supporting evidence as input, with follow-up questions serving as ground truths. These follow-up questions either ask about what is still missing to fully answer the user's query or use special tokens to signify that the retrieved evidence is sufficient. Our experiments show that CuriousLLM significantly boosts LLM performance in multi-document question answering (MD-QA) while avoiding the substantial computational cost and latency of the original KGP framework.
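One way to picture a Follow-upQA training example is as a prompt/target pair for supervised fine-tuning. The sketch below is an assumption-laden illustration, not the dataset's actual schema. The prompt template is hypothetical, and the string `NA` is a hypothetical stand-in for the special token that marks sufficient evidence; the paper's real format and token may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NA_TOKEN = "NA"  # hypothetical sentinel meaning "the evidence is sufficient"


@dataclass
class FollowUpExample:
    """One Follow-upQA-style record: question + evidence in, follow-up question out."""
    question: str
    evidence: List[str]
    follow_up: Optional[str]  # None when no further information is needed

    def to_pair(self) -> Tuple[str, str]:
        """Render a (prompt, target) pair using a hypothetical prompt template."""
        context = "\n".join(f"[{i}] {p}" for i, p in enumerate(self.evidence, 1))
        prompt = f"Question: {self.question}\nEvidence:\n{context}\nFollow-up:"
        target = self.follow_up if self.follow_up is not None else NA_TOKEN
        return prompt, target


def parse_follow_up(generated: str) -> Optional[str]:
    """Map agent output back to a follow-up question, or None for the NA signal."""
    text = generated.strip()
    return None if text == NA_TOKEN else text


missing = FollowUpExample(
    "Where was the director of Film X born?",
    ["Film X was directed by Jane Doe."],
    "Where was Jane Doe born?",
)
sufficient = FollowUpExample(
    "Where was the director of Film X born?",
    ["Film X was directed by Jane Doe.", "Jane Doe was born in Oslo."],
    None,
)
print(missing.to_pair()[1])     # Where was Jane Doe born?
print(sufficient.to_pair()[1])  # NA
```

Because the stop decision is itself a generation target, a single fine-tuned model can both ask what is missing and decide when retrieval should end.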
