Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering
Tiezheng Guo, Chen Wang, Yanyi Liu, Jiawei Tang, Pan Li, Sai Xu, Qingwen Yang, Xianlin Gao, Zhi Li, Yingyou Wen
TL;DR
The paper tackles cross-document reasoning in multi-document question answering (MDQA) by introducing IIER, a framework that unifies external documents in a Chunk-Interaction Graph (CIG) capturing structural, semantic, and keyword interactions between chunks. A graph-based Evidence Chain Retriever then starts from seed chunks and iteratively assembles coherent evidence chains, which guide the LLM's reasoning over the retrieved content. Empirical results on four MDQA datasets show IIER achieving state-of-the-art accuracy, with ablations confirming the value of chain construction, chain scope, and graph density for retrieval and reasoning. The work advances cross-document QA by coupling rich chunk-level connections with chain-based reasoning, and it offers potential extensions to multimodal sources and more scalable retrieval pipelines.
Abstract
Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm for improving performance on question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, which leads to missing context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we propose IIER, a new retrieval framework that leverages Inter-chunk Interactions to Enhance Retrieval. The framework captures the internal connections between document chunks by considering three types of interactions: structural, keyword, and semantic. We then construct a unified Chunk-Interaction Graph that represents all external documents comprehensively. Additionally, we design a graph-based evidence chain retriever that uses previous paths and chunk interactions to guide the retrieval process. It identifies multiple seed nodes based on the target question and iteratively searches for relevant chunks to gather supporting evidence. This retrieval process refines the context and reasoning chain, aiding the large language model in reasoning and answer generation. Extensive experiments demonstrate that IIER outperforms strong baselines across four datasets, highlighting its effectiveness in improving retrieval and reasoning capabilities.
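
To make the two components described above more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that chunks arrive in document order, that bag-of-words cosine similarity can stand in for whatever semantic and relevance scoring IIER actually uses, and that a greedy one-neighbor-per-hop expansion is an acceptable simplification of the evidence chain retriever. The names `build_graph` and `retrieve_chains`, the stopword list, and the thresholds are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Toy stopword list (assumption; a real system would use a proper keyword extractor).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and", "to", "by", "which", "what"}

def tokens(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_graph(chunks, min_shared=1, sem_threshold=0.25):
    """Build a toy chunk-interaction graph.

    chunks: list of (doc_id, text) in document order.
    Returns (graph, bags) where graph[i][j] is the set of edge types between i and j.
    """
    bags = [Counter(tokens(text)) for _, text in chunks]
    graph = defaultdict(lambda: defaultdict(set))

    def link(i, j, kind):
        graph[i][j].add(kind)
        graph[j][i].add(kind)

    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            # Structural: neighbouring chunks of the same document.
            if j == i + 1 and chunks[i][0] == chunks[j][0]:
                link(i, j, "structural")
            # Keyword: chunks sharing at least min_shared distinct keywords.
            if len(set(bags[i]) & set(bags[j])) >= min_shared:
                link(i, j, "keyword")
            # Semantic: similarity above a threshold (bag-of-words stand-in for embeddings).
            if cosine(bags[i], bags[j]) >= sem_threshold:
                link(i, j, "semantic")
    return graph, bags

def retrieve_chains(question, graph, bags, num_seeds=2, max_hops=2):
    """Pick seed chunks by similarity to the question, then greedily extend each chain."""
    q = Counter(tokens(question))
    ranked = sorted(range(len(bags)), key=lambda i: cosine(q, bags[i]), reverse=True)
    chains = []
    for seed in ranked[:num_seeds]:
        chain = [seed]
        context = q + bags[seed]  # question plus the path so far guides the next hop
        for _ in range(max_hops):
            candidates = [n for n in graph.get(chain[-1], {}) if n not in chain]
            if not candidates:
                break
            best = max(candidates, key=lambda n: cosine(context, bags[n]))
            chain.append(best)
            context = context + bags[best]
        chains.append(chain)
    return chains

chunks = [
    ("doc_a", "Marie Curie was born in Warsaw."),
    ("doc_a", "She later won two Nobel Prizes."),
    ("doc_b", "Warsaw is the capital of Poland."),
    ("doc_b", "Poland joined the European Union in 2004."),
]
question = "Marie Curie was born in the capital of which country?"

graph, bags = build_graph(chunks)
for chain in retrieve_chains(question, graph, bags, num_seeds=1, max_hops=1):
    print([chunks[i][1] for i in chain])
```

In this toy corpus, the second chunk ("She later won...") is linked to the first only by a structural edge, which illustrates how neighbourhood information can help resolve an otherwise ambiguous reference. The chain from the first chunk to the third crosses documents through a shared keyword. An actual system would replace the bag-of-words scoring with learned representations and pass the assembled chains to the LLM as context.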
