Multi-granular Training Strategies for Robust Multi-hop Reasoning Over Noisy and Heterogeneous Knowledge Sources

Jackson Coleman, Isaiah Lawrence, Benjamin Turner

TL;DR

Multi-source multi-hop QA faces challenges from heterogeneous knowledge, cascading reasoning errors, and scalability limits. The paper introduces AMKOR, a generative framework that fuses parametric LLM knowledge with retrieved evidence and explores reasoning trajectories through probabilistic beam reasoning, all within an end-to-end trainable setup. A multi-granular learning strategy optimizes both local reasoning steps and global answer quality via the weighted loss $\mathcal{L} = \lambda_{local} \mathcal{L}_{local} + \lambda_{global} \mathcal{L}_{global}$, where $\mathcal{L}_{local} = -\frac{1}{m} \sum_{i=1}^{m} \log P(t_i \mid t_{<i}, q, \mathcal{K})$ and $\mathcal{L}_{global} = -\log P(a \mid \mathcal{T}, q, \mathcal{K})$. Experiments on HotpotQA, 2WikiMQA, MuSiQue, and Bamboogle demonstrate state-of-the-art performance and robustness to noisy knowledge, with analyses confirming the critical roles of probabilistic beam reasoning and multi-source fusion, especially on more complex tasks.
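The weighted objective above can be made concrete with a short sketch. It assumes a model has already produced the per-step log-probabilities log P(t_i | t_{<i}, q, K) for the m reasoning steps and the answer log-probability log P(a | T, q, K); how those are computed is not shown. The function names and the example lambda values are hypothetical illustrations, not the paper's implementation or settings.

```python
import math
from typing import Sequence


def local_loss(step_log_probs: Sequence[float]) -> float:
    """L_local: mean negative log-likelihood over the m reasoning steps.

    Assumption: step_log_probs[i] holds log P(t_i | t_<i, q, K),
    supplied by some model that is not part of this sketch.
    """
    m = len(step_log_probs)
    if m == 0:
        raise ValueError("need at least one reasoning step")
    return -sum(step_log_probs) / m


def global_loss(answer_log_prob: float) -> float:
    """L_global: negative log-likelihood of the answer a given T, q, K."""
    return -answer_log_prob


def multi_granular_loss(
    step_log_probs: Sequence[float],
    answer_log_prob: float,
    lambda_local: float,
    lambda_global: float,
) -> float:
    """L = lambda_local * L_local + lambda_global * L_global.

    The weights are passed explicitly; the paper's actual values are not
    stated here, so no defaults are assumed.
    """
    return (lambda_local * local_loss(step_log_probs)
            + lambda_global * global_loss(answer_log_prob))


if __name__ == "__main__":
    # Hypothetical two-step trajectory with step probabilities 0.5 and 0.25,
    # and answer probability 0.5.
    steps = [math.log(0.5), math.log(0.25)]
    print(multi_granular_loss(steps, math.log(0.5),
                              lambda_local=1.0, lambda_global=0.5))
```

Averaging the step terms over m keeps L_local on a per-step scale, so trajectories of different lengths contribute comparably; the two lambda weights then trade off step-level supervision against answer-level supervision.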

Abstract

Multi-source multi-hop question answering (QA) is a challenging task in natural language processing because it requires dynamically integrating heterogeneous knowledge sources and performing multi-step reasoning. Existing methods often suffer from cascading errors, insufficient handling of knowledge conflicts, and computational inefficiency. In this paper, we propose Adaptive Multi-source Knowledge-Oriented Reasoning (AMKOR), a generative framework that leverages large language models (LLMs) to dynamically fuse parametric and retrieved knowledge while exploring reasoning trajectories using probabilistic beam reasoning. AMKOR is further enhanced by a multi-granular learning strategy that optimizes both local reasoning steps and global answer accuracy. Experiments on four widely used multi-hop QA datasets, including HotpotQA and MuSiQue, demonstrate that AMKOR achieves state-of-the-art performance, significantly outperforming baseline methods in both reasoning accuracy and robustness. Additional analyses confirm its scalability, adaptability to noisy knowledge, and superior ability to handle complex multi-hop tasks. This work establishes a new benchmark for multi-source multi-hop QA by effectively combining reasoning quality and efficiency.

Paper Structure

This paper contains 22 sections, 8 equations, and 7 tables.