BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

Zheng Chu, Jingchang Chen, Qianglong Chen, Haotian Wang, Kun Zhu, Xiyuan Du, Weijiang Yu, Ming Liu, Bing Qin

TL;DR

BeamAggR addresses factual errors in knowledge-intensive multi-hop QA by integrating a divide-and-conquer strategy with multi-source knowledge. It decomposes complex questions into trees, performs bottom-up reasoning using complementary internal and external knowledge, and uses probabilistic beam aggregation to explore and select promising reasoning trajectories. Empirical results across four open-domain datasets show substantial gains over state-of-the-art methods and demonstrate robustness across model variants, with analyses highlighting improved knowledge collaboration and reduced cascading errors. The approach offers a practical, model-agnostic pathway to enhance retrieval-augmented reasoning in real-world QA systems.

Abstract

Large language models (LLMs) have demonstrated strong reasoning capabilities. Nevertheless, they still suffer from factual errors when tackling knowledge-intensive tasks. Retrieval-augmented reasoning represents a promising approach. However, significant challenges persist, including inaccurate and insufficient retrieval for complex questions, as well as difficulty in integrating multi-source knowledge. To address this, we propose Beam Aggregation Reasoning (BeamAggR), a reasoning framework for knowledge-intensive multi-hop QA. BeamAggR explores and prioritizes promising answers at each hop of the question. Concretely, we parse complex questions into trees, which include atomic and composite questions, followed by bottom-up reasoning. For atomic questions, the LLM conducts reasoning on multi-source knowledge to get answer candidates. For composite questions, the LLM combines beam candidates, explores multiple reasoning paths through probabilistic aggregation, and prioritizes the most promising trajectory. Extensive experiments on four open-domain multi-hop reasoning datasets show that our method significantly outperforms SOTA methods by 8.5%. Furthermore, our analysis reveals that BeamAggR elicits better knowledge collaboration and answer aggregation.
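
The bottom-up procedure described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function names (`answer_atomic`, `answer_composite`), the vote-counting normalization, and the `reason` callable that stands in for the LLM are all assumptions made for illustration. The sketch only shows the general idea of turning multi-source answers into a probability distribution, combining child candidates, summing the probability of paths that reach the same answer, and keeping the top-k beam.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]  # answer -> probability


def normalize(scores: Dict[str, float]) -> Dist:
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()} if total > 0 else {}


def top_k(dist: Dist, k: int) -> Dist:
    """Keep the k most probable answers (the beam) and renormalize."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return normalize(dict(ranked[:k]))


def answer_atomic(source_answers: Dict[str, List[str]], k: int) -> Dist:
    """Assumed scheme: pool answers from all sources by simple vote counting."""
    votes: Dict[str, float] = defaultdict(float)
    for answers in source_answers.values():
        for answer in answers:
            votes[answer] += 1.0
    return top_k(normalize(votes), k)


def answer_composite(
    children: List[Dist],
    reason: Callable[[Tuple[str, ...]], str],  # hypothetical stand-in for the LLM
    k: int,
) -> Dist:
    """Explore every combination of child candidates and aggregate by answer."""
    aggregated: Dict[str, float] = defaultdict(float)
    for combo in product(*(child.items() for child in children)):
        child_answers = tuple(answer for answer, _ in combo)
        path_prob = 1.0
        for _, prob in combo:
            path_prob *= prob  # joint probability of this reasoning path
        aggregated[reason(child_answers)] += path_prob
    return top_k(normalize(aggregated), k)


# Toy two-hop question: "What is the capital of the country where X is located?"
# Atomic hop: which country? Answers come from two illustrative sources.
country = answer_atomic(
    {
        "internal": ["France", "French Republic"],
        "retrieval": ["France", "Belgium"],
    },
    k=3,
)

# Composite hop: a lookup table plays the role of the LLM reasoning step.
capitals = {"France": "Paris", "French Republic": "Paris", "Belgium": "Brussels"}
final = answer_composite([country], lambda answers: capitals[answers[0]], k=2)
print(final)
```

In this toy run, two different child candidates lead to the same final answer, so their path probabilities add up. That is the aggregation effect the sketch is meant to show: an answer supported by several reasoning paths outranks one supported by a single path.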

Paper Structure (54 sections, 4 equations, 15 figures, 7 tables, 1 algorithm)

Figures (15)

  • Figure 1: A brief overview of our method. Complex questions are decomposed into trees (top). Multi-source beam aggregation reasoning is conducted to find the best reasoning trajectory (bottom).
  • Figure 2: An overview of BeamAggR. (a) Question decomposition: decompose complex questions into trees and address them bottom-up. (b) Multi-source reasoning: reason over diverse knowledge sources and normalize answers into a probability distribution. (c) Beam aggregation reasoning: explore based on children's predictions, probabilistically aggregate answers, and select the most promising reasoning trajectory.
  • Figure 3: Performance gap with different reasoning steps. We adopt the original split in MuSiQue and report the average F1 score on each subset. As the number of reasoning steps increases, the model's performance declines. Our method exhibits a slower performance decline as reasoning steps increase, indicating its ability to effectively alleviate cascading errors.
  • Figure 4: Distribution of knowledge integration in reasoning. Unified represents the integration of multi-source knowledge in reasoning, while the two ends represent reliance on single-source knowledge. The bars represent original discrete distributions, and the curve is the kernel density estimate (KDE).
  • Figure 5: The contribution of each reasoning strategy to the final answer, in percent. HotpotQA is balanced, while MuSiQue and 2WikiMQA lean toward external knowledge, and Bamboogle favors internal knowledge.
  • ...and 10 more figures