ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA

Xinjie Zhao, Fan Gao, Xingyu Song, Yingjian Chen, Rui Yang, Yanran Fu, Yuyang Wang, Yusuke Iwasawa, Yutaka Matsuo, Irene Li

TL;DR

ReAgent introduces a reversible multi-agent framework for knowledge-enhanced multi-hop QA that integrates explicit local and global backtracking to mitigate error propagation in forward reasoning. The architecture couples Execution, Supervisory, and Interaction layers with specialized agents for decomposition, retrieval, verification, and assembly, and coordinates backtracking through a Verifier, Controller, and Supervisor to resolve intra- and inter-agent contradictions. Empirical results on HotpotQA, 2WikiMultiHopQA, and MuSiQue show an average improvement of about 6% over strong baselines, along with improved interpretability due to traceable backtracking. The work advances robust, error-tolerant QA by enabling corrections mid-reasoning and provides a foundation for scalable, trustworthy collaborative AI systems.

Abstract

Recent advances in large language models (LLMs) have significantly improved multi-hop question answering (QA) through direct Chain-of-Thought (CoT) reasoning. However, the irreversible nature of CoT leads to error accumulation, making it challenging to correct mistakes in multi-hop reasoning. This paper introduces ReAgent: a Reversible multi-Agent collaborative framework augmented with explicit backtracking mechanisms, enabling reversible multi-hop reasoning. By incorporating text-based retrieval, information aggregation, and validation, our system can detect and correct errors mid-reasoning, leading to more robust and interpretable QA outcomes. The framework and experiments serve as a foundation for future work on error-tolerant QA systems. Empirical evaluations across three benchmarks indicate ReAgent's efficacy, yielding an average improvement of about 6% over baseline models.
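To make "detect and correct errors mid-reasoning" concrete, the following is a minimal sketch of local backtracking only, under stated assumptions. It is not the paper's implementation: `ReasoningTrace`, `answer_with_backtracking`, the `candidates` lookup (a stand-in for retrieval), and the toy `is_consistent` verifier are all hypothetical names introduced for illustration. Global backtracking and the multi-agent message passing are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningTrace:
    """Accepted (sub_question, answer) steps, in order. Hypothetical helper."""
    steps: list = field(default_factory=list)

    def checkpoint(self) -> int:
        # A checkpoint is simply the number of steps accepted so far.
        return len(self.steps)

    def rollback(self, checkpoint: int) -> None:
        # Discard every step accepted after the checkpoint.
        del self.steps[checkpoint:]


def answer_with_backtracking(sub_questions, candidates, is_consistent, max_retries=2):
    """Answer sub-questions in order, undoing a hop when verification fails.

    candidates[q] is a ranked list of candidate answers for sub-question q
    (assumed stand-in for retrieval); is_consistent(steps) plays the verifier.
    """
    trace = ReasoningTrace()
    for q in sub_questions:
        mark = trace.checkpoint()
        for answer in candidates[q][: max_retries + 1]:
            trace.steps.append((q, answer))
            if is_consistent(trace.steps):
                break  # keep this hop and move on to the next one
            trace.rollback(mark)  # local backtracking: undo only this hop
        else:
            raise ValueError(f"no consistent answer found for: {q}")
    return trace.steps


# Toy two-hop question with made-up data.
sub_questions = ["Who directed the film?", "Where was that director born?"]
candidates = {
    "Who directed the film?": ["Alice Smith"],
    "Where was that director born?": ["Paris (Bob Jones)", "Lyon (Alice Smith)"],
}


def is_consistent(steps):
    # Toy verifier: the hop-2 evidence must mention the director chosen in hop 1.
    if len(steps) < 2:
        return True
    director = steps[0][1]
    return director in steps[1][1]


result = answer_with_backtracking(sub_questions, candidates, is_consistent)
```

In this toy run the first candidate for the second hop refers to a different person, so the verifier rejects it, the trace is rolled back to the checkpoint, and the next candidate is tried. A forward-only pipeline would have kept the first candidate and carried the error into the final answer.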

Paper Structure

This paper contains 35 sections, 2 equations, 9 figures, 2 tables, 2 algorithms.

Figures (9)

  • Figure 1: Comparison of multi-hop reasoning strategies. Chain-of-Thought (CoT) and Multi-Agent Systems (MAS) typically adopt a forward-driven reasoning pipeline without rollback mechanisms, which can produce a wrong answer due to error accumulation. In contrast, our proposed ReAgent introduces explicit backtracking mechanisms that enable the system to correct errors during reasoning, resulting in a more accurate and reliable answer.
  • Figure 2: The overall architecture of ReAgent. The given question is processed through the Execution Layer, which performs question decomposition, evidence retrieval, and verification, and then integrates the results to generate the final answer (blue line). The Supervisor Layer and Interaction Layer are responsible for monitoring, regulation, and communication. The ReAgent framework includes both local and global backtracking mechanisms (red boxes), triggered by the Verifier Agent ($A_V$) and Supervisor Agent ($A_S$), respectively.
  • Figure 3: Ablation Study on Local and Global Backtracking (BT): EM comparison on HotpotQA using GPT-4o and DeepSeek-V3.
  • Figure 4: Ablation Study on Backtracking Depth (left) and Number of Decomposed Sub-Questions (right): EM comparison on HotpotQA using GPT-4o and DeepSeek-V3.
  • Figure 5: Case study comparing GPT-O3 (left), CoA (middle), and ReAgent (right) on HotpotQA. The backtracking mechanism enhances iterative reasoning, enabling conflict detection and correction to reach the correct answer.
  • ...and 4 more figures