
RISE: Reasoning Enhancement via Iterative Self-Exploration in Multi-hop Question Answering

Bolei He, Xinran He, Mengke Chen, Xianwei Xue, Ying Zhu, Zhenhua Ling

TL;DR

RISE introduces a self-driven iterative framework that merges Retrieval-Augmented Generation with self-exploration to tackle multi-hop QA. By cycling through question decomposition, retrieve-then-read, and self-critique, it generates task-specific data and optimizes across multiple objectives to progressively enhance reasoning and evidence integration. Empirical results on MHQA benchmarks show significant improvements in reasoning accuracy and task performance, with ablations confirming the complementary value of each component. The approach also demonstrates robustness across iterations and model variants, while highlighting a cost-efficient path to improved complex reasoning without heavy external supervision.

Abstract

Large Language Models (LLMs) excel in many areas but continue to face challenges with complex reasoning tasks, such as Multi-Hop Question Answering (MHQA). MHQA requires integrating evidence from diverse sources while managing intricate logical dependencies, which often leads to errors in reasoning. Retrieval-Augmented Generation (RAG), widely employed in MHQA tasks, struggles to filter noisy data effectively and to retrieve all necessary evidence, which limits its effectiveness on MHQA. To address these challenges, we propose RISE: Reasoning Enhancement via Iterative Self-Exploration, a novel framework designed to enhance models' reasoning capability through iterative self-exploration. Specifically, RISE involves three key steps in addressing MHQA tasks: question decomposition, retrieve-then-read, and self-critique. By leveraging continuous self-exploration, RISE identifies accurate reasoning paths and iteratively improves the model's ability to integrate evidence, maintain logical consistency, and perform well on MHQA tasks. Extensive experiments on multiple MHQA benchmarks demonstrate that RISE significantly improves reasoning accuracy and task performance.
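The three steps named above can be illustrated with a minimal Python sketch of a single self-exploration pass. It is not the authors' implementation. The following are all hypothetical choices made for illustration:

- the callables `decompose`, `retrieve`, `read`, and `critique`;
- the `Step` record;
- the `{prev}` placeholder convention for chaining hops;
- the rule of abandoning a path once a hop is rejected.

In RISE these roles are played by the LLM itself together with a retriever. The toy facts about "Film A" and "Director B" are invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    sub_question: str
    evidence: list[str]
    sub_answer: str
    accepted: bool  # verdict from the self-critique step


def self_explore(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    read: Callable[[str, list[str]], str],
    critique: Callable[[str, str, list[str]], bool],
) -> tuple[list[Step], str]:
    """One decompose -> retrieve-then-read -> self-critique pass (hypothetical sketch)."""
    steps: list[Step] = []
    prev = ""
    for template in decompose(question):
        # Assumption: a later sub-question refers to the previous sub-answer via "{prev}".
        sub_q = template.format(prev=prev)
        evidence = retrieve(sub_q)             # retrieve ...
        sub_answer = read(sub_q, evidence)     # ... then read
        accepted = critique(sub_q, sub_answer, evidence)  # self-critique
        steps.append(Step(sub_q, evidence, sub_answer, accepted))
        if not accepted:
            break  # assumption: a rejected hop invalidates the rest of this path
        prev = sub_answer
    final = steps[-1].sub_answer if steps and steps[-1].accepted else ""
    return steps, final


# Toy usage with invented facts; the "retriever" returns "<question> <answer>" strings.
FACTS = {
    "Who directed Film A?": "Director B",
    "When was Director B born?": "1972",
}
steps, answer = self_explore(
    "When was the director of Film A born?",
    decompose=lambda q: ["Who directed Film A?", "When was {prev} born?"],
    retrieve=lambda sq: [f"{sq} {FACTS[sq]}"] if sq in FACTS else [],
    read=lambda sq, ev: ev[0][len(sq):].strip() if ev else "",
    critique=lambda sq, sa, ev: bool(sa) and any(sa in doc for doc in ev),
)
```

Returning the per-hop `Step` records alongside the final answer reflects the idea that each interaction can be kept for later training. Exactly which records RISE retains is a detail this sketch does not try to reproduce.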

Paper Structure

This paper contains 30 sections, 5 equations, 16 figures, 12 tables, and 1 algorithm.

Figures (16)

  • Figure 1: The upper part of the figure (blue) illustrates an Evidence Aggregation Error, where the Blu-ray release year of Fire Birds (2015) is mistaken for its theatrical release year. The lower part (green and red) shows a Reasoning Decomposition Error. The incorrect path formulates the sub-question as the production year of The Book of Eli (2009) instead of its release year (2010).
  • Figure 2: A complete iteration cycle in RISE. a) Self-Exploration: Model $M^i$ decomposes complex questions $q_0$ into simpler sub-questions, generates sub-answers via retrieve-then-read, and evaluates their validity, leading to a final answer $a_0$. Interactions are stored as historical data $\mathcal{D}$. b) Iterative Optimization: RISE optimizes model $M^i$ using historical data $\mathcal{D}$ to create an enhanced model $M^{i+1}$, which generates new questions $Q^{i+1}$ for the next cycle.
  • Figure 3: Changes in model accuracy (a) and reasoning length (b) across datasets. Accuracy consistently improves across datasets, while reasoning length, despite some fluctuations, shows an overall decreasing trend.
  • Figure 4: Win rates of the current iteration against the previous one, judged by GPT-4o, used to assess the model's question decomposition capability. Each new iteration consistently outperforms the previous one in subjective effectiveness, demonstrating that RISE continuously enhances the model's decomposition capability.
  • Figure 5: Changes in the model’s retrieve-then-read capability. (a) Results on simpler datasets (NQ, TriviaQA, WebQ), (b) Results on more complex datasets (2Wiki, HotpotQA, MSQ), where accuracy shows consistent growth with each iteration, even in challenging scenarios.
  • ...and 11 more figures
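The iteration cycle described in Figure 2 can likewise be sketched as a driver loop in Python. This is a hypothetical outline, not the authors' code. The names `rise_iterations`, `explore`, `optimize`, and `generate_questions` are placeholders. Letting the historical data $\mathcal{D}$ accumulate across iterations, rather than resetting it each cycle, is also an assumption. The toy usage treats the model as an integer version counter, purely to show how $M^i$, $\mathcal{D}$, $M^{i+1}$, and $Q^{i+1}$ flow between the two phases.

```python
from typing import Callable, TypeVar

Model = TypeVar("Model")  # stands in for M^i; in RISE this is an LLM


def rise_iterations(
    model: Model,
    questions: list[str],
    explore: Callable[[Model, str], dict],
    optimize: Callable[[Model, list[dict]], Model],
    generate_questions: Callable[[Model], list[str]],
    num_iterations: int,
) -> tuple[Model, list[dict]]:
    """Hypothetical driver for the self-exploration / optimization cycle."""
    history: list[dict] = []  # historical data D (assumed to accumulate)
    for _ in range(num_iterations):
        # a) Self-exploration: the current model M^i works through Q^i.
        history.extend(explore(model, q) for q in questions)
        # b) Iterative optimization: train on D to obtain M^{i+1}.
        model = optimize(model, history)
        # M^{i+1} generates the question set Q^{i+1} for the next cycle.
        questions = generate_questions(model)
    return model, history


# Toy usage: the "model" is just a version number, so each record shows
# which iteration produced it.
final_model, D = rise_iterations(
    model=0,
    questions=["q_a", "q_b"],
    explore=lambda m, q: {"model": m, "question": q},
    optimize=lambda m, hist: m + 1,
    generate_questions=lambda m: [f"q{m}_1", f"q{m}_2"],
    num_iterations=3,
)
```

With these stand-ins, three iterations yield `final_model == 3` and six records in `D`, two per iteration. The second batch of records comes from questions generated by the updated model.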