
GenDec: A robust generative Question-decomposition method for Multi-hop reasoning

Jian Wu, Linyi Yang, Yuliang Ji, Wenhao Huang, Börje F. Karlsson, Manabu Okumura

TL;DR

GenDec introduces a generative question decomposition framework that produces independent sub-questions from retrieved paragraphs to enable robust, parallelizable reasoning for multi-hop QA. The framework comprises GenDec for sub-question generation, SPR for sub-question-conditioned paragraph retrieval, and SQA for sub-question–driven QA with supporting-facts prediction. Empirical results show GenDec improves both QA accuracy and retrieval quality across HotpotQA, 2WikiMultiHopQA, MuSiQue, and PokeMQA, and enhances reasoning when used with state-of-the-art LLMs such as GPT-4. The approach reduces error propagation inherent in prior QD+QA and CoT methods and yields competitive or state-of-the-art paragraph retrieval when combined with Beam Retrieval, while revealing areas for further improvement in paragraph selection robustness.

Abstract

Multi-hop QA (MHQA) involves step-by-step reasoning to answer complex questions and find multiple relevant supporting facts. However, the reasoning ability of existing large language models (LLMs) in multi-hop question answering remains underexplored and is inadequate for answering multi-hop questions. Moreover, it is unclear whether LLMs follow a desired reasoning chain to reach the right final answer. In this paper, we propose a generative question decomposition method (GenDec) from the perspective of explainable QA: it generates independent and complete sub-questions by incorporating additional extracted evidence, with the aim of enhancing LLMs' reasoning ability in retrieval-augmented generation (RAG). To demonstrate the impact, generalization, and robustness of GenDec, we conduct two experiments. The first combines GenDec with small QA systems on paragraph retrieval and QA tasks. The second examines the reasoning capabilities of various state-of-the-art LLMs, including GPT-4 and GPT-3.5, when combined with GenDec. We experiment on the HotpotQA, 2WikiMultiHopQA, MuSiQue, and PokeMQA datasets.

Paper Structure (23 sections, 4 equations, 3 figures, 10 tables)


Figures (3)

  • Figure 1: Example of a multi-hop question and its decomposed sub-questions from the HotpotQA dataset. The original question is shown in light gray and the decomposed ones are in dark gray and cyan. "Roberto de Vincenzo" in the retrieved paragraph is the answer to sub-question Q1 and also part of sub-question Q2. The literal "230" is the answer to sub-question Q2. Because the paragraphs are long, only the sentences that contain supporting facts are listed.
  • Figure 2: Pipeline of GenDec, from top to bottom. We first carry out Question Decomposition (QD) to decompose a multi-hop question into its sub-questions and then train a Sub-question-enhanced Paragraph Retrieval module (SPR). We then input the multi-hop question, its sub-questions, and the retrieved paragraphs into the sub-question-enhanced QA module to extract the final answer.
  • Figure 3: Prompting examples of different settings.
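
The data flow described in the Figure 2 caption (decompose, then retrieve, then answer) can be sketched as three pluggable stages. This is a minimal illustration only, not the authors' implementation: `run_pipeline`, `PipelineOutput`, and the `toy_*` functions are hypothetical names, the toy stages use fixed outputs and word overlap in place of the trained GenDec, SPR, and SQA models, and the example question and paragraphs are invented.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineOutput:
    sub_questions: List[str]
    paragraphs: List[str]
    answer: str


def run_pipeline(
    question: str,
    candidates: List[str],
    decompose: Callable[[str, List[str]], List[str]],
    retrieve: Callable[[str, List[str], List[str]], List[str]],
    read: Callable[[str, List[str], List[str]], str],
) -> PipelineOutput:
    """Chain the three stages: decomposition, retrieval, answering."""
    # Stage 1: generate sub-questions from the question and candidate paragraphs.
    sub_questions = decompose(question, candidates)
    # Stage 2: select paragraphs using the question plus its sub-questions.
    paragraphs = retrieve(question, sub_questions, candidates)
    # Stage 3: answer from the question, sub-questions, and selected paragraphs.
    answer = read(question, sub_questions, paragraphs)
    return PipelineOutput(sub_questions, paragraphs, answer)


def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def toy_decompose(question: str, paragraphs: List[str]) -> List[str]:
    # Assumption: a real system would call a generative model here.
    return ["Who wrote Book X?", "Which country is Author Y from?"]


def toy_retrieve(question: str, sub_questions: List[str],
                 paragraphs: List[str], top_k: int = 2) -> List[str]:
    # Assumption: word overlap stands in for a trained retriever.
    queries = [question, *sub_questions]

    def score(paragraph: str) -> int:
        words = _tokens(paragraph)
        return sum(len(words & _tokens(q)) for q in queries)

    return sorted(paragraphs, key=score, reverse=True)[:top_k]


def toy_read(question: str, sub_questions: List[str],
             paragraphs: List[str]) -> str:
    # Assumption: take the last two words of the top paragraph as the answer.
    return " ".join(paragraphs[0].rstrip(".").split()[-2:]) if paragraphs else ""


candidates = [
    "Book X was written by Author Y.",
    "Author Y is from Country Z.",
    "The weather is mild in spring.",
]
result = run_pipeline(
    "Which country is the author of Book X from?",
    candidates, toy_decompose, toy_retrieve, toy_read,
)
print(result.sub_questions, result.paragraphs, result.answer)
```

Passing the stages in as callables keeps the sketch independent of any particular model, so each toy stage could be swapped for a trained component without changing `run_pipeline`.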