GenDec: A robust generative Question-decomposition method for Multi-hop reasoning
Jian Wu, Linyi Yang, Yuliang Ji, Wenhao Huang, Börje F. Karlsson, Manabu Okumura
TL;DR
GenDec introduces a generative question decomposition framework that produces independent sub-questions from retrieved paragraphs to enable robust, parallelizable reasoning for multi-hop QA. The framework comprises GenDec for sub-question generation, SPR for sub-question-conditioned paragraph retrieval, and SQA for sub-question–driven QA with supporting-facts prediction. Empirical results show GenDec improves both QA accuracy and retrieval quality across HotpotQA, 2WikiMultiHopQA, MuSiQue, and PokeMQA, and enhances reasoning when used with state-of-the-art LLMs such as GPT-4. The approach reduces error propagation inherent in prior QD+QA and CoT methods and yields competitive or state-of-the-art paragraph retrieval when combined with Beam Retrieval, while revealing areas for further improvement in paragraph selection robustness.
Abstract
Multi-hop QA (MHQA) requires step-by-step reasoning to answer complex questions and to find multiple relevant supporting facts. However, the reasoning ability of existing large language models (LLMs) in multi-hop question answering remains underexplored and is often inadequate for answering multi-hop questions. Moreover, it is unclear whether LLMs follow a desired reasoning chain to reach the right final answer. In this paper, we propose a generative question decomposition method (GenDec) from the perspective of explainable QA: it generates independent and complete sub-questions by incorporating additional extracted evidence, enhancing LLMs' reasoning ability in retrieval-augmented generation (RAG). To demonstrate the impact, generalization, and robustness of GenDec, we conduct two experiments. First, we combine GenDec with small QA systems on paragraph retrieval and QA tasks. Second, we examine the reasoning capabilities of various state-of-the-art LLMs, including GPT-4 and GPT-3.5, when combined with GenDec. We experiment on the HotpotQA, 2WikiMultiHopQA, MuSiQue, and PokeMQA datasets.
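The three-stage flow summarized in the TL;DR (sub-question generation, sub-question-conditioned paragraph retrieval, then QA) can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`overlap_score`, `retrieve`, `answer_multihop`, and the `decompose` and `answer` callables) is a hypothetical placeholder, the token-overlap retriever merely stands in for a learned retriever, and the decomposer and answerer are assumed to be supplied by the caller (for example, as wrappers around a trained model or an LLM). The point it shows is that independent sub-questions can be answered in parallel rather than sequentially.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical interfaces, assumed for illustration only (not the paper's API).
Decomposer = Callable[[str, List[str]], List[str]]  # (question, paragraphs) -> sub-questions
Answerer = Callable[[str, List[str]], str]          # (question, context) -> answer


def overlap_score(query: str, paragraph: str) -> int:
    """Toy relevance score: number of shared lowercase whitespace tokens."""
    return len(set(query.lower().split()) & set(paragraph.lower().split()))


def retrieve(queries: List[str], paragraphs: List[str], k: int = 2) -> List[str]:
    """Rank paragraphs by their best overlap with any query and keep the top k.

    A stand-in for sub-question-conditioned retrieval; a real system would
    use a trained retriever instead of token overlap.
    """
    ranked = sorted(
        paragraphs,
        key=lambda p: max(overlap_score(q, p) for q in queries),
        reverse=True,
    )
    return ranked[:k]


def answer_multihop(
    question: str,
    paragraphs: List[str],
    decompose: Decomposer,
    answer: Answerer,
    k: int = 2,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Decompose, retrieve, answer sub-questions in parallel, then answer the question."""
    sub_questions = decompose(question, paragraphs)
    evidence = retrieve([question] + sub_questions, paragraphs, k)

    # Sub-questions are assumed independent, so no sub-answer waits on another.
    with ThreadPoolExecutor() as pool:
        sub_answers = list(pool.map(lambda sq: answer(sq, evidence), sub_questions))

    # The final answer is conditioned on the evidence plus the sub-question/answer pairs.
    sub_qa = list(zip(sub_questions, sub_answers))
    context = evidence + [f"{sq} {sa}" for sq, sa in sub_qa]
    return answer(question, context), sub_qa
```

Because each sub-question is answered against the same retrieved evidence rather than against an earlier sub-answer, a mistake on one sub-question does not feed into the others, which is the error-propagation argument the TL;DR makes in contrast to sequential decomposition.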
