A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question Decomposition with Large Language Models

Hejing Cao, Zhenwei An, Jiazhan Feng, Kun Xu, Liwei Chen, Dongyan Zhao

TL;DR

The paper tackles hallucinations in large language model (LLM) question answering by introducing Decompose-and-Query (D&Q), a constrained multi-stage reasoning framework that draws on external knowledge from a reliable QA base. It combines a new dataset (ChitChatQA) with supervised fine-tuning and tool-augmented retrieval, so that the model can decompose a question into sub-questions and backtrack when a line of reasoning fails. Empirically, D&Q does not lose to ChatGPT in 67% of cases on ChitChatQA and reaches 59.6% F1 on HotPotQA in the question-only setting, along with improved retrieval recall (up to 68.8%). The work offers a practical, extensible approach to reliable, tool-assisted QA, and the authors release the dataset and code to support further research.
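
The 59.6% figure is an answer-level F1. HotPotQA answers are conventionally scored with SQuAD-style token-overlap F1 after light normalization (lowercasing, removing punctuation and English articles, collapsing whitespace). The sketch below shows that computation, assuming D&Q was evaluated with this standard metric, which the summary does not state explicitly. It omits the official script's special handling of yes/no answers, and the function names are illustrative.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: list[str], golds: list[str]) -> float:
    """Dataset-level score: average of per-question token F1."""
    scores = [token_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, predicting "the Eiffel Tower" against the gold answer "Eiffel Tower in Paris" gives precision 1.0 and recall 0.5, so F1 is 2/3; a reported 59.6% is the mean of such per-question scores.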

Abstract

While large language models exhibit remarkable performance on Question Answering tasks, they are susceptible to hallucinations. Challenges arise when these models must understand multi-hop relations in complex questions or lack the knowledge needed for a comprehensive response. To address this issue, we introduce the "Decompose-and-Query" framework (D&Q). Similar to ReAct, this framework guides the model to think and to use external knowledge, while also restricting its reasoning to reliable information, which mitigates the risk of hallucinations. Experiments confirm the effectiveness of D&Q: on our ChitChatQA dataset, D&Q does not lose to ChatGPT in 67% of cases; in the HotPotQA question-only setting, D&Q achieves an F1 score of 59.6%. Our code is available at https://github.com/alkaidpku/DQ-ToolQA.
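
The constraint described in the abstract, answering only from reliable information, can be illustrated with a toy trusted QA base. This is a minimal sketch, not the released D&Q code: it assumes the reliable source is a list of vetted question-answer pairs matched by Jaccard token overlap with a hypothetical threshold of 0.5, and `QAPair`, `TrustedQABase`, and `answer_with_evidence` are illustrative names. Each hop mirrors a ReAct-style step (propose a sub-question, call the lookup tool, observe the answer); when no stored pair is close enough, the function returns `None` instead of letting the model guess.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


def _tokens(text: str) -> set[str]:
    # Lowercase and drop question marks so "born?" matches "born".
    return set(text.lower().replace("?", " ").split())


def jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class TrustedQABase:
    """Hypothetical reliable knowledge source: a list of vetted QA pairs."""

    def __init__(self, pairs: list[QAPair], threshold: float = 0.5):
        self.pairs = pairs
        self.threshold = threshold  # assumed cut-off, not taken from the paper

    def lookup(self, sub_question: str) -> Optional[QAPair]:
        scored = [(jaccard(p.question, sub_question), p) for p in self.pairs]
        if not scored:
            return None
        score, best = max(scored, key=lambda sp: sp[0])
        # Refuse to answer rather than return a weak match.
        return best if score >= self.threshold else None


def answer_with_evidence(
    templates: list[str],
    lookup: Callable[[str], Optional[QAPair]],
) -> tuple[Optional[str], list[QAPair]]:
    """Answer a chain of sub-questions using only retrieved QA pairs.

    Each template may contain "{prev}", filled with the previous hop's answer.
    Returns (None, evidence_so_far) at the first sub-question with no reliable match.
    """
    evidence: list[QAPair] = []
    previous = ""
    for template in templates:
        sub_question = template.format(prev=previous)  # Thought: next sub-question
        hit = lookup(sub_question)  # Action: query the trusted base
        if hit is None:
            return None, evidence
        evidence.append(hit)  # Observation: a trusted answer
        previous = hit.answer
    return (previous or None), evidence
```

With a base holding "Who directed Inception?" and "Where was Christopher Nolan born?", the chain `["Who directed Inception?", "Where was {prev} born?"]` resolves to "London" with both pairs as evidence, while an unmatched sub-question such as "Who composed the score of Inception?" returns `None`. That `None` is the point where a framework like D&Q would revise its decomposition rather than hallucinate.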

Paper Structure

This paper contains 22 sections, 2 figures, 3 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Example of a question decomposition trajectory. Q: question; I: iteration. Dashed lines indicate rollback. For brevity, we write Ques_Ret for QuestionRetriever and Ans_Ret for AnswerRetriever.
  • Figure 2: Model architecture for HotPotQA.
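
Figure 1 describes a trajectory of iterations in which some branches are abandoned and rolled back. The sketch below models that control flow as backtracking in a depth-first search over alternative sub-questions. It is an assumption-laden illustration: the TL;DR attributes backtracking to supervised fine-tuning plus tool use, so in D&Q itself the model presumably decides when to roll back, whereas here the rule is hard-coded. The `hops` alternatives stand in for the model's proposals, and the dictionary-backed `retrieve` callable is a hypothetical stand-in for tools such as Ques_Ret and Ans_Ret.

```python
from typing import Callable, Optional


def decompose_with_rollback(
    hops: list[list[str]],
    retrieve: Callable[[str], Optional[str]],
) -> tuple[Optional[str], list[str]]:
    """Depth-first search over alternative sub-questions, one list per hop.

    Templates may contain "{prev}", filled with the previous hop's answer.
    Returns the final answer (or None) and a trajectory log in which
    "rollback at I<n>" marks an abandoned branch (a dashed line in Figure 1).
    """
    log: list[str] = []

    def search(hop: int, prev: str) -> Optional[str]:
        if hop == len(hops):
            return prev  # every hop answered: prev is the final answer
        for template in hops[hop]:
            sub_q = template.format(prev=prev)
            answer = retrieve(sub_q)
            log.append(f"I{hop + 1}: {sub_q} -> {answer}")
            if answer is not None:
                result = search(hop + 1, answer)
                if result is not None:
                    return result
            # Dead end: no reliable answer here, or a later hop failed.
            log.append(f"rollback at I{hop + 1}")
        return None

    return search(0, ""), log
```

As a usage example, suppose the first hop tries "Who produced Inception?" before "Who directed Inception?", and the lookup table knows the producer but not the producer's birthplace. The search answers I1, fails at I2, logs `rollback at I2` and `rollback at I1`, then retries I1 with the second alternative and reaches the director's birthplace. Recording rollbacks in the trajectory keeps abandoned branches visible, much like the dashed lines in the figure.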