STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering
Zhenyu Bi, Daniel Hajialigol, Zhongkai Sun, Jie Hao, Xuan Wang
TL;DR
This work addresses complex multi-hop question answering (MHQA) by introducing STOC-TOT, a stochastic tree-of-thought prompting framework with constrained decoding. The approach builds a tree of sub-questions, estimates a probability for each reasoning path, and grounds the final answer in a vocabulary bank derived from the evidence, using code-based or prompt-based constrained decoding to reduce hallucination. Experiments on HotpotQA and MuSiQue across five LLMs show substantial improvements over chain-of-thought (CoT) and tree-of-thought (ToT) baselines, with notable gains when constrained decoding is used. Ablations indicate that the method is robust across question types and reasoning types. The findings suggest that STOC-TOT yields more reliable, diverse, and grounded reasoning for MHQA and related open-domain reasoning tasks, at the price of higher computational cost and a reliance on careful sub-question generation.
Abstract
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question. Recent systems leverage large language models and combine evidence retrieval with reasoning prompts (e.g., chain-of-thought reasoning) for the MHQA task. However, the variety of question types (bridge vs. comparison questions) and reasoning types (sequential vs. parallel reasoning) calls for more fine-grained prompting methods to improve MHQA performance in the zero-shot setting. In this paper, we propose STOC-TOT, a stochastic tree-of-thought reasoning prompting method with constrained decoding for MHQA, and conduct a detailed comparison with other reasoning prompts across question types and reasoning types. Specifically, we construct a tree-like reasoning structure by prompting the model to break the original question down into smaller sub-questions that form different reasoning paths. In addition, we prompt the model to provide a probability estimate for each reasoning path at each reasoning step. At answer time, we apply constrained decoding to the model to generate more grounded answers and reduce hallucination. Experiments on two MHQA datasets with five large language models show that STOC-TOT outperforms other reasoning prompts by a significant margin.
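To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a path's score is the product of its step probabilities, and it replaces token-level constrained decoding with a simpler post-hoc check that an answer uses only words found in the evidence. The names `Node`, `best_path`, `build_vocab_bank`, and `is_grounded` are hypothetical and do not come from the paper.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One sub-question in the reasoning tree (hypothetical structure)."""
    sub_question: str
    prob: float  # model-estimated probability of this step, in [0, 1]
    children: list[Node] = field(default_factory=list)


def best_path(node: Node) -> tuple[float, list[str]]:
    """Return (score, sub-questions) for the highest-scoring root-to-leaf path.

    Assumption: the score of a path is the product of its step probabilities.
    """
    if not node.children:
        return node.prob, [node.sub_question]
    score, path = max((best_path(c) for c in node.children), key=lambda r: r[0])
    return node.prob * score, [node.sub_question] + path


def build_vocab_bank(evidence: list[str]) -> set[str]:
    """Collect the lowercase word tokens that occur in the evidence passages."""
    return {tok for passage in evidence for tok in re.findall(r"\w+", passage.lower())}


def is_grounded(answer: str, vocab: set[str]) -> bool:
    """Post-hoc stand-in for constrained decoding: every answer token must be in the bank."""
    tokens = re.findall(r"\w+", answer.lower())
    return bool(tokens) and all(t in vocab for t in tokens)


# Usage: two candidate decompositions of a toy question.
root = Node("Original question", 1.0, [
    Node("Sub-question A", 0.6, [Node("Sub-question A1", 0.5)]),
    Node("Sub-question B", 0.4, [Node("Sub-question B1", 0.9)]),
])
score, path = best_path(root)  # selects the B branch (0.4 * 0.9 > 0.6 * 0.5)

vocab = build_vocab_bank(["Paris is the capital of France."])
grounded = is_grounded("Paris", vocab)      # every token appears in the evidence
ungrounded = is_grounded("Berlin", vocab)   # "berlin" is absent from the evidence
```

A real system would obtain the probabilities from the language model's own estimates and would restrict generation at the token level rather than filter finished answers. The sketch only illustrates how path scores and an evidence-derived vocabulary could interact.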
