STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering

Zhenyu Bi; Daniel Hajialigol; Zhongkai Sun; Jie Hao; Xuan Wang

TL;DR

This work tackles the challenge of complex, multi-hop question answering by introducing STOC-ToT, a stochastic tree-of-thought prompting framework with constrained decoding. The approach builds a tree of sub-questions with estimated path probabilities and grounds final answers using a vocabulary bank derived from evidence, reducing hallucinations via code- or prompt-based constrained decoding. Empirical results on HotpotQA and MuSiQue across five LLMs show substantial improvements over CoT and ToT baselines, with notable gains when constrained decoding is used, and ablations highlight robustness across question and reasoning types. The findings suggest STOC-ToT provides more reliable, diverse, and grounded reasoning for MHQA and related open-domain reasoning tasks, albeit with higher computational costs and reliance on careful sub-question generation.
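The grounding idea can be illustrated with a minimal sketch. It assumes the vocabulary bank is simply the set of lowercase word tokens found in the retrieved evidence, and that the model exposes a score per candidate token. The function names `build_vocab_bank` and `constrained_greedy_step` are hypothetical, and the scores are made up for illustration; this is not the authors' implementation.

```python
import re

def build_vocab_bank(evidence_passages):
    """Collect lowercase word tokens that appear in the evidence (assumed tokenization)."""
    text = " ".join(evidence_passages)
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def constrained_greedy_step(token_scores, vocab_bank):
    """Pick the highest-scoring candidate token that is in the bank, or None if none is."""
    allowed = {tok: s for tok, s in token_scores.items() if tok.lower() in vocab_bank}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)

# Toy example: the unconstrained argmax ("London") is not supported by the evidence.
evidence = ["Paris is the capital of France."]
bank = build_vocab_bank(evidence)
scores = {"London": 2.1, "Paris": 1.7, "Berlin": 0.3}  # illustrative scores only
choice = constrained_greedy_step(scores, bank)
print(choice)
```

Masking candidates before taking the argmax is what keeps the answer inside the evidence vocabulary; a real decoder would apply the same mask to logits at every generation step.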

Abstract

Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question. Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts (e.g., chain-of-thought reasoning) for the MHQA task. However, the complexities in the question types (bridge vs. comparison questions) and the reasoning types (sequential vs. parallel reasoning) call for novel, more fine-grained prompting methods to improve MHQA performance in the zero-shot setting. In this paper, we propose STOC-TOT, a stochastic tree-of-thought reasoning prompting method with constrained decoding for MHQA, and conduct a detailed comparison with other reasoning prompts on different question types and reasoning types. Specifically, we construct a tree-like reasoning structure by prompting the model to break down the original question into smaller sub-questions that form different reasoning paths. In addition, we prompt the model to provide a probability estimate for each reasoning path at each reasoning step. At answer time, we apply constrained decoding to the model to generate more grounded answers and reduce hallucination. Experiments on two MHQA datasets with five large language models show that our framework outperforms other reasoning prompts by a significant margin.
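The tree structure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the per-step probabilities are stubbed constants rather than model outputs, a path's score is taken to be the product of its step probabilities, and the best path is chosen by a plain argmax. The abstract does not specify how step estimates are combined or how paths are selected, and `Node` and `best_path` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    sub_question: str
    prob: float  # estimated probability for this step (stubbed, not model-generated)
    children: List["Node"] = field(default_factory=list)

def best_path(node, prefix=(), prefix_prob=1.0):
    """Return (path, probability) of the highest-scoring root-to-leaf path.

    Assumption: a path's probability is the product of its step probabilities.
    """
    path = prefix + (node.sub_question,)
    prob = prefix_prob * node.prob
    if not node.children:
        return path, prob
    return max((best_path(c, path, prob) for c in node.children), key=lambda r: r[1])

# Toy tree: two candidate decompositions of the original question.
root = Node("original question", 1.0, [
    Node("sub-question A", 0.6, [Node("follow-up A1", 0.5)]),
    Node("sub-question B", 0.4, [Node("follow-up B1", 0.9)]),
])
path, prob = best_path(root)
print(path, prob)
```

In this toy tree the B branch is selected even though its first step scores lower than A, which shows why scoring whole paths can differ from greedily picking the best sub-question at each step.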

Paper Structure (34 sections, 5 figures, 5 tables)

Introduction
Related Work
Multi-Hop Question Answering
Reasoning Prompting of LLMs
Constrained Decoding
Method
Task Formation
StoC-ToT Framework
Example-Based Sub-Question Generation
Paraphrase Detection
Evidence Retrieval and Answering
Validity Estimation
Constrained Decoding
Code-based Constrained Decoding
Prompt-based Constrained Decoding
...and 19 more sections

Figures (5)

Figure 1: An example of an MHQA question. The question has two hops that the model must reason through before answering the final question.
Figure 2: Overview of our framework, using the example in Figure 1. The top-right corner shows the overall structure of the constructed tree, with each node's label on the left. Darker green in a node means a higher evaluated probability of the reasoning path. The original question is colored in blue. The purple block shows the first round of our tree-building process as an example.
Figure 3: Performance comparison of Chain-of-Thought, Tree-of-Thought, and StoC-ToT on questions of different question types (left) and reasoning types (right). Experiments were done on the HotpotQA dataset.
Figure 4: Performance comparison of CoT, ToT, and StoC-ToT across different numbers of hops in the question. Experiments were done on the MuSiQue dataset.
Figure 5: Ratio of different categories in error cases, on the HotpotQA dataset.
