
Zero-Shot Multi-Hop Question Answering via Monte-Carlo Tree Search with Large Language Models

Seongmin Lee, Jaewook Shin, Youngjin Ahn, Seokin Seo, Ohjoon Kwon, Kee-Eung Kim

TL;DR

This paper introduces Monte-Carlo tree search for Zero-shot multi-hop Question Answering (MZQA), a framework based on Monte-Carlo tree search (MCTS) that identifies optimal reasoning paths in multi-hop question answering (MHQA) tasks, mitigating error propagation from sequential reasoning processes.

Abstract

Recent advances in large language models (LLMs) have significantly impacted the domain of multi-hop question answering (MHQA), where systems are required to aggregate information and infer answers from disparate pieces of text. However, the autoregressive nature of LLMs poses an inherent challenge: errors may accumulate if mistakes are made in intermediate reasoning steps. This paper introduces Monte-Carlo tree search for Zero-shot multi-hop Question Answering (MZQA), a framework based on Monte-Carlo tree search (MCTS) that identifies optimal reasoning paths in MHQA tasks, mitigating error propagation from sequential reasoning processes. Unlike previous work, we propose a zero-shot prompting method that relies solely on instructions, without the support of hand-crafted few-shot examples that typically require domain expertise. We also introduce a behavioral cloning approach (MZQA-BC) trained on self-generated MCTS inference trajectories, achieving a more than 10-fold increase in reasoning speed with barely any loss in performance. The efficacy of our method is validated on standard benchmarks such as HotpotQA, 2WikiMultihopQA, and MuSiQue, where it outperforms existing frameworks.
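The abstract describes searching over intermediate reasoning steps with MCTS and then cloning the resulting trajectories. The Python below is a minimal, generic UCT-style MCTS sketch under stated assumptions, not the authors' implementation. `propose` and `reward` are hypothetical stand-ins for the LLM calls that would generate candidate next reasoning steps and score a partial reasoning path; the exploration constant `c`, the depth limit, and scoring the expanded path directly instead of running random rollouts are illustrative choices. The hypothetical helpers `best_path` and `bc_pairs` show one way a most-visited trajectory could be turned into (context, next step) pairs for behavioral cloning; the paper's actual training format is not described here.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for LLM calls (assumption, not the paper's prompts).
ProposeFn = Callable[[List[str]], List[str]]  # path -> candidate next steps
RewardFn = Callable[[List[str]], float]       # path -> score in [0, 1]


@dataclass
class Node:
    path: List[str]  # question followed by the reasoning steps taken so far
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def uct(self, c: float) -> float:
        # Unvisited children get infinite priority, so they are tried first.
        if self.visits == 0:
            return math.inf
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts(question: str, propose: ProposeFn, reward: RewardFn,
         iterations: int = 50, max_depth: int = 4, c: float = 1.4,
         rng: Optional[random.Random] = None) -> Node:
    rng = rng or random.Random(0)
    root = Node(path=[question])
    for _ in range(iterations):
        # 1. Selection: descend by UCT score until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ch.uct(c))
        # 2. Expansion: add candidate next steps unless the depth limit is hit,
        #    then move to one of the new children at random.
        if len(node.path) - 1 < max_depth:
            for step in propose(node.path):
                node.children.append(Node(path=node.path + [step], parent=node))
            if node.children:
                node = rng.choice(node.children)
        # 3. Evaluation: score the path directly (stands in for a rollout).
        value = reward(node.path)
        # 4. Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return root


def best_path(root: Node) -> List[str]:
    # Follow the most-visited child at each level to extract one trajectory.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.path


def bc_pairs(path: List[str]) -> List[Tuple[List[str], str]]:
    # Split a trajectory into (context so far, next step) training pairs.
    return [(path[:i], path[i]) for i in range(1, len(path))]
```

With a toy `propose` that always offers the same two steps and a `reward` that favors one of them, the search concentrates its visits on the favored branch, and `best_path` returns that branch.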

Paper Structure

This paper contains 70 sections, 2 equations, 6 figures, 8 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overview of Monte-Carlo tree search for Zero-shot multi-hop Question Answering (MZQA).
  • Figure 2: Step-by-step visualization of a single MCTS iteration for the multi-hop question answering task.
  • Figure 3: Performance progression of MZQA over iterations averaged across 3 different seeds.
  • Figure 4: The F1 score with varying numbers of in-context examples (the shaded areas indicate min/max intervals).
  • Figure 5: The compute-performance relationship between the average number of tokens required to generate the final answer to the goal question (x-axis) and the average F1 score (y-axis) over 3 seeds on each benchmark. The number written on each data point indicates the number of in-context learning examples. Note that the closer the point is to the upper-left corner, the more efficient the method is.
  • ...and 1 more figure
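Figure 5's caption treats points nearer the upper-left corner (fewer tokens, higher F1) as more efficient. One way to make that reading precise, which is our assumption rather than a metric the paper defines, is Pareto dominance over (average tokens, average F1). The Python sketch below uses a hypothetical `RunSummary` record and hypothetical helpers `dominates` and `pareto_front`; any numbers passed to it are illustrative, not results from the paper.

```python
from typing import List, NamedTuple


class RunSummary(NamedTuple):
    name: str      # method or configuration label (hypothetical)
    tokens: float  # average tokens to produce the final answer (x-axis)
    f1: float      # average F1 score (y-axis)


def dominates(a: RunSummary, b: RunSummary) -> bool:
    # a dominates b if it uses no more tokens, scores no lower F1,
    # and is strictly better on at least one axis (further up-left).
    return (a.tokens <= b.tokens and a.f1 >= b.f1
            and (a.tokens < b.tokens or a.f1 > b.f1))


def pareto_front(runs: List[RunSummary]) -> List[RunSummary]:
    # Keep the runs no other run dominates, ordered by token cost.
    front = [r for r in runs if not any(dominates(o, r) for o in runs)]
    return sorted(front, key=lambda r: r.tokens)
```

For two configurations with equal token cost, only the one with the higher F1 stays on the front; identical points do not dominate each other, so both are kept.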