RoT: Enhancing Large Language Models with Reflection on Search Trees

Wenyang Hui, Kewei Tu

TL;DR

RoT addresses the lack of learning from past search experiences in tree-search-based prompting for large language models. By selecting important states and having a strong LLM summarize per-state guidelines, RoT provides task-specific reflections that improve decision making and value estimation in subsequent BFS and MCTS searches across diverse tasks. The approach yields notable gains in Blocksworld, GSM8k, and CraigslistBargain, and also benefits non-tree methods like CoT by supplying task knowledge derived from search histories. Although it requires capable LLMs and reliable value estimation, RoT offers a practical framework for improving reasoning and planning efficiency in complex, multi-step tasks, particularly when models encounter unfamiliar problems.

Abstract

Large language models (LLMs) have demonstrated impressive capability in reasoning and planning when integrated with tree-search-based prompting methods. However, since these methods ignore previous search experiences, they often repeat the same mistakes during search. To address this issue, we introduce Reflection on search Trees (RoT), an LLM reflection framework designed to improve the performance of tree-search-based prompting methods. It uses a strong LLM to summarize guidelines from previous tree search experiences to enhance the ability of a weak LLM. The guidelines are instructions for solving the task through tree search, which can prevent the weak LLM from repeating mistakes made in past search processes. In addition, we propose a novel state selection method, which identifies the critical information in historical search processes to help RoT generate more specific and meaningful guidelines. In our extensive experiments, we find that RoT significantly improves the performance of LLMs in reasoning or planning tasks with various tree-search-based prompting methods (e.g., BFS and MCTS). Non-tree-search-based prompting methods such as Chain-of-Thought (CoT) can also benefit from RoT guidelines, since RoT provides task-specific knowledge collected from search experience.
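
The state selection idea can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the sketch assumes that a state counts as important when one of its actions shifts the estimated value by more than a threshold (named `lam` here, echoing the $\lambda$ in the Figure 3 caption). The `Node` class, the `importance` rule, and the toy values are all hypothetical and are not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A state in a search tree; `value` is the search algorithm's estimate."""
    state: str
    value: float
    children: list[Node] = field(default_factory=list)


def importance(node: Node) -> float:
    """Assumed criterion: largest value change caused by any single action."""
    if not node.children:
        return 0.0
    return max(abs(child.value - node.value) for child in node.children)


def select_important(root: Node, lam: float = 0.1) -> list[Node]:
    """Collect every state whose importance exceeds the threshold `lam`."""
    selected, stack = [], [root]
    while stack:
        node = stack.pop()
        if importance(node) > lam:
            selected.append(node)
        stack.extend(node.children)
    return selected


# Toy tree with made-up values (not taken from the paper).
root = Node("s0", 0.5, [
    Node("s1", 0.9, [Node("s3", 0.92)]),
    Node("s2", 0.45),
])
important = [n.state for n in select_important(root, lam=0.1)]
```

Under this assumed rule only `s0` is selected in the toy tree, because choosing between `s1` and `s2` changes the estimated value sharply while the step from `s1` to `s3` barely moves it. In a full pipeline, the selected states would then be passed to the strong LLM for guideline summarization.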

Paper Structure (30 sections, 1 equation, 10 figures, 11 tables)

Figures (10)

  • Figure 1: An illustration of a tree-search-based prompting method in Blocksworld. $a_i$ and $s_i$ denote the action and state at depth $i$. $v$ is the value of an action as estimated by the tree search algorithm (the value estimate in BFS, and the average estimated value of the children in MCTS).
  • Figure 2: The RoT framework
  • Figure 3: Important State Selection ($\lambda=0.1$). Marked states are selected.
  • Figure 4: Guideline Summarization.
  • Figure 4: AUC of Blocksworld.
  • ...and 5 more figures