A Study on Leveraging Search and Self-Feedback for Agent Reasoning

Karthikeyan K; Michelle Yuan; Elman Mansimov; Katerina Margatina; Anurag Pratik; Daniele Bonadiman; Monica Sunkara; Yi Zhang; Yassine Benajiba

TL;DR

This study examines whether model-generated self-feedback can effectively guide search for agent reasoning, contrasting it with ground-truth (GT) feedback on GSM8K math problems and ToolTalk tool-calling. Using Monte Carlo Tree Search (MCTS) with the UCT criterion $UCT(s, a) = Q(s, a) + w\sqrt{ \frac{\log N(s)}{N(c(s, a))} }$, where $Q(s, a)$ is the value estimate for action $a$ in state $s$, $N(\cdot)$ is a visit count, $c(s, a)$ is the child node reached by $a$, and $w$ weights exploration, the authors evaluate performance with and without ground-truth signals and compare several selection strategies. They find that ground-truth feedback yields reliable performance gains, while self-feedback alone is inconsistent. Majority voting across nodes improves robustness in math reasoning when GT is unavailable, but self-feedback can degrade performance in tool-calling unless it is augmented with task-specific feedback. The results underscore the need for domain-specific feedback mechanisms or hybrid verification approaches to achieve robust, autonomous reasoning across tasks.
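To make the UCT criterion concrete, here is a minimal Python sketch of the selection step, using the standard form with the child's visit count in the denominator. It is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the function names `uct_score` and `select_action`, and the convention of giving unvisited children an infinite score are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical search-tree node: value estimate Q, visit count N, children by action."""
    q: float = 0.0
    visits: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def uct_score(parent_visits: int, child: Node, w: float = 1.0) -> float:
    """UCT(s, a) = Q(s, a) + w * sqrt(log N(s) / N(c(s, a)))."""
    if child.visits == 0:
        # Assumption: unvisited children are always tried first.
        return math.inf
    exploration = math.sqrt(math.log(parent_visits) / child.visits)
    return child.q + w * exploration


def select_action(node: Node, w: float = 1.0) -> str:
    """Return the action whose child maximizes the UCT score."""
    return max(node.children, key=lambda a: uct_score(node.visits, node.children[a], w))


# Usage: a well-explored, high-value child versus a rarely visited one.
root = Node(visits=10, children={
    "a": Node(q=0.9, visits=8),
    "b": Node(q=0.5, visits=2),
})
greedy_choice = select_action(root, w=0.0)    # pure exploitation
balanced_choice = select_action(root, w=1.0)  # exploration bonus included
```

With `w=0.0` the choice depends only on $Q$, while a larger `w` shifts selection toward children with few visits, which is the exploration/exploitation trade-off the criterion encodes.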

Abstract

Recent works have demonstrated that incorporating search during inference can significantly improve the reasoning capabilities of language agents. Some approaches use the ground truth, while others rely on the model's own generated feedback. The search algorithm then uses this feedback to produce values that update its criterion for exploring and exploiting various reasoning paths. In this study, we investigate how search and a model's self-feedback can be leveraged for reasoning tasks. First, we explore differences between ground-truth feedback and self-feedback during search for math reasoning. Second, we observe limitations in applying search techniques to more complex tasks like tool-calling and design domain-specific approaches to address these gaps. Our experiments reveal challenges related to generalization when relying solely on self-feedback during search. For search to work effectively, either access to the ground truth is needed or feedback mechanisms need to be carefully designed for the specific task.
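As a rough illustration of how feedback turns into the values that search relies on, the sketch below backs up a scalar reward along a visited path using a running mean. Everything here is an assumption made for the example: the `Node` fields, the `backup` and `ground_truth_feedback` helpers, and the exact-match reward are hypothetical and not taken from the paper. A self-feedback variant would replace `ground_truth_feedback` with a score produced by the model itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical search-tree node holding a value estimate and a visit count."""
    q: float = 0.0
    visits: int = 0


def ground_truth_feedback(answer: str, gold: str) -> float:
    """Assumed reward: 1.0 on exact match with the reference answer, else 0.0."""
    return 1.0 if answer.strip() == gold.strip() else 0.0


def backup(path: List[Node], reward: float) -> None:
    """Update each node on the path with an incremental mean of observed rewards."""
    for node in path:
        node.visits += 1
        node.q += (reward - node.q) / node.visits


# Usage: two rollouts through the same path, one correct and one incorrect.
path = [Node(), Node()]
backup(path, ground_truth_feedback("42", "42"))
backup(path, ground_truth_feedback("41", "42"))
```

After these two rollouts each node on the path has been visited twice and holds the mean reward, which is the quantity a selection rule such as UCT would then consume.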
