In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning

Youngbin Choi, Minjong Lee, Saemi Moon, Seunghyuk Cho, Chaehyeon Chung, MoonJeong Park, Dongwoo Kim

TL;DR

This work introduces in-place feedback, a state-repair paradigm for guiding LLMs in multi-turn reasoning: users directly edit the model's previous output, and the model continues generation from the edited state. It identifies three failure modes of traditional multi-turn refinement and demonstrates that in-place feedback improves task accuracy while reducing token usage by 79.1% across the GPQA, MMLU-pro, and MATH-hard benchmarks. In controlled ZebraLogic experiments, the authors show that in-place edits better preserve correct reasoning, sustain feedback incorporation over turns, and limit error propagation. The results suggest that in-place feedback is a more natural, efficient, and scalable mechanism for guiding LLMs in reasoning-intensive tasks, and that it may be widely applicable.

Abstract

Large language models (LLMs) are increasingly studied in the context of multi-turn reasoning, where models iteratively refine their outputs based on user-provided feedback. Such settings are crucial for tasks that require complex reasoning, yet existing feedback paradigms often rely on issuing new messages. LLMs struggle to integrate these reliably, leading to inconsistent improvements. In this work, we introduce in-place feedback, a novel interaction paradigm in which users directly edit an LLM's previous response, and the model conditions on this modified response to generate its revision. Empirical evaluations on diverse reasoning-intensive benchmarks reveal that in-place feedback achieves better performance than conventional multi-turn feedback while using $79.1\%$ fewer tokens. Complementary analyses on controlled environments further demonstrate that in-place feedback resolves a core limitation of multi-turn feedback: models often fail to apply feedback precisely to erroneous parts of the response, leaving errors uncorrected and sometimes introducing new mistakes into previously correct content. These findings suggest that in-place feedback offers a more natural and effective mechanism for guiding LLMs in reasoning-intensive tasks.
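The difference between the two feedback styles described above can be sketched as a difference in how the model's context is assembled. The following is a minimal illustration, not the authors' implementation: the `Message` type, the function names, the toy arithmetic problem, and the whitespace-based token count are all assumptions made for this example. It also assumes that an in-place edit discards the text after the corrected span so the model can continue from the edit point, which is what the Figure 1 caption appears to describe.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


def multi_turn_context(question: str, response: str, feedback: str) -> list[Message]:
    """Conventional feedback: append a new user message after the full response."""
    return [
        Message("user", question),
        Message("assistant", response),
        Message("user", feedback),
    ]


def in_place_context(question: str, response: str, wrong: str, corrected: str) -> list[Message]:
    """In-place feedback (sketch): keep the response up to the first erroneous
    span, substitute the correction, and drop everything after it. The model
    would then continue generating from this edited assistant prefix."""
    idx = response.find(wrong)
    if idx == -1:
        raise ValueError("span to edit not found in response")
    edited_prefix = response[:idx] + corrected
    return [Message("user", question), Message("assistant", edited_prefix)]


def rough_token_count(messages: list[Message]) -> int:
    """Crude proxy for context length: whitespace-separated words (an assumption,
    not a real tokenizer)."""
    return sum(len(m.content.split()) for m in messages)


# Hypothetical toy problem with a single arithmetic slip (12 * 3 written as 26).
question = "What is 12 * 13?"
response = "12 * 13 = 12 * 10 + 12 * 3 = 120 + 26 = 146."

mt = multi_turn_context(question, response, "12 * 3 is 36, not 26. Please fix it.")
ip = in_place_context(question, response, wrong="26", corrected="36")

print(ip[-1].content)
print(rough_token_count(mt), rough_token_count(ip))
```

In this sketch the in-place context is shorter because it carries neither the erroneous tail of the response nor a separate feedback message. The numbers it prints are specific to the toy example and say nothing about the 79.1% figure reported in the abstract.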

Paper Structure

This paper contains 54 sections, 5 equations, 23 figures.

Figures (23)

  • Figure 1: Illustration of common failure cases in multi-turn refinement and in-place feedback. After in-place feedback, the LLM continues generation from the green word "requires".
  • Figure 2: Representative examples of in-place feedback on a toy problem. Red marks incorrect reasoning, blue indicates the user corrections with in-place feedback, and green shows the subsequent reasoning based on the corrected context. Additional examples are provided in the appendix.
  • Figure 3: Comparison of in-place and multi-turn accuracies across models on MATH-hard, MMLU-pro, and GPQA. Across all datasets and LLMs, in-place feedback consistently outperforms multi-turn feedback.
  • Figure 4: Number of input and generated tokens across multiple turns. In-place feedback consistently requires fewer tokens than multi-turn feedback across all datasets and LLMs.
  • Figure 5: Grid and cell accuracy of LLMs on the ZebraLogic dataset. Across both top-2 and top-4 feedback settings, in-place feedback consistently outperforms multi-turn feedback.
  • ...and 18 more figures