Learning to Reason from Feedback at Test-Time

Yanyang Li; Michael Lyu; Liwei Wang

TL;DR

The paper addresses how to exploit test-time feedback on complex reasoning tasks. It introduces Feedback-based Test-Time Training (FTTT), which stores knowledge in the model weights and uses a binary verifier, plus optional self-reflection, to guide learning. FTTT is paired with OpTune, a lightweight gradient-space optimizer that predicts weight updates from recent attempts, enabling scalable test-time optimization with minimal parameter overhead. Empirical results on math and coding datasets show that FTTT improves test-time scalability and that, combined with OpTune, it outperforms common parameter-efficient fine-tuning (PEFT) baselines while remaining efficient. The work advances practical, memory-efficient test-time adaptation for large language models; extending it to continuous feedback settings is left to future work.

Abstract

Solving complex tasks in a single attempt is challenging for large language models (LLMs). Iterative interaction with the environment and feedback is often required to achieve success, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries without leveraging prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.
