Experience-Guided Adaptation of Inference-Time Reasoning Strategies

Adam Stein, Matthew Trager, Benjamin Bowman, Michael Kleinman, Aditya Chattopadhyay, Wei Xia, Stefano Soatto

TL;DR

EGuR is an inference-time meta-strategy that generates complete, problem-specific reasoning procedures from accumulated experience. It separates strategy generation (the Guide) from memory-driven improvement (the Consolidator), which allows per-instance adaptation of prompts, sampling parameters, tools, and control logic. Across five benchmarks (AIME 2025, 3-SAT, and three Big Bench Extra Hard tasks), EGuR improves accuracy by up to 14% over the strongest baselines and reduces computational cost by up to 111x, and both metrics improve as experience accumulates. The results suggest that the system learns useful heuristics for deciding when a complex agentic workflow is worth its cost, when a lightweight one suffices, and how to match computation to problem characteristics. The approach offers a practical path to continually improving reasoning efficiency and effectiveness without offline retraining.

Abstract

Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. While systems that update and maintain a memory at inference time have been proposed, existing designs only steer the system by modifying textual input to a language model or agent, which means that they cannot change sampling parameters, remove tools, modify system prompts, or switch between agentic and workflow paradigms. On the other hand, systems that adapt more flexibly require offline optimization and remain static once deployed. We present Experience-Guided Reasoner (EGuR), which generates tailored strategies -- complete computational procedures involving LLM calls, tools, sampling parameters, and control logic -- dynamically at inference time based on accumulated experience. We achieve this using an LLM-based meta-strategy -- a strategy that outputs strategies -- enabling adaptation of all strategy components (prompts, sampling parameters, tool configurations, and control logic). EGuR operates through two components: a Guide generates multiple candidate strategies conditioned on the current problem and structured memory of past experiences, while a Consolidator integrates execution feedback to improve future strategy generation. This produces complete, ready-to-run strategies optimized for each problem, which can be cached, retrieved, and executed as needed without wasting resources. Across five challenging benchmarks (AIME 2025, 3-SAT, and three Big Bench Extra Hard tasks), EGuR achieves up to 14% accuracy improvements over the strongest baselines while reducing computational costs by up to 111x, with both metrics improving as the system gains experience.
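The Guide and Consolidator loop described in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in rather than the authors' implementation: the names `Strategy`, `Memory`, `guide`, `consolidate`, and `solve` are invented for illustration, strategies are plain Python callables instead of LLM-generated procedures, the cost is a fixed number, and feedback is assumed to come from comparing the output against a known answer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Strategy:
    name: str
    run: Callable[[str], str]  # stand-in for LLM calls, tools, and control logic
    cost: float                # hypothetical fixed cost per execution


@dataclass
class Memory:
    # Each note: (problem kind, strategy name, answer was correct, cost)
    notes: List[Tuple[str, str, bool, float]] = field(default_factory=list)


def guide(kind: str, memory: Memory, library: List[Strategy], k: int) -> List[Strategy]:
    """Propose up to k candidates: known successes, then untried, then known failures."""
    def rank(s: Strategy) -> Tuple[int, float]:
        outcomes = [ok for (kd, name, ok, _) in memory.notes
                    if kd == kind and name == s.name]
        if any(outcomes):
            tier = 0  # succeeded before on this kind of problem
        elif not outcomes:
            tier = 1  # never tried, worth exploring
        else:
            tier = 2  # tried and only failed
        return (tier, s.cost)  # cheaper strategies win ties within a tier
    return sorted(library, key=rank)[:k]


def consolidate(memory: Memory, kind: str, results: List[Tuple[Strategy, bool]]) -> None:
    """Fold execution feedback into memory so later guide() calls can use it."""
    for s, ok in results:
        memory.notes.append((kind, s.name, ok, s.cost))


def solve(problem: str, kind: str, truth: str, memory: Memory,
          library: List[Strategy], k: int) -> List[str]:
    candidates = guide(kind, memory, library, k)
    # Assumption: correctness feedback is available by comparison with `truth`.
    results = [(s, s.run(problem) == truth) for s in candidates]
    consolidate(memory, kind, results)
    return [s.name for s in candidates]


library = [
    Strategy("echo", lambda p: p, cost=0.05),                            # cheap but wrong here
    Strategy("code", lambda p: " ".join(sorted(p.split())), cost=0.10),  # sorts the words
]
memory = Memory()
first = solve("pear apple fig", "word_sorting", "apple fig pear", memory, library, k=2)
second = solve("kiwi date", "word_sorting", "date kiwi", memory, library, k=1)
print(first, second)
```

In this toy run, the first problem explores both strategies, and the second problem runs only the one that memory records as successful. That mirrors, in a much simplified form, how exploring several candidates early can reduce cost later.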

Paper Structure

This paper contains 43 sections, 11 equations, 6 figures, 8 tables, 2 algorithms.

Figures (6)

  • Figure 1: Comparison of existing experience-based adaptation methods (A and B) with our approach (C), which dynamically produces a compiled strategy based on the current query and memory at inference time. Methods that augment existing strategies with state, such as Dynamic Cheatsheet [dynamic-cheat] and Mem0 [mem0], are depicted in (A), and methods that optimize strategies offline, such as ADAS [adas] and OPTO [cheng2024trace], are shown in (B). Our method (C) uses a state (which can contain useful compiled strategies) during inference to guide the system in producing effective strategies for each query, unlike existing methods, which cannot adapt their strategies per query.
  • Figure 2: High-level grammar for strategies as compositions of base processes (left), with an example of how CodeAct is represented in this grammar (right). A further formalization is provided in the appendix (app:strategies), which defines how to construct processes from pure functions, processes for explicitly accessing and updating the state, and some useful base processes such as $\mathbf{return}$.
  • Figure 3: Comparison of strategy performance across tasks for Claude 3.7 Sonnet. Strategies closer to the top-left corner are best for the task in terms of accuracy and cost. The optimal strategy differs significantly across tasks. For example, Code excels on 3-SAT and Word Sorting but performs poorly on Movie Recommendation and AIME. Descriptions of these strategies are provided in the appendix (app:common-strats).
  • Figure 4: Evolution of accuracy and cost on held-out evaluation sets as training progresses for Claude 3.7 Sonnet. EGuR-5 consistently improves accuracy while reducing cost with more experience. Cost is shown up to $1.0 for visualization; Dynamic Cheatsheet (DC) typically exceeds this threshold, reaching $9.95, $2.26, $2.88, $4.32, and $7.16 per sample after training on 3-SAT, AIME, Movie Rec., Word Sort., and Object Count., respectively. The complete results for Claude 3.7 Sonnet, GPT-OSS-120B, and Qwen3-Next-80B-A3B-Thinking are included in the appendix (app:full-results).
  • Figure 5: Ablation of exploration level in EGuR for Claude 3.7 Sonnet. Higher exploration levels (more strategies per problem) generally improve both accuracy and cost-efficiency by enabling comparative evaluation of strategy effectiveness.
  • ...and 1 more figure
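The Figure 2 caption describes strategies as compositions of base processes, including processes built from pure functions, processes that access or update the state, and a `return` process. The grammar itself is not reproduced here, so the following is only a rough sketch of that idea, assuming a process is a function from a state to a (value, new state) pair. The combinator names `ret`, `pure`, `put_state`, `get_state`, and `seq` are hypothetical and do not come from the paper.

```python
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]
# Assumption: a process maps a state to a (value, new state) pair.
Process = Callable[[State], Tuple[Any, State]]
Step = Callable[[Any], Process]  # a step consumes the previous value


def ret(value: Any) -> Process:
    """Base process that yields a value and leaves the state untouched."""
    return lambda state: (value, state)


def pure(fn: Callable[[Any], Any]) -> Step:
    """Turn a pure function into a step that yields fn(previous value)."""
    return lambda value: ret(fn(value))


def put_state(key: str) -> Step:
    """Step that stores the previous value under `key` in a copy of the state."""
    return lambda value: (lambda state: (value, {**state, key: value}))


def get_state(key: str) -> Process:
    """Process that reads `key` from the state."""
    return lambda state: (state[key], state)


def seq(first: Process, *steps: Step) -> Process:
    """Compose a process with steps, threading the value and the state through."""
    def run(state: State) -> Tuple[Any, State]:
        value, state = first(state)
        for step in steps:
            value, state = step(value)(state)
        return value, state
    return run


# A toy strategy: split a string into words, sort them, record the answer in the state.
strategy = seq(ret("b a c"), pure(str.split), pure(sorted), put_state("answer"))
initial: State = {}
value, final_state = strategy(initial)
recalled, _ = get_state("answer")(final_state)
print(value, final_state, recalled)
```

Representing a strategy as a composed value like this means it can be stored and re-run later, which fits the abstract's point that strategies can be cached, retrieved, and executed as needed.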