ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents

Zhenyu Zhang, Tianyi Chen, Weiran Xu, Alex Pentland, Jiaxin Pei

TL;DR

ReCAP addresses the challenge of long-horizon reasoning in LLM agents by introducing a recursive, context-aware planning framework. It maintains a single shared context across recursion depths, enabling plan-ahead decomposition, consistent backtracking, and memory-efficient execution. The approach yields substantial improvements over strong baselines on embodied, knowledge-intensive, and code-editing tasks, with notable gains in long-horizon settings and robust performance across models. This work highlights the value of structured context re-injection and bounded memory for scalable, coherent multi-level reasoning in LLM agents.
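
The idea of one shared context with bounded memory can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the context keeps one plan frame per active recursion depth plus a fixed-size window of recent observations. The class and method names (`SharedContext`, `push_plan`, `pop_plan`, `observe`, `render`) and the window size are illustrative choices, not taken from the paper. Under these assumptions the rendered prompt grows with the number of active depths rather than with the total number of steps executed.

```python
from collections import deque

# Hypothetical illustration of a single context shared across recursion
# depths; not the authors' implementation.


class SharedContext:
    def __init__(self, window=3):
        self.frames = []                     # one plan (list of subtasks) per active depth
        self.recent = deque(maxlen=window)   # oldest observations are dropped automatically

    def push_plan(self, plan):
        # Entering a deeper level: record that level's plan.
        self.frames.append(list(plan))

    def pop_plan(self):
        # Returning to the parent: discard the child's plan frame.
        return self.frames.pop()

    def observe(self, obs):
        self.recent.append(obs)

    def render(self):
        # Prompt = every active plan frame + the bounded observation window.
        lines = [f"[depth {d}] plan: " + "; ".join(p) for d, p in enumerate(self.frames)]
        lines += [f"obs: {o}" for o in self.recent]
        return "\n".join(lines)


ctx = SharedContext(window=3)
ctx.push_plan(["make burger", "serve burger"])
ctx.push_plan(["cook patty", "stack patty on bun"])
for t in range(10):
    ctx.observe(f"step {t} ok")
prompt = ctx.render()
print(len(prompt.splitlines()))  # 5: two plan lines + three observations
```

Ten observations were recorded, but only the last three reach the prompt, while each active depth contributes exactly one plan line.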

Abstract

Long-horizon tasks requiring multi-step reasoning and dynamic re-planning remain challenging for large language models (LLMs). Sequential prompting methods are prone to context drift, loss of goal information, and recurrent failure cycles, while hierarchical prompting methods often weaken cross-level continuity or incur substantial runtime overhead. We introduce ReCAP (Recursive Context-Aware Reasoning and Planning), a hierarchical framework with shared context for reasoning and planning in LLMs. ReCAP combines three key mechanisms: (i) plan-ahead decomposition, in which the model generates a full subtask list, executes the first item, and refines the remainder; (ii) structured re-injection of parent plans, maintaining consistent multi-level context during recursive return; and (iii) memory-efficient execution, bounding the active prompt so costs scale linearly with task depth. Together, these mechanisms align high-level goals with low-level actions, reduce redundant prompting, and preserve coherent context updates across recursion. Experiments demonstrate that ReCAP substantially improves subgoal alignment and success rates on various long-horizon reasoning benchmarks, achieving a 32% gain on synchronous Robotouille and a 29% gain on asynchronous Robotouille under the strict pass@1 protocol.
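
The three mechanisms can be sketched as a short recursive procedure. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM is replaced by a hypothetical `Planner` stub backed by a fixed `PLANS` table, any task without an entry is treated as a primitive action, and `refine` is a placeholder where a real agent would re-prompt the model with the re-injected parent plan. The names `PLANS`, `Planner`, and `recursive_plan` are illustrative, and context handling is simplified to an append-only list of strings.

```python
# Hypothetical sketch of recursive plan-ahead decomposition with parent-plan
# re-injection; not the authors' implementation.

# Stand-in for the LLM: maps a task to the full subtask list it would propose.
# Tasks with no entry are treated as primitive actions.
PLANS = {
    "make burger": ["cook patty", "assemble burger"],
    "cook patty": ["place patty on stove", "cook", "pick up patty"],
    "assemble burger": ["place bottom bun", "stack patty", "stack top bun"],
}


class Planner:
    def decompose(self, task, context):
        # Plan-ahead: propose the entire subtask list, not only the next step.
        return list(PLANS.get(task, []))

    def refine(self, task, remaining, context):
        # A real agent would re-prompt the LLM here with `context`, which now
        # ends with the re-injected parent plan. This stub keeps the plan as is.
        return remaining


def recursive_plan(task, planner, context, trace, depth=0, max_depth=4):
    subtasks = planner.decompose(task, context)
    if not subtasks or depth >= max_depth:
        trace.append(task)                      # execute a primitive action
        context.append(f"done: {task}")
        return
    remaining = subtasks
    while remaining:
        head, rest = remaining[0], remaining[1:]
        context.append(f"[depth {depth}] {task}: plan = {remaining}")
        # Execute only the first subtask (recursing if it is not primitive).
        recursive_plan(head, planner, context, trace, depth + 1, max_depth)
        # Re-inject the parent task and its remaining plan, then refine it.
        context.append(f"[depth {depth}] back in {task}: remaining = {rest}")
        remaining = planner.refine(task, rest, context)


trace, context = [], []
recursive_plan("make burger", Planner(), context, trace)
print(trace)
# ['place patty on stove', 'cook', 'pick up patty', 'place bottom bun', 'stack patty', 'stack top bun']
```

Because each level holds its full remaining subtask list, the `refine` hook can revise that list after every child returns; in a real agent this is where an LLM call would adjust the plan to new observations.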

Paper Structure

This paper contains 54 sections, 3 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Comparison of sequential prompting, hierarchical prompting, and ReCAP.
  • Figure 2: Overview of ReCAP's backtracking and refinement
  • Figure 3: Detailed comparison between ReAct and ReCAP's behaviors when encountering blocked stations in Robotouille. Left: ReAct repeatedly alternates between stacking and unstacking the same item, resulting in an infinite loop. Right: ReCAP detects the loop, backtracks to clear the board by moving the blocking lettuce, and then proceeds with the correct sequence of actions.
  • Figure 4: Tool call and cost distributions for ReCAP on SWE-bench Verified.
  • Figure 5: Task resolve rate of ReCAP on SWE-bench Verified, by number of tool calls.
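
Figure 3's caption states that ReCAP detects the stack/unstack loop and backtracks, but it does not specify how the loop is detected. The Python sketch below shows one hypothetical check, chosen here purely for illustration: flag a cycle when the most recent actions consist of the same short block repeated several times. The function name `find_cycle` and its parameters (`max_period`, `min_repeats`) are assumptions, not the paper's mechanism; in a recursive agent, a positive result could be used to return control to the parent level for re-planning.

```python
# Hypothetical loop check for illustration; not the paper's detection rule.


def find_cycle(actions, max_period=4, min_repeats=3):
    """Return the shortest period p such that the last p * min_repeats actions
    are one block of p actions repeated min_repeats times, else None."""
    for p in range(1, max_period + 1):
        span = p * min_repeats
        if len(actions) < span:
            break                            # longer periods need even more history
        tail = actions[-span:]
        if all(tail[i] == tail[i % p] for i in range(span)):
            return p
    return None


history = ["pick up lettuce",
           "stack lettuce", "unstack lettuce",
           "stack lettuce", "unstack lettuce",
           "stack lettuce", "unstack lettuce"]
print(find_cycle(history))                                              # 2
print(find_cycle(["pick up lettuce", "stack lettuce", "cut lettuce"]))  # None
```

Requiring several repetitions (`min_repeats=3` by default) keeps a one-off undo, such as a single stack followed by an unstack, from being flagged as a loop.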