Scaling Long-Horizon LLM Agent via Context-Folding

Weiwei Sun; Miao Lu; Zhan Ling; Kang Liu; Xuesong Yao; Yiming Yang; Jiecao Chen

TL;DR

Long-horizon LLM agents are constrained by context length. Context-Folding enables active context management: the agent branches into subtasks and folds their intermediate steps once they finish. FoldGRPO learns this behavior with token-level process rewards. On BrowseComp-Plus and SWE-Bench Verified, folding with a 32K active context and up to 10 branches matches or surpasses baselines that use much larger contexts, and yields substantial efficiency gains. This work demonstrates that learnable context management is a principled and scalable pathway toward stronger, autonomous long-horizon LLM agents.

Abstract

Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we develop FoldGRPO, an end-to-end reinforcement learning framework with specific process rewards that encourage effective task decomposition and context management. On complex long-horizon tasks (Deep Research and SWE), our folding agent matches or outperforms the ReAct baselines while using an active context 10$\times$ smaller, and it significantly outperforms models that rely on summarization-based context management.
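
The branch-and-fold behavior described above can be illustrated with a minimal sketch. It is not the paper's implementation: the class `FoldingContext`, its method names, the `[branch]`/`[folded]` markers, and the single-open-branch restriction are all assumptions made for illustration, and the summary is passed in by the caller rather than produced by a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FoldingContext:
    """Toy working context with branch() and fold() (hypothetical interface)."""

    main: List[str] = field(default_factory=list)
    # Steps of the currently open branch; None when no branch is open.
    branch_steps: Optional[List[str]] = None

    def add(self, step: str) -> None:
        # Steps go to the open branch if there is one, else to the main thread.
        target = self.main if self.branch_steps is None else self.branch_steps
        target.append(step)

    def branch(self, subtask: str) -> None:
        # Assumption: only one branch may be open at a time.
        if self.branch_steps is not None:
            raise RuntimeError("a branch is already open")
        self.main.append(f"[branch] {subtask}")
        self.branch_steps = []

    def fold(self, summary: str) -> int:
        # Collapse the branch: drop its steps, keep only the summary.
        if self.branch_steps is None:
            raise RuntimeError("no open branch to fold")
        dropped = len(self.branch_steps)
        self.branch_steps = None
        self.main.append(f"[folded] {summary}")
        return dropped

    def active(self) -> List[str]:
        # What the agent would actually see: main thread plus any open branch.
        return self.main + (self.branch_steps or [])


ctx = FoldingContext()
ctx.add("read task description")
ctx.branch("locate the failing test")
ctx.add("search the repository")
ctx.add("open candidate file")
ctx.add("run the test")
dropped = ctx.fold("failing test identified")
print(dropped, ctx.active())
```

In this toy run the three intermediate steps inside the branch are discarded at fold time, so the active context holds only the original step, the branch marker, and the summary. That is the sense in which folding keeps the working context small while the full trajectory grows.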
