Scaling Long-Horizon LLM Agent via Context-Folding
Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, Jiecao Chen
TL;DR
Long-horizon LLM agents are constrained by context length. Context-Folding enables active context management by branching into sub-tasks and folding their intermediate steps, while FoldGRPO learns this behavior with token-level process rewards. On BrowseComp-Plus and SWE-Bench Verified, folding with a 32K active context and up to 10 branches matches or surpasses baselines that use much larger contexts and yields substantial efficiency gains. This work demonstrates that learnable context management is a principled and scalable pathway toward stronger, more autonomous long-horizon LLM agents.
Abstract
Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we develop an end-to-end reinforcement learning framework, FoldGRPO, with specific process rewards that encourage effective task decomposition and context management. On complex long-horizon tasks (Deep Research and SWE), our folding agent matches or outperforms the ReAct baselines while using an active context 10$\times$ smaller, and it significantly outperforms models that rely on summarization-based context management.
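
The branch-and-fold mechanism described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `FoldingContext`, its `branch` and `fold` methods, and the message strings are hypothetical names chosen for illustration, and the sketch assumes a single, non-nested branch whose summary is supplied by the caller rather than generated by a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FoldingContext:
    """Toy working context with branch/fold (hypothetical, for illustration)."""

    messages: List[str] = field(default_factory=list)
    branch_start: Optional[int] = None  # index where the open branch began

    def append(self, message: str) -> None:
        """Record one step (thought, tool call, observation) in the context."""
        self.messages.append(message)

    def branch(self, description: str) -> None:
        """Open a sub-trajectory for a subtask and remember where it starts."""
        if self.branch_start is not None:
            raise RuntimeError("nested branches are not modeled in this sketch")
        self.branch_start = len(self.messages)
        self.messages.append(f"[branch] {description}")

    def fold(self, summary: str) -> int:
        """Collapse the open branch into a summary; return how many entries were removed."""
        if self.branch_start is None:
            raise RuntimeError("no open branch to fold")
        removed = len(self.messages) - self.branch_start
        del self.messages[self.branch_start:]  # drop the intermediate steps
        self.messages.append(f"[folded] {summary}")  # keep only the outcome
        self.branch_start = None
        return removed


# Usage with made-up task text: three entries collapse into one summary line.
ctx = FoldingContext()
ctx.append("task: fix the failing test")
ctx.branch("locate the failing test")
ctx.append("ran the test suite")
ctx.append("read the traceback")
removed = ctx.fold("one test fails in the parser module")
```

After the fold, the main context holds only the task line and the folded summary, so the active context grows with the number of subtask outcomes rather than with every intermediate step.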
