ContextBudget: Budget-Aware Context Management for Long-Horizon Search Agents

Yong Wu, YanZhao Zheng, TianZe Xu, ZhenTao Zhang, YuanQiang Yu, JiHuai Zhu, Chao Ma, BinBin Lin, BaoHua Dong, HangCheng Zhu, RuoHui Huang, Gang Yu

Abstract

LLM-based agents show strong potential for long-horizon reasoning, yet their context size is limited by deployment factors (e.g., memory, latency, and cost), yielding a constrained context budget. As interaction histories grow, this induces a trade-off between retaining past information and staying within the context limit. To address this challenge, we propose Budget-Aware Context Management (BACM), which formulates context management as a sequential decision problem with a context budget constraint. It enables agents to assess the available budget before incorporating new observations and decide when and how much of the interaction history to compress. We further develop BACM-RL, an end-to-end curriculum-based reinforcement learning approach that learns compression strategies under varying context budgets. Experiments on compositional multi-objective QA and long-horizon web browsing benchmarks show that BACM-RL consistently outperforms prior methods across model scales and task complexities, achieving over $1.6\times$ gains over strong baselines in high-complexity settings, while maintaining strong advantages as budgets shrink, where most methods exhibit a downward performance trend.

Paper Structure

This paper contains 34 sections, 6 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Overview of the proposed framework. (a) The agent first observes the budget-conditioned state $b_t=(s_t,r_t,|o_t|)$ before loading the pending observation. (b) Conditioned on $b_t$, the policy selects a refinement action $u_t$ to perform Null, Partial, or Full commit-block aggregation, yielding an updated context $\mathcal{C}'_t$. (c) The policy is trained with multi-turn GRPO under a progressively tightened budget curriculum, where only trajectories satisfying the context budget contribute reward and optimization.
  • Figure 2: Performance under different maximum context window sizes (16k–4k tokens) with varying numbers of objectives.
  • Figure 3: Cumulative F1 and average compression calls under a fixed 8k context budget.
  • Figure 4: Ablation of key components across context budgets (16k–4k tokens), measured by summed F1 across objectives. Removing budget metadata (B) degrades performance (Ours w/o B), while adding B alone to a baseline without context management does not improve performance (Search-R1 w/ B). Our full model (Ours Full), which conditions compression on B, achieves the best performance across all objectives and budgets.
  • Figure 5: Ablation of progressive budget curricula under a common 8k evaluation budget, measured by summed F1 over objectives.
  • ...and 3 more figures
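The decision loop described in the Figure 1 caption — observe the budget-conditioned state $(s_t, r_t, |o_t|)$ before loading the pending observation, then choose a Null, Partial, or Full compression action — can be sketched as below. This is a minimal illustrative toy, not the paper's implementation: the class and method names, the word-count token proxy, the placeholder summaries, and the thresholds that stand in for the learned policy are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class BudgetAwareContext:
    """Toy sketch of budget-aware context management (illustrative only).

    A learned policy would choose the compression action; here a simple
    rule on the budget-conditioned state (used, remaining, pending size)
    stands in for it.
    """
    budget: int                                   # context budget in tokens
    blocks: list = field(default_factory=list)    # committed history blocks

    def used(self) -> int:
        # Crude token proxy: whitespace word count.
        return sum(len(b.split()) for b in self.blocks)

    def step(self, observation: str) -> str:
        remaining = self.budget - self.used()     # r_t
        pending = len(observation.split())        # |o_t|
        if pending <= remaining:
            action = "Null"                       # room left: commit as-is
        elif pending <= remaining + self.used() // 2:
            action = "Partial"                    # summarize the oldest half
            half = len(self.blocks) // 2
            self.blocks = [f"<summary of {half} blocks>"] + self.blocks[half:]
        else:
            action = "Full"                       # collapse the whole history
            self.blocks = [f"<summary of {len(self.blocks)} blocks>"]
        self.blocks.append(observation)
        return action
```

Under a 20-token budget, small observations commit unchanged (Null); as the history fills the budget, the manager first summarizes the oldest half (Partial) and only collapses everything (Full) when even that would overflow, mirroring the "when and how much to compress" decision the abstract describes.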