ACON: Optimizing Context Compression for Long-horizon LLM Agents

Minki Kang, Wei-Ning Chen, Dongge Han, Huseyin A. Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, Saravan Rajmohan

TL;DR

ACON addresses the context explosion in long-horizon LLM agents by introducing a universal, gradient-free compression framework for both history and observations. It optimizes compression guidelines via failure-driven, natural-language prompts and further distills compressors into smaller models to reduce overhead. Across AppWorld, OfficeBench, and 8-objective QA, ACON achieves significant peak token reductions while largely preserving task performance, and enables smaller LMs to close the gap to larger models. The approach combines strong empirical results with practical deployment advantages, including model-agnosticity and an efficient distillation pathway.

Abstract

Large language models (LLMs) are increasingly deployed as agents in dynamic, real-world environments, where success requires both reasoning and effective tool use. A central challenge for agentic tasks is the growing context length, as agents must accumulate long histories of actions and observations. This expansion raises costs and reduces efficiency in long-horizon tasks, yet prior work on context compression has mostly focused on single-step tasks or narrow applications. We introduce Agent Context Optimization (ACON), a unified framework that optimally compresses both environment observations and interaction histories into concise yet informative condensations. ACON leverages compression guideline optimization in natural language space: given paired trajectories where full context succeeds but compressed context fails, capable LLMs analyze the causes of failure, and the compression guideline is updated accordingly. Furthermore, we propose distilling the optimized LLM compressor into smaller models to reduce the overhead of the additional module. Experiments on AppWorld, OfficeBench, and Multi-objective QA show that ACON reduces memory usage by 26-54% (peak tokens) while largely preserving task performance, preserves over 95% of accuracy when distilled into smaller compressors, and enhances smaller LMs as long-horizon agents with up to 46% performance improvement. Our code is available at https://github.com/microsoft/acon.
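The guideline-optimization loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Trajectory`, `optimize_guideline`, and the three callables (`run_agent`, `analyze_failure`, `update_guideline`) are hypothetical names introduced here, and the toy stand-ins replace what would be real agent rollouts and LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    """One agent rollout on a task (hypothetical container)."""
    task: str
    steps: List[str]
    success: bool


def optimize_guideline(
    tasks: List[str],
    guideline: str,
    run_agent: Callable[[str, Optional[str]], Trajectory],
    analyze_failure: Callable[[Trajectory, Trajectory], str],
    update_guideline: Callable[[str, List[str]], str],
) -> str:
    """One gradient-free update of a natural-language compression guideline.

    run_agent(task, None) runs with full context; run_agent(task, guideline)
    runs with context compressed under the guideline. All three callables are
    assumptions standing in for agent rollouts and LLM calls.
    """
    feedback: List[str] = []
    for task in tasks:
        full = run_agent(task, None)
        if not full.success:
            continue  # only tasks solvable with full context are informative
        compressed = run_agent(task, guideline)
        if compressed.success:
            continue  # compression did not hurt this task
        # Contrast the successful and failed rollouts to explain the failure.
        feedback.append(analyze_failure(full, compressed))
    if not feedback:
        return guideline  # nothing to fix
    return update_guideline(guideline, feedback)


# Toy stand-ins so the sketch runs end to end (not real agents or LLMs).
def toy_run_agent(task: str, guideline: Optional[str]) -> Trajectory:
    # Pretend compression breaks the task unless the guideline keeps IDs.
    ok = guideline is None or "keep IDs" in guideline
    return Trajectory(task, ["step"], ok)


def toy_analyze(full: Trajectory, compressed: Trajectory) -> str:
    return f"Task '{compressed.task}' failed: the summary dropped IDs."


def toy_update(guideline: str, feedback: List[str]) -> str:
    return guideline + " Always keep IDs."


if __name__ == "__main__":
    start = "Summarize briefly."
    print(optimize_guideline(["t1", "t2"], start,
                             toy_run_agent, toy_analyze, toy_update))
```

Passing the rollout and LLM steps in as callables keeps the loop model-agnostic, which mirrors the gradient-free framing above: only the guideline text changes between iterations.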

Paper Structure

This paper contains 58 sections, 11 equations, 8 figures, 11 tables, 1 algorithm.

Figures (8)

  • Figure 1: Accuracy–Peak tokens trade-off on AppWorld. We compare average accuracy versus peak input tokens in history compression. Acon (ours) reduces cost while preserving accuracy for the large model (gpt-4.1) relative to a naive prompting baseline, and even improves accuracy on smaller models (gpt-4.1-mini and Qwen-14B). More results are in the experiments section.
  • Figure 2: Motivation: Unbounded context in LLM agents. As LLM agents interact with environments, actions and observations continuously accumulate, leading to ever-growing contexts that incur high memory usage, as shown by the red line in the right plot. This motivates Agent Context Optimization (Acon), which optimally compresses histories and observations into concise summaries, reducing peak tokens and memory, as shown by the blue line in the right plot.
  • Figure 3: Compression Guideline Optimization. Feedback is generated by contrasting successful trajectories (no compression) with failed ones (with compression). The collected feedback is then used by an LLM to refine the compression guidelines.
  • Figure 4: Results of distilled compressors on history compression with gpt-4.1 as the agent. Student models (Qwen3-14B, Qwen3-8B, Phi-4) are distilled from the gpt-4.1 compressor using the optimized compression guideline after the UT step, and evaluated across all benchmarks. We also include results with gpt-4.1-mini without distillation for comparison.
  • Figure 5: Performance-efficiency trade-off of the Qwen3-14B agent distilled from gpt-4.1 trajectories. For distilled compressors, we use the same distillation setting as in Figure 4. Compared to the baseline without compression, our framework Acon provides compressed trajectories combined with a distilled compressor, enabling the distilled agent to achieve consistently higher accuracy while requiring substantially fewer peak input tokens across all benchmarks.
  • ...and 3 more figures