Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Yansong Ning, Jun Fang, Naiqiang Tan, Hao Liu

TL;DR

The paper addresses inefficiencies in multi-turn LLM agents caused by redundant internal reasoning and environmental observations. It introduces Agent-Omit, a two-stage framework that combines omission-data-driven cold-start fine-tuning with omit-aware agentic RL, underpinned by a KL-divergence-based bound on policy deviation. The work provides theoretical guarantees via semantic Lipschitz continuity and a bounded omission error theorem, and demonstrates that Agent-Omit-8B achieves performance on par with frontier models while substantially reducing token usage across five diverse benchmarks. In practice, the approach lets smaller models deliver high accuracy at significantly lower token cost by adaptively pruning unnecessary context during interaction. The study lays the groundwork for turn-aware context management in autonomous LLM agents and points to scaling the omission paradigm to larger models and broader tasks.

Abstract

Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy to improve agent efficiency. However, existing studies treat entire interaction trajectories equally, overlooking that thought necessity and observation utility vary across turns. To this end, we first conduct quantitative investigations into how thought and observation affect agent effectiveness and efficiency. Based on our findings, we propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data, including both single-turn and multi-turn omission scenarios, to fine-tune the agent for omission behaviors. Furthermore, we introduce an omit-aware agentic reinforcement learning approach, incorporating a dual sampling mechanism and a tailored omission reward to incentivize the agent's adaptive omission capability. Theoretically, we prove that the deviation of our omission policy is upper-bounded by KL-divergence. Experimental results on five agent benchmarks show that our constructed Agent-Omit-8B obtains performance comparable to seven frontier LLM agents and achieves the best effectiveness-efficiency trade-off compared with seven efficient LLM agent methods. Our code and data are available at https://github.com/usail-hkust/Agent-Omit.
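
The omission idea in the abstract can be pictured with a toy sketch. The code below is not the authors' implementation: the `Turn` class, the `omit_thought` / `omit_observation` flags, and the whitespace-based `token_count` are all hypothetical names introduced for illustration. In the paper the agent learns when to omit; here the flags are simply set by hand to show how dropping a redundant thought or a stale observation shrinks the context.

```python
from dataclasses import dataclass, replace


@dataclass
class Turn:
    """One agent-environment turn (hypothetical structure for illustration)."""
    thought: str
    action: str
    observation: str
    omit_thought: bool = False       # assumption: hand-set, not learned
    omit_observation: bool = False   # assumption: hand-set, not learned


def build_context(turns):
    """Join turns into a prompt string, skipping parts flagged for omission."""
    parts = []
    for t in turns:
        if t.thought and not t.omit_thought:
            parts.append(f"Thought: {t.thought}")
        parts.append(f"Action: {t.action}")  # actions are always kept
        if t.observation and not t.omit_observation:
            parts.append(f"Observation: {t.observation}")
    return "\n".join(parts)


def token_count(text):
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


full_turns = [
    Turn("Plan: look up both peaks, then compare them.",
         "search[Trivor]", "<tool response for turn 1>"),
    Turn("The plan already fixes the next call.",
         "search[Muztagh Ata]", "<tool response for turn 2>"),
]

# Omit the early observation and the redundant follow-up thought.
pruned_turns = [
    replace(full_turns[0], omit_observation=True),
    replace(full_turns[1], omit_thought=True),
]

full_context = build_context(full_turns)
pruned_context = build_context(pruned_turns)
print(token_count(full_context), token_count(pruned_context))
```

The pruned context keeps every action, so the trajectory stays readable, while the token count drops. A real system would use the model's tokenizer and a learned omission policy in place of the hand-set flags.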

Paper Structure (32 sections, 2 theorems, 14 equations, 8 figures, 4 tables)

This paper contains 32 sections, 2 theorems, 14 equations, 8 figures, and 4 tables.

Key Result

Lemma 5.1

Assume that the agent's task accuracy $R(y)$ and token cost $C(y)$ are Lipschitz continuous with respect to the semantic distance in the trajectory embedding space:
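
For orientation, a Lipschitz condition of this kind is typically written as follows. This is the generic textbook form with assumed notation ($L_R$ and $L_C$ for the Lipschitz constants, $\phi$ for the trajectory embedding, $y$ and $y'$ for two trajectories); the paper's exact statement may differ.

$$
|R(y) - R(y')| \le L_R \,\lVert \phi(y) - \phi(y') \rVert,
\qquad
|C(y) - C(y')| \le L_C \,\lVert \phi(y) - \phi(y') \rVert .
$$

Read this way, two trajectories that are close in the embedding space cannot differ much in either accuracy or token cost.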

Figures (8)

  • Figure 1: Illustrative examples of how thought necessity and observation utility vary across turns. (a) Initial planning (e.g., searching for Trivor and Muztagh Ata) already determines the subsequent tool-call action, making follow-up thought redundant; (b) observations from early turns are not useful in the last turn, because only the tool response from turn 4 is used to summarize the answer.
  • Figure 2: Quantitative analysis of how thought and observation affect agent efficiency and effectiveness across interaction turns in the WebShop environment using Qwen3-8B.
  • Figure 3: The effect of thought and observation omission on agent efficiency and effectiveness across turns in the WebShop environment using Qwen3-8B. The grey shaded region marks turns where omission decreases token length without sacrificing accuracy.
  • Figure 4: Overview of our proposed framework Agent-Omit.
  • Figure 5: Pass@1 accuracy of Agent-Omit variants in the WebShop environment using the Qwen3-8B backbone.
  • ...and 3 more figures

Theorems & Definitions (8)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Lemma 5.1: Semantic Lipschitz Continuity
  • Theorem 5.2: Bounded Omission Error
  • Definition 1.1: Lipschitz Continuity (Hager, 1979)
  • proof
  • proof