Dynamic System Instructions and Tool Exposure for Efficient Agentic LLMs

Uria Franko

TL;DR

This paper proposes Instruction-Tool Retrieval (ITR), a RAG variant that retrieves, per step, only the minimal system-prompt fragments and the smallest necessary subset of tools.

Abstract

Large Language Model (LLM) agents often run for many steps while re-ingesting long system instructions and large tool catalogs each turn. This increases cost, agent derailment probability, latency, and tool-selection errors. We propose Instruction-Tool Retrieval (ITR), a RAG variant that retrieves, per step, only the minimal system-prompt fragments and the smallest necessary subset of tools. ITR composes a dynamic runtime system prompt and exposes a narrowed toolset with confidence-gated fallbacks. Using a controlled benchmark with internally consistent numbers, ITR reduces per-step context tokens by 95%, improves correct tool routing by 32% relative, and cuts end-to-end episode cost by 70% versus a monolithic baseline. These savings enable agents to run 2-20x more loops within context limits. Savings compound with the number of agent steps, making ITR particularly valuable for long-running autonomous agents. We detail the method, evaluation protocol, ablations, and operational guidance for practical deployment.
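To make the per-step idea concrete, here is a minimal sketch of one retrieval step. It is not the paper's implementation: the word-overlap scorer, the `Item` type, the sample catalog, and the parameters `k`, `budget`, and `min_conf` are all hypothetical stand-ins chosen for illustration. It shows only the shape of the technique: score fragments and tools against the current query, keep the top matches that fit a token budget, and fall back to the full tool catalog when the best tool match is weak.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A retrievable unit: an instruction fragment or a tool description."""
    name: str
    text: str
    tokens: int  # hypothetical precomputed token count


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score (an assumption): fraction of query words present in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def select(query, items, k, budget):
    """Return (chosen items, best score): top-k positive matches within a token budget."""
    scored = sorted(((overlap_score(query, it.text), it) for it in items),
                    key=lambda pair: pair[0], reverse=True)
    best = scored[0][0] if scored else 0.0
    chosen, used = [], 0
    for score, it in scored[:k]:
        if score > 0 and used + it.tokens <= budget:
            chosen.append(it)
            used += it.tokens
    return chosen, best


def itr_step(query, fragments, tools, k=2, budget=200, min_conf=0.3):
    """Compose a step-specific system prompt and a narrowed list of tool names."""
    frags, _ = select(query, fragments, k, budget)
    picked, conf = select(query, tools, k, budget)
    # Confidence-gated fallback: a weak best match exposes the whole catalog.
    exposed = picked if conf >= min_conf else list(tools)
    prompt = "\n".join(f.text for f in frags)
    return prompt, [t.name for t in exposed]


# Hypothetical sample catalog, for illustration only.
FRAGMENTS = [
    Item("refunds", "refund policy: verify the order before issuing a refund", 40),
    Item("tone", "always answer in a friendly tone", 20),
    Item("shipping", "shipping questions: check the carrier status", 30),
]
TOOLS = [
    Item("issue_refund", "issue a refund for an order", 50),
    Item("track_package", "track shipping status of a package", 50),
    Item("send_email", "send an email to the customer", 50),
]
```

Calling `itr_step("refund my order", FRAGMENTS, TOOLS)` narrows the exposed tools to the refund tool, while a query that matches nothing triggers the fallback and exposes the full catalog. A real system would presumably replace the word-overlap scorer with embedding or hybrid retrieval; that choice is outside what this section specifies.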

Paper Structure

This paper contains 38 sections, 8 equations, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: ITR system architecture showing dual retrieval, budget-aware selection, and confidence-gated fallback mechanisms.
  • Figure 2: Comprehensive analysis of token accumulation over 10 agent loops. Top: Cumulative token usage showing constant 28,500 token savings per step. Bottom left: Token distribution breakdown at loops 1, 5, and 10. Bottom center: Available context window over loops. Bottom right: Percentage token savings per loop, decreasing from 95% to 57% as history accumulates.
  • Figure 3: Context token usage scaling with episode length. ITR maintains constant per-step overhead while monolithic approaches scale linearly, resulting in compound savings for longer agent runs.
  • Figure 4: Extended 20-loop evaluation showing constant 28,500 token savings per step. While percentage savings decrease from 95% (loop 1) to 39% (loop 20) as history accumulates, the absolute token reduction remains constant, enabling longer agent runs within context limits.
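The captions for Figures 2 and 4 report a constant absolute saving alongside a falling percentage saving. The short sketch below shows why both can hold at once. The 30,000-token monolithic overhead is inferred from the captions (28,500 / 0.95), and the 1,500-token ITR overhead is the difference. The history growth of 2,250 tokens per loop is an assumption chosen because it reproduces the captioned percentages; it is not a number stated in this section.

```python
def percent_savings(loop, baseline_static=30_000, itr_static=1_500,
                    history_per_loop=2_250):
    """Percentage of per-step context tokens saved at a given loop (1-indexed).

    baseline_static and itr_static are inferred from the figure captions;
    history_per_loop is an assumed, illustrative growth rate.
    """
    # Conversation history is paid by both approaches and grows each loop.
    history = (loop - 1) * history_per_loop
    baseline = baseline_static + history
    itr = itr_static + history
    # The absolute saving (baseline - itr) stays constant at 28,500 tokens,
    # but the denominator grows, so the percentage falls.
    return 100.0 * (baseline - itr) / baseline


for loop in (1, 10, 20):
    print(loop, round(percent_savings(loop)))
```

Under these assumptions the rounded values at loops 1, 10, and 20 are 95, 57, and 39, matching the captions: the saving never shrinks in tokens, only as a share of a context that history keeps enlarging.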