Dynamic System Instructions and Tool Exposure for Efficient Agentic LLMs

Uria Franko

TL;DR

This paper proposes Instruction-Tool Retrieval (ITR), a RAG variant that retrieves, per step, only the minimal system-prompt fragments and the smallest necessary subset of tools.

Abstract

Large Language Model (LLM) agents often run for many steps while re-ingesting long system instructions and large tool catalogs on every turn. This increases cost, latency, the probability of agent derailment, and tool-selection errors. We propose Instruction-Tool Retrieval (ITR), a RAG variant that retrieves, per step, only the minimal system-prompt fragments and the smallest necessary subset of tools. ITR composes a dynamic runtime system prompt and exposes a narrowed toolset with confidence-gated fallbacks. On a controlled benchmark with internally consistent numbers, ITR reduces per-step context tokens by 95%, improves correct tool routing by a relative 32%, and cuts end-to-end episode cost by 70% versus a monolithic baseline. These savings let agents run 2-20x more loops within context limits, and they compound with the number of agent steps, making ITR particularly valuable for long-running autonomous agents. We detail the method, evaluation protocol, ablations, and operational guidance for practical deployment.
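The per-step flow described in the abstract (retrieve instruction fragments, narrow the toolset, fall back when confidence is low) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Item` type, the word-overlap `score` function, the greedy budgeted `select`, and the `budget` and `min_score` parameters are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A retrievable unit: a system-prompt fragment or a tool description (hypothetical type)."""
    name: str
    text: str
    tokens: int  # assumed to be a precomputed token cost


def score(query: str, item: Item) -> float:
    """Toy relevance score: fraction of query words found in the item text.
    A real system would likely use embeddings or BM25; this is a stand-in."""
    q = set(query.lower().split())
    d = set(item.text.lower().split())
    return len(q & d) / len(q) if q else 0.0


def select(query: str, items: list[Item], budget: int, min_score: float) -> list[Item]:
    """Greedily keep the highest-scoring items that clear min_score and fit the token budget."""
    ranked = sorted(items, key=lambda it: score(query, it), reverse=True)
    chosen, used = [], 0
    for it in ranked:
        if score(query, it) < min_score:
            break  # everything after this point scores lower still
        if used + it.tokens <= budget:
            chosen.append(it)
            used += it.tokens
    return chosen


def build_step(query, fragments, tools, budget=200, min_score=0.2):
    """Compose the runtime system prompt and the exposed tool names for one agent step."""
    frags = select(query, fragments, budget, min_score)
    narrowed = select(query, tools, budget, min_score)
    # Confidence gate (assumed policy): if no tool clears the threshold,
    # fall back to exposing the full catalog rather than none.
    exposed = narrowed if narrowed else list(tools)
    prompt = "\n".join(f.text for f in frags)
    return prompt, [t.name for t in exposed]
```

With a query such as "refund my order", only the refund-related fragment and tool would be surfaced; an unrelated query would trigger the fallback and expose every tool.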

Paper Structure

This paper contains 38 sections, 8 equations, 4 figures, 6 tables, and 1 algorithm.

Introduction
Related Work
Retrieval-Augmented Generation
Tool Learning and Function Calling
Prompt Engineering and Compression
Autonomous Agents and Cost Optimization
Position of Our Work
Method
Problem Setup and Notation
Corpora and Indexing
Retrieval and Scoring
Budget-Aware Selection
System Architecture
Assembly and Safety Overlay
Fallbacks and Confidence Gating
...and 23 more sections

Figures (4)

Figure 1: ITR system architecture showing dual retrieval, budget-aware selection, and confidence-gated fallback mechanisms.
Figure 2: Comprehensive analysis of token accumulation over 10 agent loops. Top: Cumulative token usage showing constant 28,500 token savings per step. Bottom left: Token distribution breakdown at loops 1, 5, and 10. Bottom center: Available context window over loops. Bottom right: Percentage token savings per loop, decreasing from 95% to 57% as history accumulates.
Figure 3: Context token usage scaling with episode length. ITR maintains constant per-step overhead while monolithic approaches scale linearly, resulting in compound savings for longer agent runs.
Figure 4: Extended 20-loop evaluation showing constant 28,500 token savings per step. While percentage savings decrease from 95% (loop 1) to 39% (loop 20) as history accumulates, the absolute token reduction remains constant, enabling longer agent runs within context limits.
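The figure captions report a constant absolute saving alongside a falling percentage saving; the short calculation below shows why both can hold at once. The 30,000-token baseline overhead follows from the captions (28,500 / 0.95) if loop 1 is assumed to carry no history. The per-loop history growth of 2,250 tokens is a hypothetical value chosen only because it roughly reproduces the captioned percentages; it is not a number reported in the text.

```python
BASELINE_OVERHEAD = 30_000  # derived: 28,500 saved is 95% of the baseline at loop 1
ITR_OVERHEAD = BASELINE_OVERHEAD - 28_500  # 1,500 tokens of retrieved instructions and tools
HISTORY_PER_LOOP = 2_250    # hypothetical: conversation history added per loop


def savings_pct(loop: int, history_per_loop: int = HISTORY_PER_LOOP) -> float:
    """Percentage of per-step context tokens saved at a given loop (1-indexed)."""
    history = history_per_loop * (loop - 1)  # shared by both approaches
    baseline = BASELINE_OVERHEAD + history
    itr = ITR_OVERHEAD + history
    return 100 * (baseline - itr) / baseline


for loop in (1, 10, 20):
    print(loop, round(savings_pct(loop)))
```

Because the history term is added to both approaches, the numerator stays at 28,500 tokens while the denominator grows, so the percentage shrinks even though the absolute saving does not.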
