
SkillReducer: Optimizing LLM Agent Skills for Token Efficiency

Yudong Gao, Zongjie Li, Yuanyuan Yuan, Zimo Ji, Pingchuan Ma, Shuai Wang

Abstract

LLM-based coding agents rely on \emph{skills}, pre-packaged instruction sets that extend agent capabilities, yet every token of skill content injected into the context window incurs both monetary cost and attention dilution. To understand the severity of this problem, we conduct a large-scale empirical study of 55,315 publicly available skills and find systemic inefficiencies: 26.4\% lack routing descriptions entirely, over 60\% of body content is non-actionable, and reference files can inject tens of thousands of tokens per invocation. Motivated by these findings, we present \textsc{SkillReducer}, a two-stage optimization framework. Stage~1 optimizes the routing layer by compressing verbose descriptions and generating missing ones via adversarial delta debugging. Stage~2 restructures skill bodies through taxonomy-driven classification and progressive disclosure, separating actionable core rules from supplementary content loaded on demand, validated by faithfulness checks and a self-correcting feedback loop. Evaluated on 600 skills and the SkillsBench benchmark, \textsc{SkillReducer} achieves 48\% description compression and 39\% body compression while improving functional quality by 2.8\%, revealing a \emph{less-is-more} effect where removing non-essential content reduces distraction in the context window. These benefits transfer across five models from four families with a mean retention of 0.965, and generalize to an independent agent framework.
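
To make Stage 1 concrete, below is a minimal sketch of description compression via a greedy variant of delta debugging. The `routes_correctly` oracle is a hypothetical placeholder (per the abstract, the paper uses a simulated oracle followed by real-environment validation); all names here are illustrative, not the authors' implementation.

```python
# Sketch of Stage 1: shrink a routing description while a routing oracle
# still passes. `routes_correctly` is a HYPOTHETICAL oracle: it should return
# True iff an agent, given only `description`, still routes a probe task to
# the right skill.
from typing import Callable, List

def ddmin_description(sentences: List[str],
                      routes_correctly: Callable[[str], bool]) -> str:
    """Greedily drop sentence chunks while the routing oracle still passes."""
    chunk = max(1, len(sentences) // 2)
    while chunk >= 1:
        i = 0
        while i < len(sentences):
            candidate = sentences[:i] + sentences[i + chunk:]
            if candidate and routes_correctly(" ".join(candidate)):
                sentences = candidate      # removal preserved routing: keep it
            else:
                i += chunk                 # removal broke routing: move on
        chunk //= 2                        # refine granularity, ddmin-style
    return " ".join(sentences)
```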


Paper Structure

This paper contains 28 sections, 2 theorems, 6 equations, 5 figures, 10 tables, 2 algorithms.

Key Result

Proposition 1

Let $\mathcal{I} = \{x_1, \ldots, x_n\}$ be the set of all items in a skill body, and let $I_{\textup{core}}^{(0)} \subseteq \mathcal{I}$ be the initial core set produced by the classifier. At each feedback iteration $i$, the promotion operator $\mathcal{P}$ adds items from $\mathcal{I} \setminus I_{\textup{core}}^{(i)}$ to the core set, so that $I_{\textup{core}}^{(i)} \subseteq I_{\textup{core}}^{(i+1)} = \mathcal{P}(I_{\textup{core}}^{(i)}) \subseteq \mathcal{I}$. The sequence $\{I_{\textup{core}}^{(i)}\}$ is therefore monotonically nondecreasing and bounded above by the finite set $\mathcal{I}$, so the Gate 2 feedback loop converges to a fixed point in at most $|\mathcal{I} \setminus I_{\textup{core}}^{(0)}|$ iterations.
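
A minimal sketch of the iteration behind this proposition, assuming a hypothetical `failed_items` gate that returns the supplementary items implicated in a failing Gate 2 run (an empty set meaning the gate passed); the names are illustrative, not the paper's code.

```python
# Sketch of the Gate 2 feedback loop from Proposition 1. `failed_items` is a
# HYPOTHETICAL stand-in for the quality gate: it returns the body items
# implicated in a failing task run (empty set = pass).
from typing import Callable, FrozenSet, Set

def promote_until_fixed_point(all_items: FrozenSet[str],
                              core: Set[str],
                              failed_items: Callable[[Set[str]], Set[str]]) -> Set[str]:
    """Promote implicated items into the core set until Gate 2 passes.

    The core set only grows and is bounded above by `all_items`, so the loop
    terminates in at most |all_items - core| iterations (Proposition 1).
    """
    while True:
        implicated = failed_items(core) & (all_items - core)
        if not implicated:          # gate passed, or nothing left to promote
            return core
        core |= implicated          # monotone promotion: core never shrinks
```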

Figures (5)

  • Figure 1: Token-length distributions of descriptions and skill bodies across three sources.
  • Figure 2: UMAP projection of skill body items with GMM clustering ($k=5$). Five content clusters emerge, suggesting that body content naturally separates into the taxonomy categories (silhouette = 0.393).
  • Figure 3: Overview of SkillReducer. Stage 1 (top) compresses routing descriptions via simulated-oracle-driven delta debugging followed by real-environment validation. Stage 2 (bottom) classifies body content, applies type-specific compression, deduplicates references, and validates through faithfulness and task-based quality gates with a feedback loop.
  • Figure 4: Token reduction achieved by SkillReducer.
  • Figure 5: GMM model selection. $k=5$ is the first local silhouette peak (silhouette = 0.393), matching the five-category taxonomy (a model-selection sketch follows this list).
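
The model selection in Figures 2 and 5 can be outlined as follows. A minimal sketch, assuming item embeddings are already available as a matrix `X` (the paper projects with UMAP before clustering); `select_k` and the search range are illustrative choices, not the authors' code.

```python
# Sketch of the GMM model selection behind Figures 2 and 5: fit Gaussian
# mixtures over a range of k and pick the first local silhouette peak.
# `X` is assumed to be an (n_items, d) embedding matrix.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

def select_k(X: np.ndarray, k_range=range(2, 11), seed: int = 0) -> int:
    scores = {}
    for k in k_range:
        labels = GaussianMixture(n_components=k, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    ks = sorted(scores)
    for i, k in enumerate(ks):          # return the first local peak
        left = scores[ks[i - 1]] if i > 0 else -np.inf
        right = scores[ks[i + 1]] if i + 1 < len(ks) else -np.inf
        if scores[k] >= left and scores[k] >= right:
            return k
    return max(scores, key=scores.get)  # fallback: global best
```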

Theorems & Definitions (4)

  • Proposition 1: Monotone Convergence of Gate 2 Feedback
  • Proof of Proposition 1
  • Proposition 2: Expected Invocation Cost (an illustrative cost sketch follows this list)
  • Proof of Proposition 2
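
Proposition 2's statement is not reproduced on this page, so the following is only a hedged illustration of what an expected invocation cost looks like under progressive disclosure: the description is always in context, the core body is charged only on invocation, and reference files only when actually loaded. The function, its parameters, and the example numbers are all hypothetical, not the paper's formula.

```python
# HYPOTHETICAL illustration in the spirit of Proposition 2 (the paper's exact
# formula is not shown on this page). Expected per-task token cost under
# progressive disclosure: description always in context; core body loaded
# with probability p_invoke; references loaded with conditional probability
# p_ref_given_invoke once the skill is invoked.
def expected_cost(desc_tokens: int, core_tokens: int, ref_tokens: int,
                  p_invoke: float, p_ref_given_invoke: float) -> float:
    return desc_tokens + p_invoke * (core_tokens
                                     + p_ref_given_invoke * ref_tokens)

# Example: a 60-token description, 400-token core, 5,000-token reference file,
# invoked on 20% of tasks, with references needed on 10% of invocations:
# 60 + 0.2 * (400 + 0.1 * 5000) ≈ 240 expected tokens per task.
cost = expected_cost(60, 400, 5000, 0.2, 0.1)
```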