SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang; Delong Li; Haiyu Deng; Baihe Ma; Xu Wang; Qin Wang; Guangsheng Yu

TL;DR

This paper analyzes the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution, grounded by a case study of the ClawHavoc campaign.

Abstract

Agentic systems increasingly rely on reusable procedural capabilities, a.k.a. agentic skills, to execute long-horizon workflows reliably. These capabilities are callable modules that package procedural knowledge with explicit applicability conditions, execution policies, termination criteria, and reusable interfaces. Unlike one-off plans or atomic tool calls, skills are reused across tasks and often perform well across them. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update) and introduces two complementary taxonomies. The first is a system-level set of seven design patterns capturing how skills are packaged and executed in practice, from metadata-driven progressive disclosure and executable code skills to self-evolving libraries and marketplace distribution. The second is an orthogonal representation × scope taxonomy describing what skills are (natural language, code, policy, hybrid) and what environments they operate over (web, OS, software engineering, robotics). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution, grounded by a case study of the ClawHavoc campaign in which nearly 1,200 malicious skills infiltrated a major agent marketplace, exfiltrating API keys, cryptocurrency wallets, and browser credentials at scale. We further survey deterministic evaluation approaches, anchored by recent benchmark evidence that curated skills can substantially improve agent success rates while self-generated skills may degrade them. We conclude with open challenges toward robust, verifiable, and certifiable skills for real-world autonomous agents.
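The description of a skill as a callable module with an applicability condition, an execution policy, a termination criterion, and an interface can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the `Skill` class, its field names, the `max_steps` bound, and the toy `fake_env` transition are all hypothetical. The fields map onto the components named in the Figure 1 caption (gate C, policy π, termination T, interface R), and the goal G is assumed to be folded into the observation dictionary, which the caption describes as typical.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical sketch of the skill anatomy; all names are illustrative only.
Observation = dict[str, Any]  # goal G assumed to be folded into the observation
Action = str


@dataclass
class Skill:
    name: str
    applicable: Callable[[Observation], bool]  # applicability gate C
    policy: Callable[[Observation], Action]    # execution policy pi: O -> A
    terminate: Callable[[Observation], bool]   # termination condition T
    max_steps: int = 20                        # assumed safety bound, not from the paper

    def __call__(
        self,
        obs: Observation,
        step: Callable[[Observation, Action], Observation],
    ) -> list[Action]:
        """Interface R: one callable entry point wrapping C, pi, and T."""
        if not self.applicable(obs):
            raise ValueError(f"skill {self.name!r} is not applicable")
        actions: list[Action] = []
        for _ in range(self.max_steps):
            if self.terminate(obs):
                break
            action = self.policy(obs)
            actions.append(action)
            obs = step(obs, action)  # environment transition supplied by the caller
        return actions


def fake_env(obs: Observation, action: Action) -> Observation:
    """Toy environment (hypothetical): each action advances one page."""
    return {**obs, "page": obs["page"] + 1}


scroll = Skill(
    name="scroll_to_end",
    applicable=lambda o: "page" in o,
    policy=lambda o: "scroll_down",
    terminate=lambda o: o["page"] >= o["last_page"],
)
print(scroll({"page": 0, "last_page": 3}, fake_env))  # ['scroll_down', 'scroll_down', 'scroll_down']
```

Keeping the environment transition outside the skill is a design choice of this sketch: it lets the same module run against a simulator or a live environment, and the applicability check fails fast before any action is taken.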

Paper Structure (58 sections, 1 equation, 5 figures, 7 tables)

Introduction
What Is an Agentic Skill?
Formal Definition
Skills versus Related Abstractions
Skills as Procedural Memory
Methodology
Literature Search and Selection
Corpus and Analysis
Taxonomy Development
Skill Lifecycle Model
Discovery
Practice, Refinement, and Distillation
Storage and Retrieval
Execution and Evaluation
Design Patterns and Taxonomy
...and 43 more sections

Figures (5)

Figure 1: Internal anatomy of an agentic skill. Observations $O$ enter the applicability gate $C$; the policy $\pi$ produces actions $A$; the termination condition $T$ determines whether to continue or halt. The interface $R$ wraps the entire module as a callable API boundary. Goal $G$ is typically encoded in observations $O$ or passed as a separate task parameter; for visual simplicity, we show $O$ as the single input.
Figure 2: The agentic skill lifecycle. Solid arrows indicate the primary forward path; dashed arrows indicate feedback loops for refinement and retirement. Each stage corresponds to a body of research surveyed in this paper.
Figure 3: Seven design patterns for agentic skills arranged along an autonomy spectrum, from human-controlled metadata disclosure (P1) to fully autonomous meta-skills (P6). Marketplace distribution (P7) spans the full spectrum as a cross-cutting distribution mechanism. Dashed lines indicate commonly combined patterns.
Figure 4: Skill composition and orchestration. Tasks are matched to skills via embedding-based retrieval or LLM-mediated routing. Selected skills decompose hierarchically into sub-skills. Dashed arrows indicate failure recovery paths that trigger re-retrieval or alternative skill selection.
Figure 5: Trust-tiered threat model for skill governance. Four nested privilege tiers (T1–T4) form concentric security boundaries. Red arrows show attack vectors targeting different tier boundaries; green labels indicate defense mechanisms between tiers.
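The retrieval step in Figure 4 (matching a task to skills by embedding similarity, then selecting an alternative skill when execution fails) can be sketched as follows. This is an illustration under loud assumptions: the hand-written three-dimensional vectors stand in for embeddings from an unspecified model, and `rank_skills`, `run_with_fallback`, and the `execute` callback are hypothetical names. Only embedding-based retrieval with alternative-skill fallback is modeled; LLM-mediated routing and hierarchical sub-skill decomposition are left out.

```python
import math
from typing import Callable, Optional

# Hypothetical sketch: toy vectors stand in for real embeddings.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_skills(query: list[float], library: dict[str, list[float]]) -> list[str]:
    """Order skill names by similarity between the task embedding and each skill's embedding."""
    return sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)


def run_with_fallback(
    query: list[float],
    library: dict[str, list[float]],
    execute: Callable[[str], bool],
    top_k: int = 3,
) -> Optional[str]:
    """Try the top-ranked skill; if it reports failure, select the next candidate."""
    for name in rank_skills(query, library)[:top_k]:
        if execute(name):  # execute() returns True on success (assumed contract)
            return name
    return None  # all candidates failed; a real system might re-retrieve or escalate


library = {
    "login": [1.0, 0.0, 0.0],
    "search": [0.0, 1.0, 0.0],
    "checkout": [0.7, 0.7, 0.0],
}
task = [0.9, 0.1, 0.0]

print(rank_skills(task, library))                                # ['login', 'checkout', 'search']
print(run_with_fallback(task, library, lambda n: n != "login"))  # checkout
```

Capping the fallback at `top_k` candidates keeps a failing task from silently sweeping the whole library; returning `None` leaves the choice between re-retrieval and escalation to the caller.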

Theorems & Definitions (1)

Definition 1: Agentic skills
