Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

Renjun Xu, Yang Yan

TL;DR

Agent skills address the tension between generalist LLMs and domain-specific procedural expertise by packaging knowledge as portable modules (SKILL.md, scripts, and assets) that agents load on demand. The work analyzes architectural foundations, acquisition methods, deployment in the computer-use agent (CUA) stack, and security, and introduces a governance framework that maps skill provenance to graduated deployment permissions. Key contributions include a taxonomy of skill acquisition (human-authored, RL-based, autonomous discovery, structured bases, compositional synthesis), a review of empirical security analyses, and a proposed Skill Trust and Lifecycle Governance Framework. The findings highlight rapid ecosystem maturation but also substantial open challenges in portability, scaling, verification, and governance that must be resolved to build trustworthy, self-improving skill ecosystems.

Abstract

The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within model weights, agent skills -- composable packages of instructions, code, and resources that agents load on demand -- enable dynamic capability extension without retraining. This approach is formalized in a paradigm of progressive disclosure, portable skill definitions, and integration with the Model Context Protocol (MCP). This survey provides a comprehensive treatment of the agent skills landscape, which has evolved rapidly over the last few months. We organize the field along four axes: (i) architectural foundations, examining the SKILL.md specification, progressive context loading, and the complementary roles of skills and MCP; (ii) skill acquisition, covering reinforcement learning with skill libraries (SAGE), autonomous skill discovery (SEAgent), and compositional skill synthesis; (iii) deployment at scale, including the computer-use agent (CUA) stack, GUI grounding advances, and benchmark progress on OSWorld and SWE-bench; and (iv) security, where recent empirical analyses reveal that 26.1% of community-contributed skills contain vulnerabilities, motivating our proposed Skill Trust and Lifecycle Governance Framework -- a four-tier, gate-based permission model that maps skill provenance to graduated deployment capabilities. We identify seven open challenges -- from cross-platform skill portability to capability-based permission models -- and propose a research agenda for realizing trustworthy, self-improving skill ecosystems. Unlike prior surveys that broadly cover LLM agents or tool use, this work focuses specifically on the emerging skill abstraction layer and its implications for the next generation of agentic systems. Project repo: https://github.com/scienceaix/agentskills.
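
The progressive disclosure idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the three loading stages are (1) short metadata for every installed skill, (2) the full instruction body of the selected skill, and (3) bundled resources read only when referenced. The `Skill` class, `build_context` function, and the example skills are hypothetical names chosen for illustration; they are not part of the SKILL.md specification or of any library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    """Hypothetical in-memory model of one skill package (not the SKILL.md spec)."""
    name: str
    description: str   # stage 1: short metadata, always in context
    instructions: str  # stage 2: instruction body, loaded when the skill is selected
    resources: dict = field(default_factory=dict)  # stage 3: files read only on demand


def build_context(skills, active: Optional[str] = None, needed_resources=()) -> str:
    """Assemble prompt context, loading deeper stages only for the active skill."""
    # Stage 1: one metadata line per installed skill.
    parts = [f"[skill] {s.name}: {s.description}" for s in skills]
    for s in skills:
        if s.name == active:
            parts.append(s.instructions)                            # stage 2
            parts.extend(s.resources[r] for r in needed_resources)  # stage 3
    return "\n".join(parts)


skills = [
    Skill("pdf-forms", "Fill PDF forms.",
          "Step 1: inspect fields. Step 2: write values.",
          {"fields.md": "Field reference table."}),
    Skill("sql-report", "Build SQL reports.",
          "Step 1: read schema. Step 2: draft query."),
]

idle = build_context(skills)
active = build_context(skills, active="pdf-forms", needed_resources=("fields.md",))
```

In this sketch the idle context holds only the two metadata lines, so the cost of installing an additional skill is one short line until that skill is actually selected.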

Paper Structure (36 sections, 3 figures, 3 tables)


Figures (3)

  • Figure 1: Progressive disclosure architecture of Agent Skills. Information is loaded in three stages to minimize context window consumption while maintaining access to arbitrarily deep procedural knowledge. Token estimates are approximate per-skill averages; adapted from Zhang, Lazuka, and Murag [zhang2025skills].
  • Figure 2: Architecture of a skill-equipped computer-use agent showing the interplay between the skill library, perception-grounding-action pipeline, MCP connectivity layer, and the operating system environment. The active skill (highlighted) is selected by the router and injected into the agent's context.
  • Figure 3: Proposed Skill Trust and Lifecycle Governance Framework. This integrative model, an original contribution of this survey, maps the four skill acquisition pathways identified in the skill acquisition section through a four-stage verification pipeline (G1-G4) to four trust tiers (T1-T4) that determine deployment permissions via the principle of least privilege. The lifecycle flow (bottom center) enables trust evolution through runtime monitoring. Empirical motivation (bottom panels) draws on the cross-cutting synthesis of architectural attack surfaces from the architecture section and security findings from three independent studies [schmotz2025skillinjection, liu2026skillswild, liu2026maliciousskills]. No prior work has proposed a unified governance model spanning skill provenance, verification, and runtime permissions.
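
The gate-to-tier mapping described for Figure 3 can be sketched roughly as follows. Several things here are assumptions rather than details from the paper: that gates are evaluated in order, that the number of consecutive gates passed selects the tier, that a higher tier number means more trust, and that each tier's permission set extends the previous one. The permission names and the functions `assign_tier` and `is_allowed` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    # Assumption: higher number = more trust. The caption does not state the ordering.
    T1 = 1
    T2 = 2
    T3 = 3
    T4 = 4


# Hypothetical permission sets; each tier adds one capability (least privilege).
PERMISSIONS = {
    Tier.T1: frozenset({"read_instructions"}),
    Tier.T2: frozenset({"read_instructions", "read_files"}),
    Tier.T3: frozenset({"read_instructions", "read_files", "run_scripts"}),
    Tier.T4: frozenset({"read_instructions", "read_files", "run_scripts", "network"}),
}

GATES = ("G1", "G2", "G3", "G4")


def assign_tier(passed: set) -> Optional[Tier]:
    """Count gates passed in order, stopping at the first failed gate."""
    count = 0
    for gate in GATES:
        if gate not in passed:
            break
        count += 1
    return Tier(count) if count else None  # no gate passed: no tier, nothing allowed


def is_allowed(passed: set, action: str) -> bool:
    """Allow an action only if the skill's tier explicitly grants it."""
    tier = assign_tier(passed)
    return tier is not None and action in PERMISSIONS[tier]
```

Under these assumptions, a skill that passes G1 and G3 but fails G2 stays at T1, because a later gate cannot compensate for an earlier failure. Runtime monitoring, which the caption says enables trust evolution, would then amount to updating the set of passed gates over time.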