Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
TL;DR
The paper investigates the security of Agent Skills, which extend LLM agents with new knowledge through markdown-based SKILL.md files. It demonstrates that these skills enable trivially simple prompt injections, including hidden malicious instructions that exfiltrate sensitive data. It also shows how system-level guardrails can be bypassed in realistic workflows with a coding agent such as Claude Code. Through experiments on Claude Code and the Claude web interface, the authors reveal practical attack paths that exploit limited user oversight and long, unreviewed skill files. They argue that frontier LLMs remain vulnerable despite scaling. The work highlights the need for stronger model-level defenses and complementary guardrails, provides a public codebase to reproduce the attacks, and urges caution when using third-party Agent Skill ecosystems.
Abstract
Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company took a step towards this by introducing Agent Skills, a framework that equips agents with new knowledge based on instructions stored in simple markdown files. Although Agent Skills can be a very useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced scripts to exfiltrate sensitive data, such as internal files or passwords. Importantly, we show how to bypass system-level guardrails of a popular coding agent: a benign, task-specific approval with the "Don't ask again" option can carry over to closely related but harmful actions. Overall, we conclude that despite ongoing research efforts and scaling model capabilities, frontier LLMs remain vulnerable to very simple prompt injections in realistic scenarios. Our code is available at https://github.com/aisa-group/promptinject-agent-skills.
