When Skills Lie: Hidden-Comment Injection in LLM Agents
Qianli Wang, Boyang Ma, Minghui Xu, Yue Zhang
TL;DR
The paper investigates hidden-comment prompt injection in Skills used by LLM agents, revealing that HTML-rendered Skill content can conceal instructions that steer model planning toward unsafe tool use. By evaluating DeepSeek-V3.2 and GLM-4.5-Air on a benign code-formatting task with clean versus malicious Skills, the authors demonstrate that hidden HTML comments can induce malicious tool intents, even when the user task remains harmless. A two-tier defense—a prompt-level guardrail treating Skills as untrusted and a hardened execution layer—prevents these dangerous outputs and surfaces the suspicious content. The work highlights security, design, and human oversight implications for Skill documentation, advocating for alignment between human-visible content and model input and for explicit safeguards to preserve least-privilege and accountability in IDE-style LLM assistants.
Abstract
LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.
