Table of Contents
Fetching ...

When Skills Lie: Hidden-Comment Injection in LLM Agents

Qianli Wang, Boyang Ma, Minghui Xu, Yue Zhang

TL;DR

The paper investigates hidden-comment prompt injection in Skills used by LLM agents, revealing that HTML-rendered Skill content can conceal instructions that steer model planning toward unsafe tool use. By evaluating DeepSeek-V3.2 and GLM-4.5-Air on a benign code-formatting task with clean versus malicious Skills, the authors demonstrate that hidden HTML comments can induce malicious tool intents, even when the user task remains harmless. A two-tier defense—a prompt-level guardrail treating Skills as untrusted and a hardened execution layer—prevents these dangerous outputs and surfaces the suspicious content. The work highlights security, design, and human oversight implications for Skill documentation, advocating for alignment between human-visible content and model input and for explicit safeguards to preserve least-privilege and accountability in IDE-style LLM assistants.

Abstract

LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.

When Skills Lie: Hidden-Comment Injection in LLM Agents

TL;DR

The paper investigates hidden-comment prompt injection in Skills used by LLM agents, revealing that HTML-rendered Skill content can conceal instructions that steer model planning toward unsafe tool use. By evaluating DeepSeek-V3.2 and GLM-4.5-Air on a benign code-formatting task with clean versus malicious Skills, the authors demonstrate that hidden HTML comments can induce malicious tool intents, even when the user task remains harmless. A two-tier defense—a prompt-level guardrail treating Skills as untrusted and a hardened execution layer—prevents these dangerous outputs and surfaces the suspicious content. The work highlights security, design, and human oversight implications for Skill documentation, advocating for alignment between human-visible content and model input and for explicit safeguards to preserve least-privilege and accountability in IDE-style LLM assistants.

Abstract

LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.
Paper Structure (19 sections, 1 figure, 1 table)

This paper contains 19 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Architecture of a Skill-conditioned LLM agent and its attack surface.