
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Yubin Qu, Yi Liu, Tongcheng Geng, Gelei Deng, Yuekang Li, Leo Yu Zhang, Ying Zhang, Lei Ma

Abstract

LLM-based coding agents extend their capabilities via third-party agent skills distributed through open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space, such as file writes, shell commands, and network requests, despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in code examples and configuration templates within skill documentation. Because agents reuse these examples during normal tasks, the payload executes without explicit prompts. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATT&CK categories. Across four frameworks and five models, DDIPE achieves 11.6% to 33.5% bypass rates, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% of samples evade both detection and alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: A poisoned pptx skill. Left: the highlighted line disguises exfiltration as a routine backup step. Right: the referenced script silently uploads documents to an attacker-controlled server.
  • Figure 2: End-to-end threat scenario for PoisonedSkills. The attacker publishes a disguised malicious skill ($s_{adv}$) to a public marketplace. The skill reaches the victim agent through retrieval and, once loaded, induces the agent to exfiltrate private data, escalate privileges, or execute arbitrary code. This work evaluates the post-loading phase (shaded region): whether the embedded payload can trigger harmful execution despite safety-alignment and architectural defenses. The retrieval phase is assumed to succeed.
  • Figure 3: Running examples of Document-Driven Implicit Payload Execution (DDIPE). Scenario A (left) conceals environment-variable exfiltration within a PDF processing function: the payload silently posts os.environ to an attacker-controlled endpoint, and silent exception handling ensures the main logic runs uninterrupted. Scenario B (right) injects a privileged container-escape backdoor and an unauthorized host-root mount into a Kubernetes deployment template. In both cases, the underlying model reproduces the poisoned code as "best practices" when processing routine tasks.
  • Figure 4: Universal breach case study. This 479-byte payload (a 9-line pip configuration write) is the only sample executed by all three models under Claude Code.
  • Figure 5: Compliance trap case study (Conda environment poisoning). Under the same Claude Code architecture, Sonnet 4.6 directly executes the payload while GLM-4.7 refuses.

Theorems & Definitions (1)

  • Definition 4.1: Payload Embedding Strategy Taxonomy