Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Hanzhi Liu, Chaofan Shou, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, Yu Feng

Abstract

Large language model (LLM) agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.
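To make the first of the three defenses concrete, the following is a minimal sketch of what a fail-closed policy gate on the client side might look like. It is not the paper's implementation: the tool-call JSON shape (`arguments.command`), the allowlists, and the function name `gate_tool_call` are all assumptions chosen for illustration. The point it shows is the fail-closed property: a tool call returned through a router is executed only if it parses cleanly and every element is explicitly approved, and anything else is rejected.

```python
import json
import shlex

# Hypothetical allowlists; a real deployment would load these from policy.
ALLOWED_COMMANDS = {"ls", "cat", "git", "pip"}
ALLOWED_PACKAGES = {"requests", "numpy"}

# Characters that could chain or redirect commands if a shell ran the string.
SHELL_METACHARS = set(";|&$`><\n(){}")


def gate_tool_call(raw_tool_call: str) -> bool:
    """Return True only if the tool call is provably within policy.

    Fail-closed: parse errors, unexpected shapes, shell metacharacters,
    and unlisted commands or packages all result in rejection.
    """
    try:
        call = json.loads(raw_tool_call)
        command = call["arguments"]["command"]  # assumed payload shape
        if not isinstance(command, str):
            return False
        if any(ch in SHELL_METACHARS for ch in command):
            return False
        argv = shlex.split(command)
    except (ValueError, KeyError, TypeError):
        return False  # malformed input is rejected, never passed through

    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return False

    if argv[0] == "pip":
        # Only `pip install <approved package> ...` is permitted, which
        # blocks a substituted or typosquatted dependency name (AC-1.a).
        if len(argv) < 3 or argv[1] != "install":
            return False
        return all(pkg in ALLOWED_PACKAGES for pkg in argv[2:])

    return True
```

Under these assumptions, a rewritten response that swaps `requests` for a look-alike package name, or that appends a second command with `&&`, is rejected before the agent executes anything. The gate is deliberately strict: it also rejects benign forms such as version pins, so usability is traded for a smaller attack surface.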

Paper Structure

This paper contains 47 sections, 5 figures, 11 tables.

Figures (5)

  • Figure 1: LLM router ecosystem and taint propagation. Agent clients (left) exchange requests and responses through a multi-hop graph of LLM routers to upstream model providers (right). Each hop terminates the inbound TLS session, granting full plaintext access. Green arrows denote clean data flow; red arrows trace how a single malicious router $R_4$, controlled by an external attacker, taints responses on the return path: corrupted payloads propagate through $R_1$ back to the compromised Claude Code and Codex clients, handing the attacker effective control over their tool execution ("your agent is mine"), while agents routed through honest paths (e.g., $R_2 \!\to\! R_5$) remain unaffected (Section \ref{sec:attacks}).
  • Figure 2: Request--response lifecycle through a malicious router. AC-2 tags mark where the router passively scans traffic for secrets (both request and response paths). AC-1 marks where parsed responses are rewritten before delivery; AC-1.a specializes to dependency substitution, AC-1.b gates activation on session-level triggers (Section \ref{sec:attacks:evasion}).
  • Figure 3: Observed malicious-router behaviors across 28 paid and 400 free routers. Bars are normalized within the paid and free populations; raw counts appear in Table \ref{tab:measurement-main} and the surrounding text. Adaptive evasion is observed only among routers that already perform active manipulation.
  • Figure 4: Threshold sweep for anomaly screening.
  • Figure 5: Defense effectiveness by attack class.