Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks

Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, Micah Goldblum

TL;DR

This paper shows that commercial LLM-powered agents have security and privacy vulnerabilities that are not present in isolated models. It presents a taxonomy of agent-specific attacks and demonstrates a simple, realistic attack pipeline that can compromise web and scientific-discovery agents without ML expertise. The reported attacks achieve high success rates for data exfiltration, malware downloads, phishing, and unsafe synthesis outputs, underscoring immediate practical risks. The authors propose defense directions, including access control, authentication, and context-aware safeguards, and call for broader red-teaming and formal verification to mitigate these agentic threats.

Abstract

A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs). These attacks may extract private information or coerce the model into producing harmful outputs. In real-world deployments, LLMs are often part of a larger agentic pipeline including memory systems, retrieval, web access, and API calling. Such additional components introduce vulnerabilities that make these LLM-powered agents much easier to attack than isolated LLMs, yet relatively little work focuses on the security of LLM agents. In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents. We first provide a taxonomy of attacks categorized by threat actors, objectives, entry points, attacker observability, attack strategies, and inherent vulnerabilities of agent pipelines. We then conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities. Notably, our attacks are trivial to implement and require no understanding of machine learning.
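
The six taxonomy dimensions listed in the abstract can be pictured as fields of a small record. The sketch below is only an illustration under assumptions: the class name `AttackProfile`, the field names, and every example value are hypothetical and are not taken from the paper's own taxonomy tables. The example values loosely follow the web-agent scenario described in Figure 1.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AttackProfile:
    """One attack described along the six dimensions named in the abstract.

    Field names are hypothetical labels chosen for this sketch.
    """
    threat_actor: str
    objective: str
    entry_point: str
    attacker_observability: str
    attack_strategy: str
    pipeline_vulnerability: str


# Illustrative values only (assumed, not quoted from the paper).
web_redirect_attack = AttackProfile(
    threat_actor="external attacker posting on a trusted platform",
    objective="make the agent divulge the user's private information",
    entry_point="web page the agent visits while browsing",
    attacker_observability="assumed: no access to model internals",
    attack_strategy="redirect to a malicious site, then jailbreak prompt",
    pipeline_vulnerability="web content is treated as instructions",
)


def dimensions(profile: AttackProfile) -> list[str]:
    """Return the names of the taxonomy dimensions in declaration order."""
    return [f.name for f in fields(profile)]
```

Calling `dimensions(web_redirect_attack)` lists the six field names, which makes it easy to check that each attack under discussion has been characterized along every dimension.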

Paper Structure

This paper contains 19 sections and 5 figures.

Figures (5)

  • Figure 1: A user submits a mundane shopping request to their web agent. Left: The web agent begins by searching Google and finds a seemingly relevant Reddit page. Center: Upon reaching a trusted platform (e.g., Reddit), the agent comes across a malicious post by an attacker and is redirected to a malicious site. Right: On the malicious site, a jailbreak prompt coerces the agent into divulging private information or performing harmful actions.
  • Figure 2: Web agent attack pipeline in which a user is redirected from a trustworthy platform to a site containing malicious instructions.
  • Figure 3: An agent is instructed to conduct a phishing attack on a malicious website. The agent is redirected to an attacker's website from a trusted platform like Reddit, and the attacker then instructs the agent to launch a phishing attack, detailing the exact text of the phishing email. This phishing email will come from the user's own email address and will therefore appear legitimate.
  • Figure 4: An attack on a scientific agent in which a user is tricked into retrieving and executing instructions for synthesizing a toxin.
  • Figure 5: An example attack on ChemCrow. The agent is asked for a synthesis procedure for a pharmaceutical compound, Xadago, but is instead manipulated into returning the recipe for nerve gas.