Security of AI Agents

Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, Hao Chen

TL;DR

The paper analyzes security vulnerabilities in AI agents that use LLMs and tool integration, framing confidentiality, integrity, and availability as core concerns. It provides a vulnerability taxonomy—covering sessions, model pollution/privacy leaks, and agent programs with local and remote risks—and offers defense designs with preliminary experiments. Defenses include session management, sandboxing (local and remote resource controls), and model protections via sessionless and session-aware approaches, including encryption-based techniques and prompt tuning. The work demonstrates practical security gaps and delivers actionable strategies to build safer, more reliable AI agents, with initial empirical support and open-data/code for replication.

Abstract

AI agents have been boosted by large language models. AI agents can function as intelligent assistants and complete tasks on behalf of their users with access to tools and the ability to execute commands in their environments. Through studying and experiencing the workflow of typical AI agents, we have raised several concerns regarding their security. These potential vulnerabilities are not addressed by the frameworks used to build the agents, nor by research aimed at improving the agents. In this paper, we identify and describe these vulnerabilities in detail from a system security perspective, emphasizing their causes and severe effects. Furthermore, we introduce defense mechanisms corresponding to each vulnerability with design and experiments to evaluate their viability. Altogether, this paper contextualizes the security issues in the current development of AI agents and delineates methods to make AI agents safer and more reliable.

Paper Structure (25 sections, 1 equation, 10 figures, 3 tables)


Figures (10)

  • Figure 1: Overview of LLM-based AI agent.
  • Figure 2: AI agent's potential vulnerability to model pollution.
  • Figure 3: AI agents cause privacy leakages.
  • Figure 4: An illustration of vulnerabilities of zero-shot action agents. In the figures, we use the term "World" to denote the host OS of the agent and external API resources.
  • Figure 5: An illustration of an AI agent's effectful planning. In this case, even if users interact with the agent program in a non-harmful way, they might still cause security issues unintentionally. Note that agents remain vulnerable to the attacks shown in Figure 4.
  • ...and 5 more figures