Security of AI Agents
Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, Hao Chen
TL;DR
The paper analyzes security vulnerabilities in LLM-based AI agents that integrate tools, framing confidentiality, integrity, and availability as the core concerns. It provides a vulnerability taxonomy covering sessions, model pollution and privacy leaks, and agent programs with local and remote risks, and it proposes defense designs supported by preliminary experiments. The defenses include session management, sandboxing (local and remote resource controls), and model protections via sessionless and session-aware approaches, including encryption-based techniques and prompt tuning. The work demonstrates practical security gaps and offers actionable strategies for building safer, more reliable AI agents, with initial empirical support and open data and code for replication.
Abstract
The development of AI agents has been boosted by large language models. AI agents can function as intelligent assistants and complete tasks on behalf of their users, with access to tools and the ability to execute commands in their environments. Through studying and experiencing the workflow of typical AI agents, we raise several concerns regarding their security. These potential vulnerabilities are addressed neither by the frameworks used to build the agents nor by research aimed at improving the agents. In this paper, we identify and describe these vulnerabilities in detail from a system security perspective, emphasizing their causes and severe effects. Furthermore, we introduce defense mechanisms corresponding to each vulnerability, with design and experiments to evaluate their viability. Altogether, this paper contextualizes the security issues in the current development of AI agents and delineates methods to make AI agents safer and more reliable.
