Table of Contents
Fetching ...

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Shrestha Datta, Shahriar Kabir Nahin, Anshuman Chhabra, Prasant Mohapatra

TL;DR

Agentic AI, built from LLMs with autonomy, memory, and tool use, introduces security risks that exceed traditional AI safety and software security. The paper surveys this landscape by proposing a structured threat taxonomy, reviewing defenses from design to governance, and evaluating benchmarks for safety-critical agentics, while outlining open challenges such as long-horizon security and adaptive attacks. It contributes a holistic view that links attack surfaces (prompt injection, autonomous tool abuse, multi-agent protocols) with defense strategies (prompt-resistant designs, policy enforcement, sandboxing, monitoring, and standards) and advocates for process-aware benchmarking and robust evaluation. The findings underscore the urgency of secure-by-design agentic AI and provide a roadmap for researchers and practitioners to develop resilient, auditable, and trustworthy autonomous systems.

Abstract

Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their ability to autonomously execute tasks across web, software, and physical environments creates new and amplified security risks, distinct from both traditional AI safety and conventional software security. This survey outlines a taxonomy of threats specific to agentic AI, reviews recent benchmarks and evaluation methodologies, and discusses defense strategies from both technical and governance perspectives. We synthesize current research and highlight open challenges, aiming to support the development of secure-by-design agent systems.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

TL;DR

Agentic AI, built from LLMs with autonomy, memory, and tool use, introduces security risks that exceed traditional AI safety and software security. The paper surveys this landscape by proposing a structured threat taxonomy, reviewing defenses from design to governance, and evaluating benchmarks for safety-critical agentics, while outlining open challenges such as long-horizon security and adaptive attacks. It contributes a holistic view that links attack surfaces (prompt injection, autonomous tool abuse, multi-agent protocols) with defense strategies (prompt-resistant designs, policy enforcement, sandboxing, monitoring, and standards) and advocates for process-aware benchmarking and robust evaluation. The findings underscore the urgency of secure-by-design agentic AI and provide a roadmap for researchers and practitioners to develop resilient, auditable, and trustworthy autonomous systems.

Abstract

Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their ability to autonomously execute tasks across web, software, and physical environments creates new and amplified security risks, distinct from both traditional AI safety and conventional software security. This survey outlines a taxonomy of threats specific to agentic AI, reviews recent benchmarks and evaluation methodologies, and discusses defense strategies from both technical and governance perspectives. We synthesize current research and highlight open challenges, aiming to support the development of secure-by-design agent systems.

Paper Structure

This paper contains 71 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: Taxonomy of Agentic AI Security Threats.
  • Figure 2: Examples showcasing (a) direct and (b) indirect prompt injection. In the former, the agent is directly instructed by the adversary to reveal confidential information whereas in the latter the attacker has exploited the agent's reliance on external information sources to have it download malware from their altered website.
  • Figure 3: An example showcasing unintentional prompt injection.
  • Figure 4: Visualizing different prompt injection attacks based on modality: (a) image-based, (b) text-based code injection, and a (c) hybrid attack.
  • Figure 5: Visualizing different protocol-level attacks for multi-agent systems: (a) MCP-induced, (b) A2A-induced.
  • ...and 3 more figures