Table of Contents
Fetching ...

A Safety and Security Framework for Real-World Agentic Systems

Shaona Ghosh, Barnaby Simkin, Kyriacos Shiarlis, Soumili Nandi, Dan Zhao, Matthew Fiedler, Julia Bazinska, Nikki Pope, Roopa Prabhu, Daniel Rohrer, Michael Demoret, Bartley Richardson

TL;DR

The paper addresses the safety and security of real-world agentic systems by framing them as dynamic, interconnected workflows whose risks emerge from interactions among models, tools, memory, and data.It proposes a compositional risk framework with a multi-layer taxonomy, an embedded safety-security architecture, and sandboxed, AI-driven red-teaming (ARP) to discover and mitigate novel risks in enterprise-scale deployments.A detailed case study on NVIDIA's AI-Q Research Assistant (AIRA) demonstrates how risk snapshots, injection/evaluation probes, and defender agents reveal and attenuate attack propagation, while a large dataset of traces supports reproducible safety research.The work provides practical defenses, metrics, and governance primitives for deployment-ready agentic systems, highlighting the need for context-aware, adaptive defenses alongside continuous risk discovery.Overall, it lays a principled, scalable foundation for safer agentic AI in industry, coupling systematic risk taxonomy with automated, context-sensitive risk discovery and remediation.

Abstract

This paper introduces a dynamic and actionable framework for securing agentic AI systems in enterprise deployment. We contend that safety and security are not merely fixed attributes of individual models but also emergent properties arising from the dynamic interactions among models, orchestrators, tools, and data within their operating environments. We propose a new way of identification of novel agentic risks through the lens of user safety. Although, for traditional LLMs and agentic models in isolation, safety and security has a clear separation, through the lens of safety in agentic systems, they appear to be connected. Building on this foundation, we define an operational agentic risk taxonomy that unifies traditional safety and security concerns with novel, uniquely agentic risks, including tool misuse, cascading action chains, and unintended control amplification among others. At the core of our approach is a dynamic agentic safety and security framework that operationalizes contextual agentic risk management by using auxiliary AI models and agents, with human oversight, to assist in contextual risk discovery, evaluation, and mitigation. We further address one of the most challenging aspects of safety and security of agentic systems: risk discovery through sandboxed, AI-driven red teaming. We demonstrate the framework effectiveness through a detailed case study of NVIDIA flagship agentic research assistant, AI-Q Research Assistant, showcasing practical, end-to-end safety and security evaluations in complex, enterprise-grade agentic workflows. This risk discovery phase finds novel agentic risks that are then contextually mitigated. We also release the dataset from our case study, containing traces of over 10,000 realistic attack and defense executions of the agentic workflow to help advance research in agentic safety.

A Safety and Security Framework for Real-World Agentic Systems

TL;DR

The paper addresses the safety and security of real-world agentic systems by framing them as dynamic, interconnected workflows whose risks emerge from interactions among models, tools, memory, and data.It proposes a compositional risk framework with a multi-layer taxonomy, an embedded safety-security architecture, and sandboxed, AI-driven red-teaming (ARP) to discover and mitigate novel risks in enterprise-scale deployments.A detailed case study on NVIDIA's AI-Q Research Assistant (AIRA) demonstrates how risk snapshots, injection/evaluation probes, and defender agents reveal and attenuate attack propagation, while a large dataset of traces supports reproducible safety research.The work provides practical defenses, metrics, and governance primitives for deployment-ready agentic systems, highlighting the need for context-aware, adaptive defenses alongside continuous risk discovery.Overall, it lays a principled, scalable foundation for safer agentic AI in industry, coupling systematic risk taxonomy with automated, context-sensitive risk discovery and remediation.

Abstract

This paper introduces a dynamic and actionable framework for securing agentic AI systems in enterprise deployment. We contend that safety and security are not merely fixed attributes of individual models but also emergent properties arising from the dynamic interactions among models, orchestrators, tools, and data within their operating environments. We propose a new way of identification of novel agentic risks through the lens of user safety. Although, for traditional LLMs and agentic models in isolation, safety and security has a clear separation, through the lens of safety in agentic systems, they appear to be connected. Building on this foundation, we define an operational agentic risk taxonomy that unifies traditional safety and security concerns with novel, uniquely agentic risks, including tool misuse, cascading action chains, and unintended control amplification among others. At the core of our approach is a dynamic agentic safety and security framework that operationalizes contextual agentic risk management by using auxiliary AI models and agents, with human oversight, to assist in contextual risk discovery, evaluation, and mitigation. We further address one of the most challenging aspects of safety and security of agentic systems: risk discovery through sandboxed, AI-driven red teaming. We demonstrate the framework effectiveness through a detailed case study of NVIDIA flagship agentic research assistant, AI-Q Research Assistant, showcasing practical, end-to-end safety and security evaluations in complex, enterprise-grade agentic workflows. This risk discovery phase finds novel agentic risks that are then contextually mitigated. We also release the dataset from our case study, containing traces of over 10,000 realistic attack and defense executions of the agentic workflow to help advance research in agentic safety.

Paper Structure

This paper contains 66 sections, 15 figures, 9 tables.

Figures (15)

  • Figure 1: An Illustration of the Agentic System Safety and Security Framework
  • Figure 2: Architecture Overview of NVIDIA AI-Q Research Assistant Agent
  • Figure 3: Injection Probe (left) and Evaluation Probe (right) Functionality on the AI-Q Research Assistant Agent (AIRA).
  • Figure 4: Probe-instrumented AI-Q Research Assistant (AIRA). Injection probes (red) are placed at the user input and at the output of search related tools, i.e., retrieved sources from search are corrupted with attacks. Evaluation probes (green) are placed in all downstream summarization nodes allowing us to track an attack's evolution through the system.
  • Figure 5: Mean Risk Scores for Direct and Indirect Attacks.
  • ...and 10 more figures