Table of Contents
Fetching ...

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Leo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy Dvijotham

TL;DR

DoomArena addresses the problem of robust security testing for AI agents deployed in realistic, diverse environments by providing a modular, plug-in framework that couples threat modeling with environment-specific attack gateways. It enables deployment-context-aware, multi-attack evaluations across benchmarks like $τ$-Bench, $BrowserGym$, and OSWorld, separating attack development from environment details to test generalizable vulnerabilities. The key contributions include a formal AttackConfig threat model, environment-wide attack gateways, and demonstrative case studies showing varied vulnerability profiles, constructive interference between attacks, and defenses such as guardrails and LLM-based judges. The work shows practical significance by revealing that no single agent dominates across threat models, emphasizing the need for adaptive, context-aware security testing to guide defense design in frontier AI agents. DoomArena is open-source and designed to facilitate ongoing research into the security of agentic AI systems in realistic deployment contexts.

Abstract

We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a plug-in framework and integrates easily into realistic agentic frameworks like BrowserGym (for web agents) and $τ$-bench (for tool calling agents); 2) It is configurable and allows for detailed threat modeling, allowing configuration of specific components of the agentic framework being attackable, and specifying targets for the attacker; and 3) It is modular and decouples the development of attacks from details of the environment in which the agent is deployed, allowing for the same attacks to be applied across multiple environments. We illustrate several advantages of our framework, including the ability to adapt to new threat models and environments easily, the ability to easily combine several previously published attacks to enable comprehensive and fine-grained security testing, and the ability to analyze trade-offs between various vulnerabilities and performance. We apply DoomArena to state-of-the-art (SOTA) web and tool-calling agents and find a number of surprising results: 1) SOTA agents have varying levels of vulnerability to different threat models (malicious user vs malicious environment), and there is no Pareto dominant agent across all threat models; 2) When multiple attacks are applied to an agent, they often combine constructively; 3) Guardrail model-based defenses seem to fail, while defenses based on powerful SOTA LLMs work better. DoomArena is available at https://github.com/ServiceNow/DoomArena.

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

TL;DR

DoomArena addresses the problem of robust security testing for AI agents deployed in realistic, diverse environments by providing a modular, plug-in framework that couples threat modeling with environment-specific attack gateways. It enables deployment-context-aware, multi-attack evaluations across benchmarks like -Bench, , and OSWorld, separating attack development from environment details to test generalizable vulnerabilities. The key contributions include a formal AttackConfig threat model, environment-wide attack gateways, and demonstrative case studies showing varied vulnerability profiles, constructive interference between attacks, and defenses such as guardrails and LLM-based judges. The work shows practical significance by revealing that no single agent dominates across threat models, emphasizing the need for adaptive, context-aware security testing to guide defense design in frontier AI agents. DoomArena is open-source and designed to facilitate ongoing research into the security of agentic AI systems in realistic deployment contexts.

Abstract

We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a plug-in framework and integrates easily into realistic agentic frameworks like BrowserGym (for web agents) and -bench (for tool calling agents); 2) It is configurable and allows for detailed threat modeling, allowing configuration of specific components of the agentic framework being attackable, and specifying targets for the attacker; and 3) It is modular and decouples the development of attacks from details of the environment in which the agent is deployed, allowing for the same attacks to be applied across multiple environments. We illustrate several advantages of our framework, including the ability to adapt to new threat models and environments easily, the ability to easily combine several previously published attacks to enable comprehensive and fine-grained security testing, and the ability to analyze trade-offs between various vulnerabilities and performance. We apply DoomArena to state-of-the-art (SOTA) web and tool-calling agents and find a number of surprising results: 1) SOTA agents have varying levels of vulnerability to different threat models (malicious user vs malicious environment), and there is no Pareto dominant agent across all threat models; 2) When multiple attacks are applied to an agent, they often combine constructively; 3) Guardrail model-based defenses seem to fail, while defenses based on powerful SOTA LLMs work better. DoomArena is available at https://github.com/ServiceNow/DoomArena.

Paper Structure

This paper contains 41 sections, 22 figures, 8 tables.

Figures (22)

  • Figure 1: (a) Abstract architecture of DoomArena. An agent operates in an environment, performing tasks for a user, creating a user-agent-environment loop. A detailed threat modeling exercise tailored to the AI agent’s deployment context results in a threat model encoded as an attack config. This config specifies malicious components, applicable attacks, and attack success criteria. The attack gateway pipes attacks to the right components, enabling realistic attack simulations and agent evaluation under adversarial conditions. (b) Realizations of the abstract framework. We build AttackGateway-s as wrappers around an original agentic environment ($\tau$-Bench, BrowserGym, OSWorld, etc.). The AttackGateway injects malicious content into the user-agent-environment loop as the AI agent interacts with it. The figure shows that for one such gateway built around $\tau$-bench, we can allow for threat models where a database that the agent interacts with is malicious, or the user interacting with the agent is malicious. DoomArena allows any element of the loop (tools, databases, web pages, users, chatbots) to be attacked as long as the gateway supports it (see Section \ref{['sec:adaptive_testing']} for an example of the simplicity of adding new threat models to a gateway). The threat model is specified by the AttackConfig, which specifies the AttackableComponent, the AttackChoice (drawn from a library of implemented attacks), and the SuccessFilter, which evaluates whether the attack succeeded.
  • Figure 2: Exploring different threat models and attacks. With the attack gateway implemented, threat models and attacks can be swapped via AttackConfig. In the $\tau$-bench airline environment, when going from a malicious user threat model to a malicious catalog threat model, the attack success rate increases from 2.7% to 39.1% (excerpt from detailed results in Table \ref{['tab:attack_metrics_tau_bench_main']}).
  • Figure 3: Evolution of vulnerabilities in AI agents over the past few years. This is compiled from various sources and generated with https://claude.ai/ with the authors double-checking the sources used. The extrapolation to 2025 is the output of linear regression on past data. For sources, refer to Appendix \ref{['app:trend_sources']}
  • Figure 4: Adding a New Threat Model to BrowserGymAttackGateway: poisoned product reviews. The gateway is responsible for calling attack.get_next_attack() to generate malicious content, and injecting it into the environment, in this case by patching the step() method of the environment.
  • Figure 5: Simple Attack Gateway for OSWorld. The gateway can be used in place of DesktopEnv and supports pop-up injection threats, which target agents that use screenshots to complete the desired task.
  • ...and 17 more figures