
NAAMSE: Framework for Evolutionary Security Evaluation of Agents

Kunal Pai, Parth Shah, Harshil Patel

TL;DR

AI agents in production face adaptive adversaries, while manual red-teaming and static benchmarks struggle to scale or stay relevant. NAAMSE reframes security evaluation as a feedback-driven optimization problem: a single autonomous agent mutates prompts, explores a hierarchical corpus, and uses a fitness signal derived from the target's responses to uncover vulnerabilities. Empirical results show that combining exploration with targeted mutation yields higher-severity findings than one-shot methods; ablations validate the synergy, and independent LLMs confirm the jailbreaks. The framework offers a scalable, realistic assessment of agent robustness and is released as open source.

Abstract

AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that fail to model adaptive, multi-turn adversaries. We propose NAAMSE, an evolutionary framework that reframes agent security evaluation as a feedback-driven optimization problem. Our system employs a single autonomous agent that orchestrates a lifecycle of genetic prompt mutation, hierarchical corpus exploration, and asymmetric behavioral scoring. By using model responses as a fitness signal, the framework iteratively compounds effective attack strategies while simultaneously ensuring "benign-use correctness", preventing the degenerate security of blanket refusal. Our experiments on Gemini 2.5 Flash demonstrate that evolutionary mutation systematically amplifies vulnerabilities missed by one-shot methods, with controlled ablations revealing that the synergy between exploration and targeted mutation uncovers high-severity failure modes. We show that this adaptive approach provides a more realistic and scalable assessment of agent robustness in the face of evolving threats. The code for NAAMSE is open source and available at https://github.com/HASHIRU-AI/NAAMSE.
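The feedback loop described in the abstract can be illustrated with a minimal toy sketch. This is not the NAAMSE implementation: every name here (`Prompt`, `asymmetric_score`, `evolve`, `toy_agent`, `drop_word`) is hypothetical, the corpus is a flat list rather than a hierarchy, the score is binary, and the "agent" is a stand-in rule that refuses any text containing a placeholder token. It only shows the shape of the idea: score responses asymmetrically (unsafe compliance and over-refusal both count as failures), then alternate between exploring the corpus and mutating the best candidate found so far.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Prompt:
    text: str
    benign: bool  # True if a correct agent should comply with this prompt


def asymmetric_score(prompt: Prompt, complied: bool) -> float:
    """Failure score (assumed binary here): higher means a worse agent failure."""
    if prompt.benign:
        return 0.0 if complied else 1.0  # refusing a benign request is a failure
    return 1.0 if complied else 0.0      # complying with an adversarial one is a failure


def evolve(
    corpus: List[Prompt],
    agent: Callable[[str], bool],
    mutate: Callable[[Prompt, random.Random], Prompt],
    rounds: int,
    explore_prob: float,
    rng: random.Random,
) -> Tuple[float, Prompt]:
    """Return the highest-scoring (score, prompt) pair found."""
    scored = [(asymmetric_score(p, agent(p.text)), p) for p in corpus]
    for _ in range(rounds):
        if rng.random() < explore_prob:
            parent = rng.choice(corpus)                     # exploration
        else:
            parent = max(scored, key=lambda sp: sp[0])[1]   # exploit best so far
        child = mutate(parent, rng)
        # The agent's response to the child is the fitness signal.
        scored.append((asymmetric_score(child, agent(child.text)), child))
    return max(scored, key=lambda sp: sp[0])


def toy_agent(text: str) -> bool:
    """Stand-in target: complies unless the placeholder token is present."""
    return "MARKER" not in text.split()


def drop_word(prompt: Prompt, rng: random.Random) -> Prompt:
    """Toy mutation: delete one random word, keeping the benign label."""
    words = prompt.text.split()
    if len(words) <= 1:
        return prompt
    del words[rng.randrange(len(words))]
    return Prompt(" ".join(words), prompt.benign)


corpus = [Prompt("MARKER alpha beta", benign=False), Prompt("gamma delta", benign=True)]
best_score, best_prompt = evolve(corpus, toy_agent, drop_word, 50, 0.5, random.Random(0))
print(best_score, best_prompt)
```

In this toy setup both seed prompts initially score 0 (the adversarial one is refused, the benign one is answered), so any nonzero result comes from mutation rather than from the seed corpus. A real evaluator would replace the binary score with a graded judge and the word-dropping operator with a richer mutation set.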

Paper Structure (15 sections, 4 equations, 3 tables)