Security Considerations for Multi-agent Systems

Tam Nguyen, Moses Ndebugre, Dheeraj Arremsetty

TL;DR

This study systematically characterizes the threat landscape of MAS and quantitatively evaluates 16 security frameworks for AI against it, providing the first empirical cross-framework comparison and evidence-based guidance for framework selection.

Abstract

Multi-agent artificial intelligence systems (MAS) are systems of autonomous agents that exercise delegated tool authority, share persistent memory, and coordinate via inter-agent communication. MAS introduce security vulnerabilities that are qualitatively distinct from those documented for single AI models. Existing security and governance frameworks were not designed for these emerging attack surfaces. This study systematically characterizes the threat landscape of MAS and quantitatively evaluates 16 security frameworks for AI against it. A four-phase methodology is proposed: constructing a deep technical knowledge base of production multi-agent architectures; conducting generative AI-assisted threat modeling scoped to MAS cybersecurity risks and validated by domain experts; structuring survey plans at individual-threat granularity; and scoring each framework on a three-point scale against the cybersecurity risks. The risks were organized into 193 distinct main threat items across nine risk categories. The minimum expected average score is 2. No reviewed framework achieves majority coverage of any single category. Non-Determinism (mean score 1.231 across all 16 frameworks) and Data Leakage (1.340) are the most under-addressed domains. The OWASP Agentic Security Initiative leads overall at 65.3% coverage and in the design phase; the CDAO Generative AI Responsible AI Toolkit leads in development and operational coverage. These results provide the first empirical cross-framework comparison for MAS security and offer evidence-based guidance for framework selection.

Paper Structure (236 sections, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Coverage of 193 agentic AI threat items per framework, stacked by coverage tier. Frameworks sorted by total coverage (score $\geq$ 2) in descending order. OWASP ASI leads at 65.3%; DIU RAI provides the narrowest coverage at 6.2%.
  • Figure 2: Mean coverage score (averaged across all 16 frameworks) per threat category, sorted weakest to strongest. No category reaches the 1.6 threshold where a majority of frameworks provide meaningful coverage.
  • Figure 3: Mean coverage of the top four frameworks across three lifecycle phases. Design: RATC, RIP, RWA (56 items); Development: RMP, RTM (28 items); Operation: RIDC, RDL, RTE, RND (109 items).
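The coverage figures quoted in these captions can be reproduced with simple arithmetic. The following is a minimal sketch, not the authors' scoring code: it assumes the three-point scale runs from 1 to 3 and that "coverage" means the share of the 193 threat items scored at 2 or higher, as the Figure 1 caption suggests. The score vectors and the names `coverage_pct`, `leader_scores`, `narrowest_scores`, and `PHASE_ITEMS` are hypothetical and chosen only so that the totals match the reported percentages.

```python
from statistics import mean

TOTAL_ITEMS = 193  # threat items reported in the abstract


def coverage_pct(scores, threshold=2):
    """Return the percentage of items whose score is >= threshold."""
    covered = sum(1 for s in scores if s >= threshold)
    return 100.0 * covered / len(scores)


# Hypothetical score vectors (assumption: scale is 1..3, 1 = not covered).
# 126 covered items out of 193 reproduces the 65.3% reported for the leader;
# 12 covered items out of 193 reproduces the 6.2% reported for the narrowest.
leader_scores = [2] * 126 + [1] * (TOTAL_ITEMS - 126)
narrowest_scores = [2] * 12 + [1] * (TOTAL_ITEMS - 12)

print(round(coverage_pct(leader_scores), 1))     # 65.3
print(round(coverage_pct(narrowest_scores), 1))  # 6.2

# Mean score of one framework over all items (illustrative values only).
print(round(mean(leader_scores), 3))

# Item counts per lifecycle phase, taken from the Figure 3 caption.
PHASE_ITEMS = {"Design": 56, "Development": 28, "Operation": 109}
assert sum(PHASE_ITEMS.values()) == TOTAL_ITEMS
```

The final assertion checks that the three lifecycle phases in Figure 3 partition the full set of 193 items, which they do (56 + 28 + 109 = 193).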