G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems

Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang

TL;DR

G-Safeguard introduces a topology-guided security lens for LLM-based MAS: it constructs a multi-agent utterance graph and applies graph neural network-based anomaly detection to identify malicious agents. It then uses topology-aware edge pruning to prevent the spread of adversarial information, defending against prompt, memory, and tool attacks while transferring effectively across scales and backbones. The approach demonstrates significant reductions in attack success rate (ASR), inductive generalization to larger MAS, and practical applicability in real-world, multi-role settings like CAMEL. This work advances MAS security by coupling topology-aware detection with lightweight, real-time remediation, enabling safer deployment of collaborative LLM-driven agents.

Abstract

Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks, ranging from collaborative problem-solving to autonomous decision-making. However, as these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors has raised significant concerns. To address this challenge, we introduce G-Safeguard, a topology-guided security lens and treatment for robust LLM-MAS, which leverages graph neural networks to detect anomalies on the multi-agent utterance graph and employs topological intervention for attack remediation. Extensive experiments demonstrate that G-Safeguard: (I) exhibits significant effectiveness under various attack strategies, recovering over 40% of the performance under prompt injection; (II) is highly adaptable to diverse LLM backbones and large-scale MAS; (III) can be seamlessly combined with mainstream MAS with security guarantees. The code is available at https://github.com/wslong20/G-safeguard.
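
The detect-then-prune idea can be illustrated with a minimal sketch. It is not the authors' implementation: the anomaly scores below are made-up numbers standing in for the output of the GNN detector, and the names `flag_agents` and `prune_outgoing`, the 0.5 threshold, and the choice to cut only the outgoing edges of flagged agents are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# An edge (sender, receiver) means the receiver reads the sender's utterances.
Edge = Tuple[str, str]


def flag_agents(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Return agents whose anomaly score exceeds the threshold.

    In the paper the scores come from a GNN over the utterance graph; here
    they are supplied by the caller (hypothetical values).
    """
    return {agent for agent, score in scores.items() if score > threshold}


def prune_outgoing(edges: List[Edge], flagged: Set[str]) -> List[Edge]:
    """Drop every edge whose sender is flagged, keeping edge order otherwise."""
    return [(src, dst) for src, dst in edges if src not in flagged]


# Toy four-agent system; agent "C" is assumed to be compromised.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
scores = {"A": 0.10, "B": 0.20, "C": 0.90, "D": 0.05}  # made-up detector output

flagged = flag_agents(scores)
safe_edges = prune_outgoing(edges, flagged)
print(flagged)     # {'C'}
print(safe_edges)  # [('A', 'B'), ('B', 'C')]
```

In this sketch the flagged agent can still receive messages, but nothing it says reaches the other agents in later dialogue turns, which is the intuition behind stopping adversarial information at the topology level.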

Paper Structure

This paper contains 27 sections, 12 equations, 14 figures, 7 tables.

Figures (14)

  • Figure 1: The paradigm comparison between single-agent safeguard and multi-agent safeguard.
  • Figure 2: The design workflow of our proposed G-Safeguard.
  • Figure 3: The overall performance of MAS on the CSQA (left) and MMLU (right) datasets after each turn of dialogue. We use majority voting as the strategy to select the final answer.
  • Figure 4: The recognition accuracy of G-Safeguard for MAS with different topological structures composed of various LLMs on the PoisonRAG dataset.
  • Figure 5: The reply accuracy of agents on MAS with different numbers of nodes.
  • ...and 9 more figures