G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems
Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang
TL;DR
G-Safeguard introduces a topology-guided security lens for LLM-based MAS by constructing a multi-agent utterance graph and applying graph neural network-based anomaly detection to identify malicious agents. It then uses topology-aware edge pruning to prevent the spread of adversarial information, achieving robust defense across prompt, memory, and tool attacks while transferring effectively across scales and backbones. The approach demonstrates significant reductions in attack success rate (ASR), inductive generalization to larger MAS, and practical applicability in real-world, multi-role settings like CAMEL. This work advances MAS security by coupling topology-aware detection with lightweight, real-time remediation, enabling safer deployment of collaborative LLM-driven agents.
Abstract
Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks, ranging from collaborative problem-solving to autonomous decision-making. However, as these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors has raised significant concerns. To address this challenge, we introduce G-Safeguard, a topology-guided security lens and treatment for robust LLM-MAS, which leverages graph neural networks to detect anomalies on the multi-agent utterance graph and employs topological intervention for attack remediation. Extensive experiments demonstrate that G-Safeguard: (I) exhibits significant effectiveness under various attack strategies, recovering over 40% of the performance for prompt injection; (II) is highly adaptable to diverse LLM backbones and large-scale MAS; (III) integrates seamlessly with mainstream MAS while providing security guarantees. The code is available at https://github.com/wslong20/G-safeguard.
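The detect-then-prune idea can be illustrated with a minimal sketch. It is not the authors' implementation: the class `UtteranceGraph`, the functions `score_agents` and `prune_edges`, the hand-set weights, and the one-dimensional features are all hypothetical. A single hand-weighted message-passing step stands in for the trained graph neural network, and "topological intervention" is assumed here to mean removing the edges that leave a flagged agent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class UtteranceGraph:
    # Agent name -> feature vector (a stand-in for an utterance embedding).
    features: dict
    # Directed edges (sender, receiver): who speaks to whom.
    edges: set = field(default_factory=set)


def score_agents(graph, w_self, w_nbr, bias):
    """Score each agent from its own features plus the mean of its in-neighbors.

    One hand-weighted message-passing step, used only as a placeholder
    for a trained GNN detector (assumption, not the paper's model).
    """
    scores = {}
    for node, feat in graph.features.items():
        incoming = [graph.features[s] for (s, r) in graph.edges if r == node]
        if incoming:
            nbr_mean = [sum(col) / len(incoming) for col in zip(*incoming)]
        else:
            nbr_mean = [0.0] * len(feat)
        logit = bias
        logit += sum(w * x for w, x in zip(w_self, feat))
        logit += sum(w * x for w, x in zip(w_nbr, nbr_mean))
        scores[node] = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return scores


def prune_edges(graph, flagged):
    """Drop every edge whose sender is flagged, so its messages stop spreading."""
    kept = {(s, r) for (s, r) in graph.edges if s not in flagged}
    return UtteranceGraph(features=graph.features, edges=kept)


# Toy example: one feature per agent; agent "c" carries an anomalous signal.
graph = UtteranceGraph(
    features={"a": [0.1], "b": [0.2], "c": [0.9]},
    edges={("a", "b"), ("b", "a"), ("c", "a"), ("c", "b"), ("a", "c")},
)
scores = score_agents(graph, w_self=[8.0], w_nbr=[1.0], bias=-4.0)
flagged = {name for name, p in scores.items() if p > 0.5}
safe_graph = prune_edges(graph, flagged)
```

In this toy run only agent "c" exceeds the 0.5 threshold, so its two outgoing edges are removed while it can still receive messages. A real detector would learn its weights from labeled attack traces rather than use fixed values.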
