GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling

Jialong Zhou, Lichao Wang, Xiao Yang

TL;DR

GUARDIAN tackles safety in LLM-based multi-agent collaborations by modeling interactions as a discrete-time temporal attributed graph to explicitly capture hallucination and error propagation. It introduces an unsupervised encoder–decoder with dual reconstruction tasks and a graph abstraction driven by the Information Bottleneck, formalized as $L_{\mathrm{GIB}} = I(X_t; Z_t) - \beta I(Z_t; Y_t)$, along with an incremental training paradigm that updates representations over time. The approach achieves state-of-the-art accuracy across MMLU, MATH, FEVER, and Biographies benchmarks while reducing API calls and runtime, demonstrating strong anomaly detection with bounded information flow between agents. Its model-agnostic design and principled compression/regularization make it broadly applicable to diverse LLMs and multi-agent setups.
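To make the Information Bottleneck objective above concrete, the following minimal sketch evaluates $I(X_t; Z_t) - \beta I(Z_t; Y_t)$ on small discrete joint distributions. The helper names `mutual_information` and `gib_loss` and the toy probability tables are illustrative assumptions. The paper works with learned continuous representations, where mutual information generally has to be bounded or estimated rather than computed exactly as it is here.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    joint[i][j] is P(A=i, B=j); entries are non-negative and sum to 1.
    """
    row = [sum(r) for r in joint]        # marginal P(A)
    col = [sum(c) for c in zip(*joint)]  # marginal P(B)
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0.0:                  # 0 * log 0 is taken as 0
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


def gib_loss(joint_xz, joint_zy, beta):
    """Toy objective I(X;Z) - beta * I(Z;Y); hypothetical helper, not the paper's code."""
    return mutual_information(joint_xz) - beta * mutual_information(joint_zy)


# Case 1: Z copies X exactly (no compression) and fully determines Y.
identity = [[0.5, 0.0], [0.0, 0.5]]
# Case 2: Z is independent of X (maximal compression) and says nothing about Y.
independent = [[0.25, 0.25], [0.25, 0.25]]

loss_copy = gib_loss(identity, identity, beta=2.0)
loss_blank = gib_loss(independent, independent, beta=2.0)
```

In this toy setting, `beta=2.0` makes the copying representation score lower (better) than the uninformative one, while a value below 1 reverses the ordering. That is the sense in which $\beta$ trades compression against retained task-relevant information.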

Abstract

The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex and multi-turn dialogues. However, multi-agent collaboration faces critical safety challenges, such as hallucination amplification and error injection and propagation. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. The unsupervised encoder-decoder architecture incorporating an incremental training paradigm learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with unparalleled precision. Moreover, we introduce a graph abstraction mechanism based on the Information Bottleneck Theory, which compresses temporal interaction graphs while preserving essential patterns. Extensive experiments demonstrate GUARDIAN's effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization. The code is available at https://github.com/JialongZhou666/GUARDIAN
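As an illustration of the representation the abstract describes, a discrete-time temporal attributed graph can be stored as an ordered list of per-round snapshots, each holding node attributes and directed communication edges. The sketch below shows one plausible layout under stated assumptions and is not the authors' implementation: the class names `Snapshot` and `TemporalGraph`, the `without_node` helper, and the 2-dimensional toy attributes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Snapshot:
    """One collaboration round: agent attributes plus who messaged whom."""
    attributes: Dict[int, List[float]]                        # agent id -> attribute vector
    edges: Set[Tuple[int, int]] = field(default_factory=set)  # (sender, receiver)

    def without_node(self, node: int) -> "Snapshot":
        """Return a copy with `node` and all of its incident edges removed."""
        attrs = {v: x for v, x in self.attributes.items() if v != node}
        edges = {(u, v) for u, v in self.edges if node not in (u, v)}
        return Snapshot(attrs, edges)


@dataclass
class TemporalGraph:
    """Discrete-time temporal attributed graph: an ordered list of snapshots."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def add_round(self, snapshot: Snapshot) -> None:
        self.snapshots.append(snapshot)


# Three agents, fully connected in round 0 (toy attribute vectors).
round0 = Snapshot(
    attributes={0: [0.1, 0.9], 1: [0.2, 0.8], 2: [0.9, 0.1]},
    edges={(u, v) for u in range(3) for v in range(3) if u != v},
)
graph = TemporalGraph()
graph.add_round(round0)
# Suppose agent 2 was flagged as anomalous: exclude it from the next round.
graph.add_round(round0.without_node(2))
```

Keeping each round as an immutable-style snapshot makes it straightforward to drop a flagged agent from later rounds while leaving the earlier history intact for temporal modeling.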

Paper Structure

This paper contains 29 sections, 2 theorems, 9 equations, 16 figures, 14 tables.

Key Result

Theorem 4.2

Under the GIB mechanism, the LLM multi-agent system satisfies the following bounds: (1) Information Bottleneck: for any pair of collaborating agents, the information flow between them is bounded in terms of $\eta$, the controllable compression rate. (2) Temporal Information Bottleneck: for the temporal evolution of agent representations, the mutual information between historical representations $\mathbf{Z}_{1:t-1}$ ...

Figures (16)

  • Figure 1: Critical safety problems in LLM multi-agent collaboration: (1) hallucination amplification, where hallucinated information about a "computer science" major propagates across all agents; (2) agent-targeted error injection and propagation, where malicious agents inject false information (e.g., changing 2016 to 2015) that persists through subsequent rounds; and (3) communication-targeted error injection and propagation, where malicious agents intercept and corrupt information during inter-agent transmissions, disrupting collaboration.
  • Figure 2: (a) Examples of safety issues on multi-agent collaboration under A2A protocol: attacks on agents or communications in earlier rounds affect the responses of agents in subsequent rounds. (b) Agent-targeted and communication-targeted error injection and propagation visualization, highlighting high anomaly degrees. Dashed lines indicate communication edges. The visualization verifies the effectiveness of temporal attributed graph representation in capturing error dynamics.
  • Figure 3: Framework overview of GUARDIAN, showing a case study at timestep $t_2$. (1) Graph Preprocessing: the collaboration information from $t_0$ to $t_2$ is transformed into node attributes $\bm{x}_{t,i}$ and graph structures $\mathcal{E}$ using BERT and communication pattern abstraction. (2) The Attributed Graph Encoder processes the graph at each timestep to obtain node embeddings $\{\mathbf{Z}_t\}_{t=1}^T$. (3) The Time Information Encoder aggregates the multi-timestamp graph embeddings into the final-timestamp embedding $\mathbf{Z}_T$. (4) The Structure and Attribute Reconstruction Decoder outputs the reconstructed graph $\hat{\mathcal{E}}_T$ and node attributes $\hat{\mathbf{X}}_T$. (5) Anomaly scores $s_v$, calculated from the original and reconstructed graphs, identify the highest-scoring anomalous node, which is excluded from subsequent iterations.
  • Figure 3: Accuracy (%) comparison under different connection sparsity under hallucination amplification on MATH dataset with GPT-3.5-turbo. Bold values represent the highest accuracy.
  • Figure 4: A real case: co-existence of hallucination and agent-targeted errors.
  • ...and 11 more figures
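The scoring and exclusion steps described in the framework overview caption can be sketched as follows. The caption states only that the anomaly scores $s_v$ are calculated from the original and reconstructed graphs. The specific combination used here, a weighted sum of attribute and adjacency-row reconstruction errors with a weight `alpha`, is an assumption borrowed from common graph-autoencoder anomaly detectors, and the function names and toy data are hypothetical.

```python
def anomaly_scores(x, x_hat, adj, adj_hat, alpha=0.5):
    """Per-node score: alpha * attribute error + (1 - alpha) * structure error.

    x, x_hat     : original / reconstructed attribute vectors, one list per node
    adj, adj_hat : original / reconstructed adjacency matrices (lists of rows)
    """
    scores = []
    for v in range(len(x)):
        # Euclidean distance between original and reconstructed attributes.
        attr_err = sum((a - b) ** 2 for a, b in zip(x[v], x_hat[v])) ** 0.5
        # Euclidean distance between original and reconstructed adjacency rows.
        struct_err = sum((a - b) ** 2 for a, b in zip(adj[v], adj_hat[v])) ** 0.5
        scores.append(alpha * attr_err + (1 - alpha) * struct_err)
    return scores


def most_anomalous(scores):
    """Index of the node with the highest anomaly score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy round: agents 0 and 1 agree, agent 2 deviates. The (assumed) decoder
# reconstructs the consensus attributes for every node and the true structure.
x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
x_hat = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
adj_hat = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

scores = anomaly_scores(x, x_hat, adj, adj_hat)
flagged = most_anomalous(scores)
# Exclude the flagged agent from the next collaboration round.
remaining = [v for v in range(len(x)) if v != flagged]
```

The intuition is that a decoder trained mostly on normal interactions reconstructs them well, so a node whose attributes or edges are poorly reconstructed stands out with a large score.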

Theorems & Definitions (6)

  • Definition 4.1: Temporal Graph Abstraction via Information Bottleneck
  • Remark 1
  • Theorem 4.2: LLM Collaboration Information Bounds (proof in the appendix)
  • Remark 2
  • Proof
  • Lemma A.1: Mutual Information Upper Bound in VIB