WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference

Zixun Xiong, Gaoyi Wu, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang

Abstract

Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world threat of such attacks. To bridge this realism gap, we propose WebWeaver, an attack framework that infers the complete LLM-MAS topology by compromising only a single arbitrary agent instead of the administrative agent. Unlike prior approaches, WebWeaver relies solely on agent contexts rather than agent IDs, enabling significantly stealthier inference. WebWeaver further introduces a new covert jailbreak-based mechanism and a novel fully jailbreak-free diffusion design to handle cases where jailbreaks fail. Additionally, we address a key challenge in diffusion-based inference by proposing a masking strategy that preserves known topology during diffusion, with theoretical guarantees of correctness. Extensive experiments show that WebWeaver substantially outperforms state-of-the-art (SOTA) baselines, achieving about 60% higher inference accuracy under active defenses with negligible overhead.

Paper Structure (48 sections, 8 equations, 5 figures, 6 tables, 2 algorithms)

Figures (5)

  • Figure 1: Studies [yu2024netsafe, yang2025topological] show that communication topology in LLM-MAS significantly affects security and utility performance, with no universally optimal topology, making it a critical form of intellectual property.
  • Figure 2: The WebWeaver pipeline. Step 1: Inter-agent dialogues are collected offline under known topologies and reused for both training and inference. Step 2: The collected dialogues are used to train a sender predictor that infers the sender identity from a received dialogue. Step 3: The attacker compromises a selected agent $C$ and retrieves all dialogues received by this agent during interaction. Step 4: The trained sender predictor is applied to the retrieved dialogues to infer the set of agents directly connected to the selected agent. Step 5: If the initial or optimized jailbreak succeeds (denoted as Y), the attacker uses it to induce connected agents to request additional context, then iteratively repeats dialogue retrieval and sender prediction to expand the inferred graph until no new agents are uncovered. If the optimized jailbreak still fails (denoted as N), the attacker performs topology completion using a masked diffusion model (i.e., DDPM) trained on the collected dialogues and conditioned on the partially known topology to infer the complete interaction structure.
  • Figure 3: Robustness of different attacks against a keyword-based defense that rejects any request containing information intended to reveal adjacent agent IDs. "Ours (w/o)" is the jailbreak-free module of WebWeaver, and "Ours" is the jailbreak-based WebWeaver.
  • Figure 4: Scalability of the jailbreak-based module with different numbers of agents.
  • Figure 5: Inferred topology examples. All examples are selected randomly from five types of topologies. The red node stands for the compromised agent, and the gray nodes stand for the benign agents.
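
The iterative expansion described in Steps 3 to 5 of the Figure 2 caption can be read as a breadth-first traversal of the agent graph. The following is a minimal sketch under stated assumptions, not the authors' implementation: `expand_topology`, `retrieve_dialogues`, and `predict_sender` are hypothetical names. `retrieve_dialogues` stands in for both direct retrieval from the compromised agent and the jailbreak-induced context requests. `predict_sender` stands in for the trained sender predictor. Treating edges as directed (sender, receiver) pairs is also an assumption.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

Agent = str
Dialogue = str
Edge = Tuple[Agent, Agent]  # (sender, receiver); directedness is an assumption


def expand_topology(
    start: Agent,
    retrieve_dialogues: Callable[[Agent], Iterable[Dialogue]],
    predict_sender: Callable[[Dialogue], Agent],
) -> Set[Edge]:
    """Grow the inferred graph outward from the compromised agent.

    Both callables are hypothetical placeholders for the dialogue
    retrieval step and the trained sender predictor.
    """
    edges: Set[Edge] = set()
    visited: Set[Agent] = {start}
    frontier = deque([start])
    while frontier:  # stops once no new agents are uncovered
        receiver = frontier.popleft()
        for dialogue in retrieve_dialogues(receiver):
            sender = predict_sender(dialogue)
            if sender == receiver:
                continue  # ignore self-attributed dialogues
            edges.add((sender, receiver))
            if sender not in visited:
                visited.add(sender)
                frontier.append(sender)
    return edges


# Toy stand-ins: each dialogue carries its sender in a "from X:" prefix,
# so the "predictor" here is a trivial parser rather than a learned model.
toy_inbox: Dict[Agent, List[Dialogue]] = {
    "C": ["from A: plan draft", "from B: review notes"],
    "A": ["from C: feedback"],
    "B": ["from D: raw data"],
    "D": [],
}

edges = expand_topology(
    start="C",
    retrieve_dialogues=lambda agent: toy_inbox.get(agent, []),
    predict_sender=lambda dialogue: dialogue.split(":")[0].split()[1],
)
print(sorted(edges))
```

In this toy run the traversal starts at `C`, discovers `A` and `B` from the dialogues `C` received, and then reaches `D` through `B`. The sketch omits the jailbreak-free branch, where a masked diffusion model would complete the topology instead.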