
Don't Trust Stubborn Neighbors: A Security Framework for Agentic Networks

Samira Abedini, Sina Mavali, Lea Schönherr, Martin Pawelczyk, Rebekka Burkholz

Abstract

Large Language Model (LLM)-based Multi-Agent Systems (MASs) are increasingly deployed for agentic tasks such as web automation, itinerary planning, and collaborative problem solving. Yet their interactive nature introduces new security risks: malicious or compromised agents can exploit communication channels to propagate misinformation and manipulate collective outcomes. In this paper, we study how such manipulation can arise and spread. To do so, we borrow the Friedkin-Johnsen opinion formation model from the social sciences as a general theoretical framework for LLM-MAS. Remarkably, this model closely captures LLM-MAS behavior, as we verify in extensive experiments across different network topologies and attack and defense scenarios. Theoretically and empirically, we find that a single highly stubborn and persuasive agent can take over MAS dynamics by triggering a persuasion cascade that reshapes collective opinion, underscoring the systems' high susceptibility to attacks. Our theoretical analysis reveals three mechanisms to increase system security: a) increasing the number of benign agents, b) increasing the innate stubbornness, or peer resistance, of agents, or c) reducing trust in potential adversaries. Scaling is computationally expensive, and high stubbornness degrades the network's ability to reach consensus. We therefore propose a trust-adaptive defense that dynamically adjusts inter-agent trust to limit adversarial influence while maintaining cooperative performance. Extensive experiments confirm that this mechanism effectively defends against manipulation.
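As a rough illustration of the modeling idea, the sketch below iterates a Friedkin-Johnsen update in Python. It assumes the common textbook form $b(t+1) = \Gamma b(0) + (I - \Gamma) W b(t)$. Here $W$ is a row-stochastic trust matrix and $\Gamma$ is a diagonal matrix of innate stubbornness values in $[0, 1]$. The paper's exact parameterization (for instance its matrix $M$) may differ, and the function name `fj_simulate` and all numbers are hypothetical. In this toy setting, a fully stubborn attacker pulls agreeable agents all the way to its belief. Giving the benign agents some stubbornness of their own caps that drift.

```python
import numpy as np

def fj_simulate(W, gamma, b0, rounds=200):
    """Iterate the assumed FJ update b(t+1) = Γ b(0) + (I - Γ) W b(t)."""
    W = np.asarray(W, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    b0 = np.asarray(b0, dtype=float)
    # Each row of W holds one agent's trust weights and must sum to 1.
    assert np.allclose(W.sum(axis=1), 1.0), "W must be row-stochastic"
    b = b0.copy()
    for _ in range(rounds):
        # Elementwise multiplication by gamma plays the role of the diagonal matrix Γ.
        b = gamma * b0 + (1.0 - gamma) * (W @ b)
    return b

n = 5
W = np.full((n, n), 1.0 / n)          # hypothetical: fully connected, uniform trust (self included)
b0 = np.array([1.0, 0, 0, 0, 0])      # agent 0 is the attacker holding belief 1.0; benign agents start at 0

# Fully stubborn attacker, fully agreeable benign agents.
gamma_attacker_only = np.array([1.0, 0, 0, 0, 0])
print(fj_simulate(W, gamma_attacker_only, b0))    # benign beliefs approach 1.0

# Same attacker, but benign agents now keep half their weight on their own initial belief.
gamma_stubborn_benign = np.array([1.0, 0.5, 0.5, 0.5, 0.5])
print(fj_simulate(W, gamma_stubborn_benign, b0))  # benign beliefs settle at 1/6
```

Under the assumed uniform trust, the benign agents approach 1.0 in the first run. In the second run they settle at 1/6, because each benign belief then solves $x = 0.5 \cdot 0 + 0.5 \cdot (1 + 4x)/5$.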

Paper Structure (46 sections, 20 theorems, 70 equations, 10 figures, 6 tables, 1 algorithm)

Key Result

Proposition 4.1

Let $\Gamma = 0$, and let $M$ be defined as in Equation eq:matrix_dyn. Furthermore, let $M$ be irreducible and aperiodic. Then there exists a unique stationary distribution, and the dynamics governed by $M$ converge to a consensus, so that $b_i(\infty) = b_j(\infty)$ for all $i, j \in V$. If $M$ is additionally doubly stochastic, the …
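Proposition 4.1 mirrors the classical DeGroot ($\Gamma = 0$) consensus result for Markov-chain-like dynamics. The sketch below assumes the update $b(t+1) = M b(t)$ with a row-stochastic $M$, which may not match Equation eq:matrix_dyn exactly. It relies on a standard fact. For an irreducible, aperiodic row-stochastic $M$, the powers $M^t$ converge to $\mathbf{1}\pi^\top$, where $\pi$ is the unique stationary distribution. The consensus value is therefore $\pi^\top b(0)$. When $M$ is doubly stochastic, $\pi$ is uniform, so the consensus is the plain average of the initial beliefs. The matrices and the helper names `degroot` and `stationary_distribution` are illustrative.

```python
import numpy as np

def stationary_distribution(M):
    """Left eigenvector of M for eigenvalue 1, normalized so its entries sum to 1."""
    eigvals, eigvecs = np.linalg.eig(M.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

def degroot(M, b0, rounds=500):
    """Iterate b(t+1) = M b(t), i.e. FJ dynamics with Γ = 0."""
    b = np.asarray(b0, dtype=float)
    for _ in range(rounds):
        b = M @ b
    return b

# Hypothetical row-stochastic trust matrix with all entries positive (hence irreducible and aperiodic).
M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
b0 = np.array([1.0, 0.0, 0.5])
b_inf = degroot(M, b0)
pi = stationary_distribution(M)
print(b_inf, pi @ b0)              # every entry of b_inf matches pi @ b0

# Doubly stochastic (rows and columns sum to 1), so pi is uniform.
D = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
print(degroot(D, b0), b0.mean())   # consensus equals the average initial belief
```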

Figures (10)

  • Figure 1: Left: We leverage the Friedkin-Johnsen (FJ) opinion dynamics framework to model LLM multi-agent belief propagation. Middle: Using FJ, we analyze how vulnerable the final opinion in LLM multi-agent systems is to being hijacked by a single adversary. Right: Using our theoretical insights, we design a trust-adaptive defense mechanism.
  • Figure 2: Network topologies and different attacker accessibility. Red nodes denote attackers.
  • Figure 3: Hijacked Consensus Regions for Different Network Topologies. We visualize the conditions under which the attacker hijacks consensus formation and those under which consensus is safe from the attacker. The conditions for finite $N$ are implied by Corollaries coro:cond_takeover_leaf to coro:cond_takeover_hub. The size of the hijacked region depends on the network topology, $w_a$, $\psi$, and $N$. For example, increasing the number of benign agents $N$ increases robustness across all network topologies because it extends the safe region (see also Corollaries coro:cond_takeover_leaf_limit to coro:cond_takeover_hub_limit).
  • Figure 4: Empirical belief updates by LLM agents align with predictions from the theoretical FJ model in both descriptive and predictive settings. Examples show belief trajectories over 10 rounds of deliberation for Gemini-3-Flash and ToolBench. Left: Descriptive fit to the beliefs from all 10 rounds for Question 90 under the star topology, with a hub attacker (theory: Equation eq:dyn_star1) and benign agents in the leaves (theory: Equation eq:dyn_star2). Right: Fixed and incremental predictions of beliefs in rounds 8-10 for Question 64 under the fully-connected topology (theory: Equation eq:dyn_fc). In both examples, benign agents shift toward the attacker's false belief in option A, and the theoretical model accurately captures the observed dynamics and predicts later-round beliefs.
  • Figure 5: Attack success rates (ASR) for different network topologies. We show the ASR for each LLM family across fully connected and star network topologies, averaged over all traits. For the star network, we consider a hub attacker and a leaf attacker. Hub attackers in the star network are the most effective, while leaf attackers are the least effective, as predicted by our theoretical results.
  • ...and 5 more figures
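Figures 2, 3, and 5 compare how much the attacker's position in the network matters. The sketch below assumes the textbook FJ update $b(t+1) = \Gamma b(0) + (I - \Gamma) W b(t)$ and solves its equilibrium in closed form, $b^* = (I - (I - \Gamma) W)^{-1} \Gamma b(0)$. It does so for three settings: a star network with the attacker at the hub, a fully connected network, and a star network with the attacker at a leaf. The following are all hypothetical choices: uniform trust over neighbors plus a self-loop, a benign stubbornness of 0.3, five agents, and the helper names. The paper's parameters $w_a$ and $\psi$ are not modeled here.

```python
import numpy as np

def fj_equilibrium(W, gamma, b0):
    """Closed-form fixed point of b = Γ b0 + (I - Γ) W b.

    Requires I - (I - Γ) W to be invertible, e.g. when every agent has gamma > 0.
    """
    n = len(b0)
    G = np.diag(gamma)
    return np.linalg.solve(np.eye(n) - (np.eye(n) - G) @ W, G @ b0)

def row_normalize(A):
    """Uniform trust over each agent's neighbors (self-loops included)."""
    return A / A.sum(axis=1, keepdims=True)

def star(n, hub):
    """Star network: the hub links to everyone, and each leaf links only to the hub."""
    A = np.eye(n)
    A[hub, :] = 1.0
    A[:, hub] = 1.0
    return row_normalize(A)

def fully_connected(n):
    return row_normalize(np.ones((n, n)))

n, attacker = 5, 0
gamma = np.full(n, 0.3)         # hypothetical benign stubbornness
gamma[attacker] = 1.0           # fully stubborn attacker
b0 = np.zeros(n)
b0[attacker] = 1.0              # attacker pushes belief 1.0; benign agents start at 0.0
benign = np.arange(n) != attacker

settings = {
    "star, hub attacker": star(n, hub=attacker),
    "fully connected": fully_connected(n),
    "star, leaf attacker": star(n, hub=1),   # agent 1 is the hub, so attacker 0 is a leaf
}
for name, W in settings.items():
    b_star = fj_equilibrium(W, gamma, b0)
    print(f"{name:22s} mean benign belief = {b_star[benign].mean():.3f}")
```

With these assumed values, the mean benign equilibrium belief is about 0.538 with a hub attacker, 0.318 in the fully connected network, and 0.144 with a leaf attacker. In this toy setting, the hub attacker therefore comes out strongest and the leaf attacker weakest, in line with the Figure 5 caption. The closed form is used instead of iteration because it is exact whenever $I - (I - \Gamma) W$ is invertible. That condition holds here because every agent has positive stubbornness.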

Theorems & Definitions (31)

  • Proposition 4.1: General Case (norris1998markov)
  • Remark 4.2
  • Proposition 4.3: Exponential Convergence to Consensus for Star Topology
  • Proposition 4.4: Exponential Convergence to Consensus for Fully-connected Networks
  • Proposition 4.5: Agreeable Agents Get Dominated by Stubborn Agents (friedkin1990social)
  • Corollary 4.6: Single Stubborn Agent Steers Consensus
  • Proposition 4.7: Equilibrium Outcomes for Star Network with Stubborn Hub
  • Proposition 4.8: Equilibrium Outcomes for Fully-Connected Network with Stubborn Node
  • Proposition 4.9: Equilibrium Outcomes for Star Network with Stubborn Leaf
  • Proposition 4.10: Consensus Formation
  • ...and 21 more
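Propositions 4.3 and 4.4 concern exponential convergence, and Corollary 4.6 states that a single stubborn agent steers consensus. The sketch below is a hedged numeric check of that behavior in the simplest setting. It assumes a fully connected network with uniform trust (self included), a fully stubborn attacker, and fully agreeable benign agents ($\gamma = 0$). Under these assumptions, each benign belief obeys $1 - b(t+1) = \frac{N}{N+1}(1 - b(t))$, where $N$ is the number of benign agents. The gap to the attacker's belief therefore shrinks geometrically with rate $N/(N+1)$. This rate follows from the toy setup and is not quoted from the paper; the helper name `fully_connected_with_attacker` is hypothetical.

```python
import numpy as np

def fully_connected_with_attacker(n_benign):
    """Uniform trust over all n_benign + 1 agents (self included); agent 0 is the attacker."""
    n = n_benign + 1
    return np.full((n, n), 1.0 / n)

n_benign = 4
W = fully_connected_with_attacker(n_benign)
gamma = np.zeros(n_benign + 1)
gamma[0] = 1.0                  # fully stubborn attacker; benign agents are fully agreeable
b0 = np.zeros(n_benign + 1)
b0[0] = 1.0                     # attacker's (false) belief; benign agents start at 0

b = b0.copy()
gaps = []
for t in range(30):
    # Largest distance of any agent's belief from the attacker's belief.
    gaps.append(np.max(np.abs(b - b0[0])))
    b = gamma * b0 + (1.0 - gamma) * (W @ b)

rate = n_benign / (n_benign + 1)
predicted = [rate ** t for t in range(30)]
print(np.allclose(gaps, predicted))   # True: the gap shrinks geometrically
```

Under this assumption, larger networks converge more slowly because the rate $N/(N+1)$ approaches 1, yet the limit is still the attacker's belief.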