ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems

Jinnuo Liu, Chuke Liu, Hua Shen

TL;DR

ValueFlow presents a perturbation-based framework to quantify how value perturbations propagate in multi-agent LLM systems. It combines a 56-value SVS-derived measurement, a DAG-based interaction model, and two levels of metrics, $\beta$-susceptibility and $SS$, to disentangle agent-level responsiveness from system-wide propagation. Across diverse backbones, prompts, values, and topologies, the study reveals substantial value- and topology-dependent variability in susceptibility and diffusion, with topology shaping both reach and attenuation of perturbations. The findings motivate value- and topology-aware defense strategies and system designs to enhance safety and robustness in collaborative AI systems.

Abstract

Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another's outputs. While value alignment is typically evaluated for isolated models, how value perturbations propagate through agent interactions remains poorly understood. We present ValueFlow, a perturbation-based evaluation framework for measuring and analyzing value drift in multi-agent systems. ValueFlow introduces a 56-value evaluation dataset derived from the Schwartz Value Survey and quantifies agents' value orientations during interaction using an LLM-as-a-judge protocol. Building on this measurement layer, ValueFlow decomposes value drift into agent-level response behavior and system-level structural effects, operationalized by two metrics: $\beta$-susceptibility, which measures an agent's sensitivity to perturbed peer signals, and system susceptibility (SS), which captures how node-level perturbations affect final system outputs. Experiments across multiple model backbones, prompt personas, value dimensions, and network structures show that susceptibility varies widely across values and is strongly shaped by structural topology.
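
To make the split between agent-level and system-level effects concrete, the following is a minimal toy sketch, not the paper's definition of either metric. It assumes a hypothetical linear response rule in which each agent blends its own baseline value score with the mean score of its upstream peers using a single weight `beta`, and it reports the shift in a final (sink) agent's score per unit of perturbation injected at a source agent. The function names, the blending rule, and the scalar scores are all illustrative assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate(parents, baseline, beta, perturb=None):
    """Compute a value score for every agent in a DAG (toy linear model).

    parents:  dict mapping each agent to the list of agents it observes.
    baseline: dict mapping each agent to its unperturbed value score.
    beta:     assumed weight in [0, 1] that an agent gives to peer signals.
    perturb:  optional dict mapping agents to an additive score shift.
    """
    perturb = perturb or {}
    scores = {}
    # static_order() yields every agent after all agents it observes.
    for node in TopologicalSorter(parents).static_order():
        score = baseline[node]
        observed = parents.get(node, [])
        if observed:
            peer_mean = sum(scores[p] for p in observed) / len(observed)
            # Hypothetical response rule: blend own baseline with peer mean.
            score = (1 - beta) * score + beta * peer_mean
        scores[node] = score + perturb.get(node, 0.0)
    return scores


def toy_system_shift(parents, baseline, beta, source, sink, delta=1.0):
    """Shift in the sink's score per unit of perturbation at the source."""
    base = propagate(parents, baseline, beta)
    shifted = propagate(parents, baseline, beta, {source: delta})
    return (shifted[sink] - base[sink]) / delta


# Two illustrative topologies with identical agents (beta = 0.5).
direct = {"a": [], "c": ["a"]}               # a -> c
chain = {"a": [], "b": ["a"], "c": ["b"]}    # a -> b -> c
zeros = {"a": 0.0, "b": 0.0, "c": 0.0}

direct_shift = toy_system_shift(direct, zeros, 0.5, source="a", sink="c")
chain_shift = toy_system_shift(chain, zeros, 0.5, source="a", sink="c")
print(direct_shift, chain_shift)
```

In this toy model the same per-agent weight yields a smaller shift at the sink when the perturbation passes through an extra intermediate agent, which illustrates why agent-level responsiveness and topology are worth measuring separately.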

Paper Structure (54 sections, 6 equations, 9 figures, 5 tables, 1 algorithm)

Figures (9)

  • Figure 1: Illustrative examples of value perturbation outcomes in multi-agent systems. For some values, injected perturbations fail to propagate and the system remains stable. For others, perturbations spread through agent interaction and lead to system-level value shift.
  • Figure 2: Overview of the ValueFlow framework. The framework (i) models multi-agent interactions and quantifies agent-level value orientations; (ii) introduces controlled value perturbations; and (iii) measures value propagation using two metrics: agent-level susceptibility ($\beta$) and system-level susceptibility (SS).
  • Figure 3: Value-wise agent-level $\beta$-susceptibility under a fixed agent configuration (Qwen3-8B, neutral openness persona). Values are sorted by their $\beta$ scores. The distribution reveals substantial variation across value dimensions.
  • Figure 4: Agent-level $\beta$-susceptibility across value dimensions under different openness persona prompts (Qwen3-8B). Each bar corresponds to one value, and the three colors show $\beta$ under high-, neutral-, and low-openness personas. While the high-openness persona generally increases susceptibility, the magnitude of this effect varies across values.
  • Figure 5: Distribution of agent-level $\beta$-susceptibility under high and low input context variance (Qwen3-8B, neutral openness persona). Each box summarizes $\beta$ values across all 56 value dimensions.
  • ...and 4 more figures