ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems
Jinnuo Liu, Chuke Liu, Hua Shen
TL;DR
ValueFlow is a perturbation-based framework for quantifying how value perturbations propagate in multi-agent LLM systems. It combines a 56-value measurement set derived from the Schwartz Value Survey (SVS), a DAG-based interaction model, and metrics at two levels, $\beta$-susceptibility at the agent level and system susceptibility (SS) at the system level, to disentangle agent-level responsiveness from system-wide propagation. Across diverse backbones, prompts, values, and topologies, the study finds substantial value- and topology-dependent variability in susceptibility and diffusion, with topology shaping both the reach and the attenuation of perturbations. The findings motivate value- and topology-aware defense strategies and system designs that improve safety and robustness in collaborative AI systems.
Abstract
Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another's outputs. While value alignment is typically evaluated for isolated models, how value perturbations propagate through agent interactions remains poorly understood. We present ValueFlow, a perturbation-based evaluation framework for measuring and analyzing value drift in multi-agent systems. ValueFlow introduces a 56-value evaluation dataset derived from the Schwartz Value Survey and quantifies agents' value orientations during interaction using an LLM-as-a-judge protocol. Building on this measurement layer, ValueFlow decomposes value drift into agent-level response behavior and system-level structural effects, operationalized by two metrics: $\beta$-susceptibility, which measures an agent's sensitivity to perturbed peer signals, and system susceptibility (SS), which captures how node-level perturbations affect final system outputs. Experiments across multiple model backbones, prompt personas, value dimensions, and network structures show that susceptibility varies widely across values and is strongly shaped by network topology.
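To make the agent-level versus system-level distinction concrete, the following is a minimal toy sketch, not the paper's definition of either metric. It assumes each agent's value score is a linear mix of its own baseline and the mean score of its upstream peers, with a hypothetical per-agent weight `beta` standing in for agent-level sensitivity; the functions `propagate` and `system_shift` and the linear mixing rule are illustrative assumptions only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate(parents, baseline, beta, perturb=None):
    """Score every agent in a DAG under an assumed linear mixing rule.

    parents:  dict mapping each agent to the list of agents it observes
    baseline: dict mapping each agent to its unperturbed value score
    beta:     dict mapping each agent to an assumed sensitivity in [0, 1]
    perturb:  optional dict of additive shifts injected at specific agents
    """
    perturb = perturb or {}
    scores = {}
    # static_order() yields each agent only after all agents it observes.
    for node in TopologicalSorter(parents).static_order():
        preds = parents.get(node, [])
        own = baseline[node]
        if preds:
            peer_mean = sum(scores[p] for p in preds) / len(preds)
            own = (1 - beta[node]) * own + beta[node] * peer_mean
        scores[node] = own + perturb.get(node, 0.0)
    return scores


def system_shift(parents, baseline, beta, source, sink, delta=1.0):
    """Change in the sink agent's score per unit perturbation at the source."""
    clean = propagate(parents, baseline, beta)
    shifted = propagate(parents, baseline, beta, {source: delta})
    return (shifted[sink] - clean[sink]) / delta


# Two hypothetical topologies built from agents with identical sensitivity.
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}
beta = {"a": 0.5, "b": 0.5, "c": 0.5}
chain = {"a": [], "b": ["a"], "c": ["b"]}   # a -> b -> c
direct = {"a": [], "b": [], "c": ["a"]}     # a -> c, b isolated

chain_shift = system_shift(chain, baseline, beta, "a", "c")
direct_shift = system_shift(direct, baseline, beta, "a", "c")
print(chain_shift, direct_shift)
```

In this toy setting the agents are identical, yet the perturbation reaching the final agent differs between the two graphs because the chain attenuates it once more along the extra hop. That is the kind of structural effect a system-level metric is meant to separate from agent-level sensitivity.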
