
Investigating social alignment via mirroring in a system of interacting language models

Harvey McGuinness, Tianyu Wang, Carey E. Priebe, Hayden Helm

TL;DR

This work addresses how social mirroring influences alignment in systems of interacting language models. It introduces a scalable computational framework where $n$ LLM agents each possess an external knowledge base and interact over a $k$-neighbor network, with mirroring occurring with probability $p$ and alignment measured by embedding distances $D^{(t)}$ between agent outputs; silos are tracked using $S^{(t)}$ and $E^{(t)}$ across $T$ steps. The main contributions are the identification and characterization of three silo patterns (Stable, Unstable, Decaying) and a systematic analysis of how $p$ and $k$ modulate global consensus and fragmentation, linking observed dynamics to human social phenomena like polarization and echo chambers. This framework provides a tractable, interpretable platform for studying macroscopic alignment dynamics in AI-augmented social systems and informs how interaction structure and mirroring rate can shape collective behavior.
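To make the moving parts concrete, here is a toy sketch of the interaction loop. It is not the authors' implementation, and every rule in it is an assumption. Each agent's LLM output is replaced by a fixed-length numeric vector (the real system embeds generated text). An agent's communication range is taken to be its $k$ nearest other agents in that vector space. A mirroring interaction (probability $p$) hands the receiving agent its own opinion back and so leaves it unchanged. A non-mirroring interaction moves the receiver a fraction `alpha` toward the communicator; `alpha = 1.0` means full adoption, as in the Figure 1 example. $D^{(t)}$ is taken to be the full matrix of pairwise Euclidean distances, which may differ from the paper's exact measure. The function names and `alpha` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    # Distance between two opinion vectors (toy stand-ins for LLM output embeddings).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(opinions):
    # D^(t) in this sketch: pairwise distances between all agents at time t.
    n = len(opinions)
    return [[euclidean(opinions[i], opinions[j]) for j in range(n)] for i in range(n)]


def communication_range(D, i, k):
    # Assumption: agent i can hear from its k nearest other agents in embedding space.
    others = sorted((j for j in range(len(D)) if j != i), key=lambda j: D[i][j])
    return others[:k]


def step(opinions, k, p, alpha, rng):
    # One synchronous round (assumed schedule): every agent receives one
    # interaction from a randomly chosen agent in its communication range.
    D = distance_matrix(opinions)
    updated = [list(o) for o in opinions]
    for i in range(len(opinions)):
        j = rng.choice(communication_range(D, i, k))
        if rng.random() < p:
            # Mirroring: j echoes i's own opinion back, so i is unchanged.
            continue
        # Not mirroring: i moves a fraction alpha toward j (hypothetical update rule).
        updated[i] = [x + alpha * (y - x) for x, y in zip(opinions[i], opinions[j])]
    return updated


def simulate(n=7, k=3, p=0.2, T=80, dim=2, alpha=1.0, seed=0):
    # Returns the final opinions and the history D^(0), ..., D^(T).
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    rng = random.Random(seed)
    opinions = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    history = [distance_matrix(opinions)]
    for _ in range(T):
        opinions = step(opinions, k, p, alpha, rng)
        history.append(distance_matrix(opinions))
    return opinions, history
```

For example, `simulate(n=30, k=5, p=0.2, T=80, seed=1)` runs a 30-agent toy system for 80 rounds. Setting `p = 1.0` makes every interaction a mirroring one, so no opinion ever changes in this toy. A small `k` restricts which agents each agent can ever hear from.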

Abstract

Alignment is a social phenomenon wherein individuals share a common goal or perspective. Mirroring, or mimicking the behaviors and opinions of another individual, is one mechanism by which individuals can become aligned. Large-scale investigations of the effect of mirroring on alignment have been limited by the poor scalability of traditional experimental designs in sociology. In this paper, we introduce a simple computational framework that enables studying the effect of mirroring behavior on alignment in multi-agent systems. We simulate systems of interacting large language models in this framework and characterize overall system behavior and alignment with quantitative measures of agent dynamics. We find that system behavior is strongly influenced by the range of communication of each agent and that these effects are exacerbated by increased rates of mirroring. We discuss the observed simulated system behavior in the context of known human social dynamics.

Paper Structure

This paper contains 6 sections, 2 equations, and 5 figures.

Figures (5)

  • Figure 1: Illustration of simulated system dynamics with $n = 7$ agents and a communication range of $k = 3$. Dashed arrows connect the receiving agent with the agents in its communication range. Solid arrows connect the receiving agent with the agent that interacted with it. Arrow color indicates the type of interaction: if the arrow has the receiving agent's color, the communicating agent is mirroring; otherwise it is not. Mirroring occurs with probability $p$ for each interaction. The opinion of an agent and the set of agents in its communication range may change at time $t + 1$ based on its interaction at time $t$. For example, the blue arrow from the light-blue male to the orange female at time $t$ affects the orange female's opinion at $t + 1$: she becomes light blue, and the agents in her communication range change. See the section on system mechanics for further details.
  • Figure 3: Example systems with $n = 30$ agents under different agent behaviors. Graph color (red/blue/green/orange) indicates the pattern of stable/decaying/unstable/one silo(s) at $T = 80$. A small range of communication for each agent ($k$) appears to prohibit global alignment, and a large likelihood of mirroring ($p$) delays it. We investigate these relationships further in Figures 4 and 5.
  • Figure 4: Number of silos vs. $k$ (the range of communication) for several values of $p$ (the likelihood of an agent interacting with a mirroring agent). For each setting of agent behavior, we show the number of silos observed at $T = 80$ for $8$ random agent initializations. Dot color indicates the system type at $T = 80$. The black line is the average number of silos, and the shaded area represents $\pm 3$ standard errors. Up to a point, increasing $k$ encourages global alignment; when $k$ is large, however, the system is more likely to contain multiple silos at $T = 80$.
  • Figure 5: Number of silos vs. $p$ (the likelihood of an agent interacting with a mirroring agent) for several values of $k$ (the range of communication). For each setting of agent behavior, we show the number of silos observed at $T = 80$ for $8$ random agent initializations. Dot color indicates the system type at $T = 80$. The black line is the average number of silos, and the shaded area represents $\pm 3$ standard errors. Increasing $p$ decreases the likelihood of global consensus for every $k$.
  • Figure 6: Example system with $p = 0.2$, $k = 29$, and $T = 160$. The system has unstable silos at $T = 80$ and a single silo for $T > 105$, indicating that longitudinal analysis may provide additional insight into system behavior.
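Figures 4 and 5 report the number of silos at $T = 80$ for each $(k, p)$ setting across 8 random initializations. They summarize these counts as a mean with a $\pm 3$ standard-error band. The sketch below reproduces that summary, where the standard error is the sample standard deviation divided by the square root of the number of runs. It also shows one hypothetical way to count silos: treat them as connected components of the undirected $k$-nearest-neighbor communication graph built from a distance matrix $D^{(t)}$. That silo definition and the names `count_silos` and `summarize` are assumptions for illustration. The paper tracks silos with its own quantities $S^{(t)}$ and $E^{(t)}$.

```python
import math
from statistics import mean, stdev


def count_silos(D, k):
    # Hypothetical silo count: connected components of the undirected graph
    # linking each agent to its k nearest other agents under distance matrix D.
    n = len(D)
    adj = [set() for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)
    seen, silos = set(), 0
    for start in range(n):
        if start in seen:
            continue
        # Each unvisited agent starts a new component; explore it depth-first.
        silos += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return silos


def summarize(counts, width=3.0):
    # Mean silo count with a +/- width standard-error band (needs at least 2 runs).
    if len(counts) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    m = mean(counts)
    se = stdev(counts) / math.sqrt(len(counts))
    return m, m - width * se, m + width * se
```

With the silo counts from 8 seeds for one $(k, p)$ setting, `summarize(counts)` returns the mean together with the lower and upper edges of the shaded band.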