Investigating social alignment via mirroring in a system of interacting language models

Harvey McGuinness; Tianyu Wang; Carey E. Priebe; Hayden Helm

TL;DR

This work addresses how social mirroring influences alignment in systems of interacting language models. It introduces a scalable computational framework where $n$ LLM agents each possess an external knowledge base and interact over a $k$-neighbor network, with mirroring occurring with probability $p$ and alignment measured by embedding distances $D^{(t)}$ between agent outputs; silos are tracked using $S^{(t)}$ and $E^{(t)}$ across $T$ steps. The main contributions are the identification and characterization of three silo patterns (Stable, Unstable, Decaying) and a systematic analysis of how $p$ and $k$ modulate global consensus and fragmentation, linking observed dynamics to human social phenomena like polarization and echo chambers. This framework provides a tractable, interpretable platform for studying macroscopic alignment dynamics in AI-augmented social systems and informs how interaction structure and mirroring rate can shape collective behavior.

Abstract

Alignment is a social phenomenon wherein individuals share a common goal or perspective. Mirroring, or mimicking the behaviors and opinions of another individual, is one mechanism by which individuals can become aligned. Large-scale investigations of the effect of mirroring on alignment have been limited by the poor scalability of traditional experimental designs in sociology. In this paper, we introduce a simple computational framework for studying the effect of mirroring behavior on alignment in multi-agent systems. We simulate systems of interacting large language models in this framework and characterize overall system behavior and alignment with quantitative measures of agent dynamics. We find that system behavior is strongly influenced by the range of communication of each agent and that these effects are exacerbated by increased rates of mirroring. We discuss the observed simulated system behavior in the context of known human social dynamics.
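To make the setup concrete, here is a minimal toy sketch of the kind of simulation the TL;DR describes: $n$ agents, a $k$-neighbor network, mirroring with probability $p$, and a pairwise distance matrix $D^{(t)}$ recorded at each of $T$ steps. It is not the authors' implementation. Several pieces are assumptions made only for illustration: agents are plain random vectors standing in for embeddings of LLM outputs, the network is a ring, mirroring is modeled as copying a randomly chosen neighbor's vector exactly, and all function names are hypothetical. The silo measures $S^{(t)}$ and $E^{(t)}$ are not defined in this summary, so they are omitted.

```python
import math
import random


def ring_neighbors(n, k):
    """Link each agent to the next k agents on a ring (assumed topology; needs k < n)."""
    return [[(i + j) % n for j in range(1, k + 1)] for i in range(n)]


def step(states, neighbors, p, rng):
    """One synchronous update: with probability p an agent copies a random neighbor."""
    new_states = []
    for i, own in enumerate(states):
        if rng.random() < p:
            j = rng.choice(neighbors[i])
            new_states.append(list(states[j]))  # mirroring modeled as an exact copy
        else:
            new_states.append(list(own))  # otherwise the agent keeps its own state
    return new_states


def distance_matrix(states):
    """Pairwise Euclidean distances between agent vectors, a stand-in for D^(t)."""
    return [[math.dist(a, b) for b in states] for a in states]


def mean_distance(D):
    """Average off-diagonal distance, a crude scalar summary of global alignment."""
    n = len(D)
    total = sum(D[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def simulate(n=8, k=2, p=0.5, T=20, dim=4, seed=0):
    """Run T steps and return the list of distance matrices [D^(0), ..., D^(T)]."""
    rng = random.Random(seed)
    # Random Gaussian vectors stand in for embedded agent outputs (an assumption).
    states = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    neighbors = ring_neighbors(n, k)
    history = [distance_matrix(states)]
    for _ in range(T):
        states = step(states, neighbors, p, rng)
        history.append(distance_matrix(states))
    return history


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        history = simulate(p=p)
        print(p, round(mean_distance(history[-1]), 3))
```

In this toy version, varying `p` and `k` and tracking `mean_distance` over time gives a rough feel for how mirroring rate and communication range might affect consensus; it says nothing about the behavior of the actual LLM systems studied in the paper.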
