Tracking the perspectives of interacting language models

Hayden Helm; Brandon Duderstadt; Youngser Park; Carey E. Priebe

TL;DR

This work models systems of interacting LLMs as a directed graph $G=(V,E)$ with vertices $V=\mathcal{F}\cup\mathcal{D}$ and time-varying edges $E^{(t)}$ to study information diffusion. It introduces a perspective space built from a surrogate data kernel and CMDS to quantify model-wise differences in responses to a fixed prompt set $\mathbf{X}$, enabling comparative analyses across heterogeneous models. Three case studies illustrate how different communication structures drive phenomena such as global and local sinks, adversarial influence and diffusion, and cross-class polarization, with metrics including iso-mirror, ARI, and polarization. The approach provides a quantitative framework for analyzing AI ecosystems and their analogs in human-model forums, offering insights into interventions and system health while acknowledging simplifications and avenues for broader sociotechnical validation.
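As a rough illustration of how a perspective space could be computed, the sketch below applies classical multidimensional scaling (CMDS) to pairwise distances between models. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each model is summarized by a matrix of embedded responses to the fixed prompt set $\mathbf{X}$ (one row per prompt), and that models are compared with the Frobenius norm of the difference between those matrices. The function names, array shapes, and synthetic data are all hypothetical.

```python
import numpy as np


def pairwise_frobenius(reps):
    """Pairwise Frobenius distances between per-model response matrices.

    reps has shape (n_models, n_prompts, dim); reps[i] is assumed to hold
    model i's embedded responses to the shared prompt set.
    """
    n_models = reps.shape[0]
    flat = reps.reshape(n_models, -1)
    diff = flat[:, None, :] - flat[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def classical_mds(dist, n_components=2):
    """Classical MDS: double-center squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # eigh returns ascending eigenvalues; keep the largest n_components.
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Synthetic stand-in data (hypothetical): 6 models, 10 prompts, 8-d embeddings.
# Models 3..5 are shifted so that they form a second "class".
rng = np.random.default_rng(0)
reps = rng.normal(size=(6, 10, 8))
reps[3:] += 2.0

dist = pairwise_frobenius(reps)
coords = classical_mds(dist, n_components=2)  # one 2-d "perspective" per model
```

Each row of `coords` is a low-dimensional representation of one model. Because only the response matrices enter the computation, models with different architectures can be placed in the same space as long as they answer the same prompts.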

Abstract

Large language models (LLMs) are capable of producing high quality information at unprecedented rates. As these models continue to entrench themselves in society, the content they produce will become increasingly pervasive in databases that are, in turn, incorporated into the pre-training data, fine-tuning data, retrieval data, etc. of other language models. In this paper we formalize the idea of a communication network of LLMs and introduce a method for representing the perspective of individual models within a collection of LLMs. Given these tools we systematically study information diffusion in the communication network of LLMs in various simulated settings.

Paper Structure (13 sections, 2 equations, 6 figures)

Introduction
A communication network of LLMs
Defining a perspective space with surrogate data kernels
The data kernel & its surrogate
The perspective space
Simulating systems of interacting LLMs
Related Work
Conclusion
Limitations
Instruction-tuning Pythia-410m-deduped
Case-study specific fine-tuning
Case Study 1: Stochastically Equivalent Models
Case Studies 2 & 3: Two classes

Figures (6)

Figure 1: Examples of communication networks of language models and databases. The edge structure and model initializations directly impact the evolution of the perspectives of the models and the overall health of the system.
Figure 2: Two 2-d perspective spaces of fifteen models (5 models each from three classes, encoded by color). An evaluation set containing prompts relevant to the differences in the models (left) is better suited to induce a discriminative perspective space than an evaluation set containing "orthogonal" prompts.
Figure 3: Tracking individual perspective (left) and system-level dynamics (right) of communication networks of chat-based language models with (bottom left) and without (top left) a disruption in communication structure.
Figure 4: Estimated number of clusters found via GMM with BIC (top) and sequential ARI of cluster labels (bottom) for disrupted and undisrupted systems. The number of clusters in both systems stabilizes, indicating the presence of model sinks. Model sinks are unstable in a system with no disruption and stable in a system with a disruption.
Figure 5: The evolution of 1-d perspectives of five interacting models where two models interact with an "adversarial" model every other interaction (top). Given enough nodes to influence, the adversarial model can compromise the entire network -- as captured by the difference between the average 1-d perspective of the non-adversarial models and the 1-d perspective of the adversarial model for various numbers of target models and various attack frequencies (bottom).
...and 1 more figures
