Gaussian mixture models as a proxy for interacting language models

Edward L. Wang, Mohammad Sharifi Kiasari, Tianyu Wang, Hayden Helm, Avanti Athreya, Carey Priebe, Vince Lyzinski

Abstract

Large language models (LLMs) are powerful tools that, in a number of settings, overlap with the results of human pattern recognition and reasoning. Retrieval-augmented generation (RAG) further allows LLMs to produce tailored output depending on the contents of their RAG databases. However, LLMs depend on complex, computationally expensive algorithms. In this paper, we introduce interacting Gaussian mixture models (GMMs) as a proxy for interacting LLMs. We construct a model of interacting GMMs, complete with an analogue to RAG updating, under which GMMs can generate, exchange, and update data and parameters. We show that this interacting system of Gaussian mixture models, which can be implemented at minimal computational cost, mimics certain aspects of experimental simulations of interacting LLMs whose iterative responses depend on feedback from other LLMs. We build a Markov chain from this system of interacting GMMs; formalize and interpret the notion of polarization for such a chain; and prove lower bounds on the probability of polarization. This provides theoretical insight into the use of interacting Gaussian mixture models as a computationally efficient proxy for interacting large language models.
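
The generate, exchange, and update loop described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's algorithm; every modeling choice in it is an assumption made for illustration: two fixed one-dimensional components at $-1$ and $+1$, a database of `r` points per agent standing in for the RAG database, a single uniformly chosen partner per step, and a weight estimated as the fraction of database points on the positive side. The function name `simulate` and its parameters are hypothetical.

```python
import numpy as np

def simulate(n_agents=10, r=5, steps=50, sigma=0.1, seed=0):
    """Toy interacting-mixture simulation (illustrative only, not the paper's model)."""
    rng = np.random.default_rng(seed)
    # Assumption: two fixed components centred at -1 and +1; each agent's
    # database of r points is initialised from a balanced mixture.
    signs = rng.choice([-1.0, 1.0], size=(n_agents, r))
    db = signs + sigma * rng.standard_normal((n_agents, r))
    history = np.empty((steps + 1, n_agents))
    history[0] = (db > 0).mean(axis=1)
    rows = np.arange(n_agents)
    for t in range(1, steps + 1):
        # Estimated weight on the +1 component: share of positive database points.
        w = (db > 0).mean(axis=1)
        # Generate: each agent draws one point from its current mixture.
        comp = np.where(rng.random(n_agents) < w, 1.0, -1.0)
        samples = comp + sigma * rng.standard_normal(n_agents)
        # Exchange: each agent reads the sample of one other agent
        # (offset in 1..n_agents-1, so never itself).
        partner = (rows + rng.integers(1, n_agents, size=n_agents)) % n_agents
        # Update: overwrite one randomly chosen database slot with the received point.
        slot = rng.integers(0, r, size=n_agents)
        db[rows, slot] = samples[partner]
        history[t] = (db > 0).mean(axis=1)
    return history
```

Calling `simulate(n_agents=30, r=5, steps=100)` returns an array of shape `(101, 30)` whose rows trace each agent's estimated weight over time; plotting the columns gives a rough picture of how agents drift toward one component or the other under these assumed dynamics.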

Paper Structure

This paper contains 20 sections, 7 theorems, 29 equations, 10 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

There is a constant $q=q_{m,\rho,r}>0$ such that holds for any $\{(w_i^{(t)},x_{1i}^{(t)},\ldots,x_{ri}^{(t)})\}_{i=1}^m$ in the GMM MC state space.

Figures (10)

  • Figure 1: Heatmap visualization of the function $h(w, x)$ across three values of $\sigma$. The function is plotted over the domains $w \in [0, 1]$ and $x \in [-2, 2]$. The transition from the maximum value ($h \approx 1$, yellow) to the minimum ($h \approx 0$, dark purple) occurs at $x=0$. As $\sigma$ decreases from $0.5$ to $0.1$, the gradient sharpens significantly, causing the function to approach a step-function behavior.
  • Figure 2: Comparison of unstable silo behavior for our GMM simulation and the LLM simulation from [mcguinness2024investigating]. Left. Unstable silo behavior for the GMM model described in this paper. We set the global parameters to be $p=0.4$, $k=29$, $r=5$, and $T=100$. Right. Unstable silo behavior for the LLM model in [mcguinness2024investigating]. Top. An example unstable silo system where each line represents an agent. Bottom. The evolution of the number of agents in each possible silo where each line represents a silo.
  • Figure 3: Comparison of the effect of $k$ on the number of silos for $p=0$ between the GMM and LLM simulations. Left. The result of the GMM simulation with $T=80$ and $r=5$. Each value of $k$ was simulated 50 times; the line indicates the average and the shaded region indicates $\pm 5$ SE. Right. The result of the LLM simulation from Figure 4 of [mcguinness2024investigating].
  • Figure 4: GMM System. Each plot has the time from $t = 0$ to $200$ on the x-axis and the silo of the agent on the y-axis. We plot example systems of $n=30$ interacting agents for varying values of $p$ and $k$ in the GMM interaction system. The value of $p$ is constant for each row and the value of $k$ is constant for each column.
  • Figure 5: LLM System. Each plot has the time from $t = 0$ to $80$ on the x-axis and the silo of the agent on the y-axis. We plot example systems of $n=30$ interacting agents for varying values of $p$ and $k$ in the LLM interaction system. The value of $p$ is constant for each row and the value of $k$ is constant for each column.
  • ...and 5 more figures

Theorems & Definitions (15)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Definition 1
  • Theorem 1
  • Remark 5
  • Theorem 2
  • Definition 2
  • Proposition 1
  • ...and 5 more