Emergent social conventions and collective bias in LLM populations

Ariel Flint Ashery, Luca Maria Aiello, Andrea Baronchelli

TL;DR

Experimental results show that AI systems can autonomously develop social conventions without explicit programming and have implications for designing AI systems that align, and remain aligned, with human values and societal goals.

Abstract

Social conventions are the backbone of social coordination, shaping how individuals form a group. As growing populations of artificial intelligence (AI) agents communicate through natural language, a fundamental question is whether they can bootstrap the foundations of a society. Here, we present experimental results that demonstrate the spontaneous emergence of universally adopted social conventions in decentralized populations of large language model (LLM) agents. We then show how strong collective biases can emerge during this process, even when agents exhibit no bias individually. Last, we examine how committed minority groups of adversarial LLM agents can drive social change by imposing alternative social conventions on the larger population. Our results show that AI systems can autonomously develop social conventions without explicit programming and have implications for designing AI systems that align, and remain aligned, with human values and societal goals.

Paper Structure

This paper contains 36 sections, 15 figures, 9 tables.

Figures (15)

  • Figure 1: The spontaneous emergence of conventions. (A) The success rate---i.e., the probability of observing a success at a given time---for population size $N=24$ and a name pool of size $W=10$, for each of the four models. Thick lines represent average curves obtained from 40 experimental runs, while thin lines are representative individual runs. To improve visibility, we only show 5 individual trajectories for each LLM. The black, dashed line shows the success rate of the theoretical minimal naming game model, averaged over 10,000 runs under the same constraints. (B) Word competition in a single run in a population of Llama-3.1-70B-Instruct agents. Different markers and colors represent the trajectories of unique conventions. Each data point is a bin averaging the past interactions up until the preceding bin boundary. Error bars indicate standard error of the mean.
  • Figure 2: Collective bias in convention selection. (A) Distribution of consensus conventions for a name pool of size $W=10$ ($N=24$). Results of 40 runs for the Llama-3-70B-Instruct and Llama-3.1-70B-Instruct models, and 27 and 20 runs for Claude-3.5-Sonnet and Llama-2-70b-Chat, respectively. The collective dynamics systematically amplify individual biases (shown in Fig. \ref{fig: ten individual bias}). (B) Individual vs. collective bias for $W=2$, name pool $\{Q, M\}$. Left panel: probability of selecting either convention for agents with no prior memory ($Q$: lighter hue, $M$: darker hue). Raw values reported in Table \ref{tab:2A data}. Asterisks (*) indicate that there is insufficient evidence to reject the null hypothesis that the model is unbiased at the 5% significance level (calculated using an exact binomial test from 10,000 samples per model, apart from Llama-3-70B-Instruct, which had 5,000 samples; see Materials and Methods). Corresponding p-values for the models (from left to right) are $p$ = 0.068, 0.116, 0.757, and 0.849. Right panel: the proportion of runs ($40$) that resulted in consensus on the respective convention. Raw values reported in Table \ref{tab:2B data}.
  • Figure 3: Committed minority and critical mass dynamics. Populations of $N = 24$ agents ($N = 48$ for Llama-3-70B-Instruct) were initialized in two conditions, with complete consensus on either the weak ($Q$) or strong ($M$) convention ($W = 2$). Each agent's memory exclusively stored one convention in each setting, with memory length $H = 5$ ($H = 3$ for Llama-3-70B-Instruct). (A) The average probability of producing the alternative convention when the majority holds the weak (top) or strong (bottom) convention. The legend shows the size of the committed minority (CM). Bold (faint) lines represent the production probability when the CM reaches the critical mass needed to flip the majority on the strong (weak) convention. Solid lines with filled circles indicate that all trials achieved population consensus on the alternative convention (95% success rate in the past $3N$ rounds). (B) Critical mass needed to flip the majority for each model. Raw values reported in Table \ref{tab:3B data}. Error bars indicate standard error of the mean.
  • Figure S1: Meta-prompting results. Accuracy of the model responses to the prompt comprehension questions defined in Table \ref{tab:meta-prompting}. We selected 8 agents from a single run (5 agents for Llama-3-70B-Instruct) and recovered their game records. We replayed each game using the memory length of the simulated run ($H=5$), posing the comprehension questions at each interaction. These runs provide approximately $100$ test interactions for each model.
  • Figure S2: Robustness of the spontaneous emergence of conventions. We show that the spontaneous emergence of conventions holds for a variety of simulation parameters, using populations of Llama-3-70B-Instruct agents. $W=26$ indicates a name pool which uses the entire Latin alphabet, $W=6$ is the name pool $\{Q, M, F, J, X, Y\}$, and $W=2$ is the name pool $\{Q, M\}$.
  • ...and 10 more figures
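
The Figure 1 caption compares the LLM populations against a theoretical minimal naming game. As a rough illustration of what such a baseline measures, the sketch below simulates the textbook minimal naming game with a finite name pool and records a success (1) or failure (0) for every pairwise interaction. The update rule, the function name `naming_game_outcomes`, and all parameter names are assumptions made for illustration; the exact constraints used in the paper are not stated in this excerpt and may differ.

```python
import random


def naming_game_outcomes(n_agents=24, pool_size=10, n_rounds=2000, seed=0):
    """Simulate a textbook minimal naming game (assumed rules, not the paper's code).

    Returns a list with 1 for each successful interaction and 0 for each failure.
    """
    rng = random.Random(seed)
    pool = list(range(pool_size))                  # hypothetical name pool of size W
    inventories = [set() for _ in range(n_agents)]
    outcomes = []
    for _ in range(n_rounds):
        # Pick a random ordered pair of distinct agents.
        speaker, hearer = rng.sample(range(n_agents), 2)
        # A speaker with no names yet draws one from the pool.
        if not inventories[speaker]:
            inventories[speaker].add(rng.choice(pool))
        # Sort before choosing so that results are reproducible for a given seed.
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents keep only the agreed name.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
            outcomes.append(1)
        else:
            # Failure: the hearer remembers the name for later.
            inventories[hearer].add(word)
            outcomes.append(0)
    return outcomes


def success_rate(outcomes, window=50):
    """Average the outcomes over consecutive windows to get a success-rate curve."""
    return [
        sum(outcomes[i:i + window]) / len(outcomes[i:i + window])
        for i in range(0, len(outcomes), window)
    ]


if __name__ == "__main__":
    curve = success_rate(naming_game_outcomes())
    print(curve)
```

Averaging such curves over many seeds would give a smooth baseline of the kind the caption describes; the window size of 50 here is an arbitrary choice.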
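
The Figure 2 caption tests whether an individual model is unbiased between $Q$ and $M$ using an exact binomial test. A minimal sketch of such a test, assuming a two-sided alternative and a null probability of 0.5, is given below. The function name `exact_binomial_p_value` and the example count are hypothetical, and the excerpt does not say whether the paper's test was one- or two-sided.

```python
def exact_binomial_p_value(k, n):
    """Two-sided exact binomial test of H0: p = 0.5 (assumed form of the test).

    k is the number of times one option (say Q) was chosen out of n samples.
    Under H0 the distribution is symmetric, so the p-value is the total
    probability of outcomes at least as far from n/2 as the observed k.
    """
    deviation = abs(2 * k - n)
    coeff = 1   # binomial coefficient C(n, 0), updated iteratively
    tail = 0    # sum of C(n, i) over outcomes at least as extreme as k
    for i in range(n + 1):
        if abs(2 * i - n) >= deviation:
            tail += coeff
        # C(n, i + 1) = C(n, i) * (n - i) / (i + 1); the division is exact.
        coeff = coeff * (n - i) // (i + 1)
    # Python integer division stays accurate even for very large integers.
    return tail / 2 ** n


if __name__ == "__main__":
    # Hypothetical count for illustration only; not data from the paper.
    p = exact_binomial_p_value(5090, 10_000)
    print(f"p = {p:.3f}; reject H0 at the 5% level: {p < 0.05}")
```

A p-value above 0.05 corresponds to the asterisk in the figure: the data give insufficient evidence to reject the hypothesis that the model chooses both names equally often.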