Bias Amplification in Language Model Evolution: An Iterated Learning Perspective

Yi Ren, Shangmin Guo, Linlu Qiu, Bailin Wang, Danica J. Sutherland

TL;DR

This paper outlines key characteristics of agents' behavior in the Bayesian-IL framework, including predictions supported by experiments with various LLMs. The framework could help predict and guide the evolution of LLMs in desired directions.

Abstract

With the widespread adoption of Large Language Models (LLMs), the prevalence of iterative interactions among these models is anticipated to increase. Notably, recent advancements in multi-round self-improving methods allow LLMs to generate new examples for training subsequent models. At the same time, multi-agent LLM systems, involving automated interactions among agents, are also increasing in prominence. Thus, in both short and long terms, LLMs may actively engage in an evolutionary process. We draw parallels between the behavior of LLMs and the evolution of human culture, as the latter has been extensively studied by cognitive scientists for decades. Our approach involves leveraging Iterated Learning (IL), a Bayesian framework that elucidates how subtle biases are magnified during human cultural evolution, to explain some behaviors of LLMs. This paper outlines key characteristics of agents' behavior in the Bayesian-IL framework, including predictions that are supported by experimental verification with various LLMs. This theoretical framework could help to more effectively predict and guide the evolution of LLMs in desired directions.

Paper Structure

This paper contains 32 sections, 4 theorems, 19 equations, 19 figures, 5 tables.

Key Result

Proposition 1

Consider several Bayesian agents that share the same prior $P_0(h)$ and conduct iterated learning for $T$ generations. If $T$ is sufficiently large, any agent$_t$ with $t>T$ will have […], where $h^{T*}$ is a stationary point (e.g., a local maximum) of $P_0(h)$ subject to $h \in \mathcal{H}_{\text{eff}}$.
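
To make the iterated-learning setting concrete, the following is a minimal sketch of a chain of Bayesian agents that share one prior. It is a toy under stated assumptions, not the paper's setup or the proof of Proposition 1: hypotheses are coin biases, each agent sees `n` flips produced by its predecessor, and each agent picks its hypothesis by sampling from its posterior. The names `HYPS`, `PRIOR`, `transition_matrix`, and `iterate` are hypothetical. Under these assumptions the chain over hypotheses is known to forget its starting point and settle at the shared prior, which is the sense in which the prior's bias comes to dominate. The sharper concentration on a single $h^{T*}$ stated in the proposition is not modeled here.

```python
from math import comb

# Hypothetical toy setup: three coin biases and a shared, mildly skewed prior.
HYPS = [0.2, 0.5, 0.8]
PRIOR = [0.2, 0.3, 0.5]


def binom_pmf(k, n, p):
    """Probability of k heads in n flips of a coin with bias p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior(k, n, hyps, prior):
    """Bayes' rule: P(h | k heads) is proportional to P(k | h) * P_0(h)."""
    unnorm = [pr * binom_pmf(k, n, h) for h, pr in zip(hyps, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def transition_matrix(hyps, prior, n):
    """Q[i][j] = P(next agent adopts hyps[j] | current agent holds hyps[i]).

    Assumes the learner samples its hypothesis from the posterior after
    observing n flips generated by the previous agent.
    """
    m = len(hyps)
    q = [[0.0] * m for _ in range(m)]
    for i, h in enumerate(hyps):
        for k in range(n + 1):
            p_data = binom_pmf(k, n, h)
            post = posterior(k, n, hyps, prior)
            for j in range(m):
                q[i][j] += p_data * post[j]
    return q


def step(dist, q):
    """Advance the distribution over hypotheses by one generation."""
    m = len(dist)
    return [sum(dist[i] * q[i][j] for i in range(m)) for j in range(m)]


def iterate(dist, q, generations):
    """Run the chain for a given number of generations."""
    for _ in range(generations):
        dist = step(dist, q)
    return dist


Q = transition_matrix(HYPS, PRIOR, n=3)
start = [1.0, 0.0, 0.0]  # generation 0 holds the hypothesis the prior likes least
final = iterate(start, Q, generations=200)
```

Tracking the exact distribution over hypotheses, rather than simulating random draws, keeps the sketch deterministic. Replacing posterior sampling with a MAP choice, or changing `n`, changes the long-run behavior and is left as an experiment.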

Figures (19)

  • Figure 1: Examples of practical LLM systems that require knowledge transfer among different generations and how we use Bayesian agents to approximate their behaviors. ①, ②, and ③ denote the imitation, interaction, and transmission phases respectively.
  • Figure 2: Demonstration of conducting iterated ICL on the ACRE task.
  • Figure 3: Left: the mean and standard deviation of $H(P_{lmw}(h))$ across experiments with different $h^*$ and $\bm{\mathsf{d}}^0$ (5 different seeds). Middle two: the probability of the screen being off, where different colors represent six different levels of spurious bias. Right: the histogram of all $P_{lmw}(h)$ in the first and sixth generations, where the bars are colored based on the value of the last object in $h$.
  • Figure 4: Leftmost three: experiments in \ref{sec:applications}. First: how the ratio of easy samples changes in $\bm{\mathsf{d}}^t$. $N_e$ is the number of easy examples in $\bm{\mathsf{d}}^0$. Second: how the average ranking of acronyms changes. Third: how the average length of acronyms changes. Rightmost two: results of on-policy DPO in \ref{sec:applications_2}. Fourth: average length of the responses. Fifth: win rate against the SFT baseline.
  • Figure 5: Illustration of a typical EM algorithm and an imitation-only iterated learning method.
  • ...and 14 more figures

Theorems & Definitions (6)

  • Proposition 1
  • Proposition 2
  • Proposition 2
  • proof
  • Proposition 2
  • proof