An iterated learning model of language change that mixes supervised and unsupervised learning

Jack Bunyan; Seth Bullock; Conor Houghton

TL;DR

A linear relationship between the dimensionality of meaning-signal space and effective bottleneck size is demonstrated and it is suggested that internal reflection on potential utterances is important in language learning and language evolution.

Abstract

The iterated learning model is an agent model which simulates the transmission of language from generation to generation. It is used to study how the language adapts to pressures imposed by transmission. In each iteration, a language tutor exposes a naïve pupil to a limited training set of utterances, each pairing a random meaning with the signal that conveys it. Then the pupil becomes a tutor for a new naïve pupil in the next iteration. The transmission bottleneck ensures that tutors must generalize beyond the training set that they experienced. Repeated cycles of learning and generalization can result in a language that is expressive, compositional and stable. Previously, the agents in the iterated learning model mapped signals to meanings using an artificial neural network but relied on an unrealistic and computationally expensive process of obversion to map meanings to signals. Here, both maps are neural networks, trained separately through supervised learning and together through unsupervised learning in the form of an autoencoder. This avoids the computational burden entailed in obversion and introduces a mixture of supervised and unsupervised learning as observed during language learning in children. The new model demonstrates a linear relationship between the dimensionality of meaning-signal space and effective bottleneck size and suggests that internal reflection on potential utterances is important in language learning and evolution.

Paper Structure (1 section, 14 equations, 12 figures, 1 table)

Semi-Supervised ILM

Figures (12)

Figure 1: A guide to the notation. A: A meaning is an ordered sequence of $n$ facts, $m_1$ through $m_n$. A signal is an ordered sequence of $n$ words, $s_1$ through $s_n$, with $n=6$ in this example. The term "word" is potentially confusing since a word is sometimes thought of as a sequence of letters, but here it is an indivisible component of the signal. A word corresponds to a single bit and a signal can be thought of as corresponding to a phrase; in the same way, a meaning can be thought of as corresponding to a state of the world, composed of a set of facts. B: This is an example $n=3$ language. This is much smaller than the $n$ values actually simulated here but is convenient for illustration. The example here is fully expressive, since every possible meaning maps to a different signal, and fully compositional, since each fact fully determines a unique word. Here, $s_1=\neg m_3$, $s_2= m_1$ and $s_3=\neg m_2$. For illustrative convenience the $m_2$ and $s_3$ elements involved in the last of these equivalences have been colored orange. C: In this example the decoder map $d$ maps the signal $(1,0,\ldots,1)$ to the meaning $(0,1,\ldots,1)$; the decoder is made up of two parts, the neural network $\hat{d}$, with one hidden layer the same size as the input and output, and the decision map $\delta$, which maps probabilities to zero or one.
Figure 2: Training the Obverter ILM. Each agent, $A_i$, first trains its decoder during a period of supervised learning on a set of meaning-signal pairs, $\mathcal{B}_i$, provided by its tutor, $A_{i-1}$. Subsequently, the pupil (red) derives an encoder from its decoder using a process of obversion, $O$, and itself is promoted to become a tutor (blue) to a new pupil agent, $A_{i+1}$.
Figure 3: Training the Semi-Supervised ILM. Each agent, $A_i$, trains its encoder and decoder during a period combining supervised learning on a set of meaning-signal pairs, $\mathcal{B}_i$, provided by its tutor, $A_{i-1}$, and unsupervised autoencoder learning on a set of example meanings, $\mathcal{A}_i$, drawn from $\mathcal{M}$. Subsequently, the pupil (red) is promoted to become a tutor (blue) to a new pupil agent, $A_{i+1}$. This is represented by a dashed line because, unlike for the Obverter ILM, this promotion is only a change of role: it does not involve any further work for the agent since its encoder is already trained.
Figure 4: The Semi-Supervised ILM. A: An encoder, $\hat{e}$, maps $\mathcal{M}$ to $\mathcal{S}$. B: A decoder, $\hat{d}$, maps $\mathcal{S}$ back to $\mathcal{M}$. C: An autoencoder, $\hat{a}$, maps $\mathcal{M}$ to $\mathcal{M}'$ by chaining $\hat{e}$ and $\hat{d}$. The output of each network is a vector of probabilities (denoted $p_i$ or $q_i$), which are thresholded by a function $\delta$ to deliver a binary vector. In the autoencoder, C, the signal layer is instead one of three hidden layers, with each node having a value between zero and one.
Figure 5: The Semi-Supervised ILM evolves a stable, expressive, compositional language. This figure describes the performance of the Semi-Supervised ILM but, for comparison, A-C show the performance of the Obverter ILM: expressivity ($x$), compositionality ($c$) and stability ($s$) as a function of generation for 25 independent replicates of the $n=8$ Obverter ILM; the thick line is the mean, the thin lines show the individual replicates and the bottleneck size in all cases is 50. D through I relate to the $n=8$ Semi-Supervised ILM. D-F plot, respectively, expressivity ($x$), compositionality ($c$) and stability ($s$) as a function of generation for 25 independent replicates of the $n=8$ Semi-Supervised ILM; the thick line is the mean, the thin lines show the individual replicates and the bottleneck size in all cases is 75; the autoencoder is trained using the same meanings as appear in the meaning-signal pairs used to train the decoder and encoder. In G-I, $\mathcal{A}$ is selected independently of $\mathcal{B}$ and $|\mathcal{A}|=225$.
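
The $n=3$ example language in the Figure 1 caption can be checked by enumeration. The sketch below is illustrative only and is not taken from the paper's code: the function names (`example_language`, `dependencies`, `decision_map`) are hypothetical, and the 0.5 cutoff used for the decision map $\delta$ is an assumption, since the caption says only that $\delta$ maps probabilities to zero or one.

```python
from itertools import product

N = 3  # size of the illustrative language in Figure 1B


def example_language(meaning):
    """Map a meaning (m1, m2, m3) to the signal (s1, s2, s3) of Figure 1B."""
    m1, m2, m3 = meaning
    return (1 - m3, m1, 1 - m2)  # s1 = not m3, s2 = m1, s3 = not m2


def dependencies(word_index, meanings):
    """Indices of the facts whose flip changes the given word for some meaning."""
    deps = set()
    for m in meanings:
        for i in range(N):
            flipped = tuple(b ^ 1 if k == i else b for k, b in enumerate(m))
            if example_language(flipped)[word_index] != example_language(m)[word_index]:
                deps.add(i)
    return deps


def decision_map(probabilities, threshold=0.5):
    """Hypothetical delta: round each probability to 0 or 1 (cutoff is an assumption)."""
    return tuple(1 if p > threshold else 0 for p in probabilities)


meanings = list(product((0, 1), repeat=N))
signals = [example_language(m) for m in meanings]

# Fully expressive: every meaning maps to a different signal.
is_expressive = len(set(signals)) == len(meanings)

# Fully compositional (in the sense of the caption): each word depends on one fact.
word_deps = [dependencies(j, meanings) for j in range(N)]
is_compositional = all(len(d) == 1 for d in word_deps)

print("expressive:", is_expressive)
print("compositional:", is_compositional)
```

Enumeration is feasible here only because $n=3$ gives eight meanings; the expressivity and compositionality measures used in the paper's figures are defined by the paper itself, not by this check.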
