Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications

Till Speicher, Mohammad Aflah Khan, Qinyuan Wu, Vedant Nanda, Soumi Das, Bishwamittra Ghosh, Krishna P. Gummadi, Evimaria Terzi

TL;DR

This work builds an experimental framework that repeatedly exposes large language models to random strings. It identifies the factors that make some strings easier to memorise than others, as well as the role of local prefixes and global context in memorisation.

Abstract

Understanding whether and to what extent large language models (LLMs) have memorised training data has important implications for the reliability of their output and the privacy of their training data. To cleanly measure memorisation and disentangle it from other phenomena (e.g. in-context learning), we create an experimental framework based on repeatedly exposing LLMs to random strings. The framework lets us study the memorisation dynamics, i.e., how the model's behaviour changes as it is repeatedly exposed to random strings. Using our framework, we make several striking observations: (a) we find consistent phases of the dynamics across families of models (Pythia, Phi and Llama2), (b) we identify factors that make some strings easier to memorise than others, and (c) we identify the role of local prefixes and global context in memorisation. We also show that sequential exposure to different random strings has a significant effect on memorisation. Our results, often surprising, have significant downstream implications for the study and use of LLMs.
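
The setup described in the abstract can be illustrated with a small sketch. It is not the authors' code: the function names, the use of lowercase letters as the alphabet, and the seeds are assumptions made for illustration. It samples a random string of length $n$ from an alphabet $A$ of size $\ell$ and scores a set of predictions by the fraction of positions that match, which is one plausible reading of "recollection accuracy". Uniform random guessing over $A$ is expected to score about $1/\ell$.

```python
import random
import string


def sample_random_string(alphabet, n, seed=0):
    """Draw n tokens uniformly at random from the alphabet (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.choice(alphabet) for _ in range(n)]


def recollection_accuracy(target, predicted):
    """Fraction of positions where the predicted token equals the target token."""
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    correct = sum(t == p for t, p in zip(target, predicted))
    return correct / len(target)


def random_guess_accuracy(alphabet):
    """Expected accuracy of guessing uniformly at random from the alphabet."""
    return 1.0 / len(alphabet)


# Assumed example values: alphabet size ell = 4, string length n = 1024.
alphabet = list(string.ascii_lowercase[:4])
target = sample_random_string(alphabet, n=1024, seed=0)

# A stand-in for a model that only guesses tokens from the alphabet.
guess_rng = random.Random(1)
guesses = [guess_rng.choice(alphabet) for _ in target]

print("baseline:", random_guess_accuracy(alphabet))
print("guessing:", recollection_accuracy(target, guesses))
print("memorised:", recollection_accuracy(target, target))
```

In this sketch a model that has fully memorised the string would reproduce `target` exactly and score 1, while one that has only learned which tokens belong to $A$ would stay near the baseline.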

Paper Structure (33 sections, 4 equations, 47 figures, 3 tables)

Figures (47)

  • Figure 1: [Recollection accuracy for different alphabet sizes $\ell$ and models ${\mathcal{M}}$. ($n = 1024$)] For all models, the accuracy initially increases quickly before stagnating at the random guess level during the Guessing-Phase. Afterwards, the accuracy converges more slowly towards $1$ during the Memorisation-Phase. The accuracy of randomly guessing tokens from $A$ is shown with dashed lines.
  • Figure 2: [Aggregate probability mass and entropy for different $\ell$. ($n = 1024$)] i) Plots on the top show the probability mass that ${\mathcal{M}}$ assigns to tokens in $A$. In all cases, models quickly learn to allocate the maximum possible probability mass to the tokens within the alphabet $A$, i.e. they only predict tokens from $A$ after a few training epochs. ii) We show the average entropy of the probability distribution of model ${\mathcal{M}}$ over $A$. The entropy initially rises to its maximum value, before decreasing to 0. The maximum attainable entropy (for different $\ell$) is shown with dashed lines.
  • Figure 3: [Recollection accuracy for different entropy levels $h$. ($n = 1024$)] Analogously to strings with different $\ell$, strings with lower $h$ are easier to guess, but harder to memorise. Dashed lines indicate the performance of a random guess, equivalent to always guessing "a".
  • Figure 4: [Recollection accuracy for different prefix lengths and for changes in the global context (GC) during training. ($n = 1024, {\mathcal{M}} = \text{Pythia-1B}$)] (a) and (b) show what fraction of tokens can be recollected correctly with different prefix lengths, at different points during training. In many cases, prefixes much shorter than the full string are sufficient to predict most of the tokens accurately. (c) shows the performance of a randomly re-sampled vs a constant global context with only one repeated token, and (d) shows the impact of changing the size of the global context, where the numbers indicate multiples of the GC size.
  • Figure 5: [Accuracy on different strings during sequential memorisation. ($n = 1024, {\mathcal{M}} = \text{Pythia-1B}$)] Each curve denotes a new string. As the model memorises new strings, it forgets old ones, shown by the drop in accuracy after the first 50 epochs per string.
  • ...and 42 more figures
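
The two quantities in the Figure 2 caption can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes the model's next-token distribution is available as a list of probabilities indexed by token id, that the entropy is computed over the distribution renormalised to $A$, and that it is measured in bits. The helper names and the toy six-token vocabulary are hypothetical. Under these assumptions the maximum attainable entropy is $\log_2 \ell$, reached when the model spreads its probability uniformly over $A$.

```python
import math


def mass_on_alphabet(probs, alphabet_ids):
    """Total probability the model assigns to tokens inside the alphabet A."""
    return sum(probs[i] for i in alphabet_ids)


def entropy_over_alphabet(probs, alphabet_ids):
    """Entropy in bits of the distribution renormalised to A (an assumption)."""
    mass = mass_on_alphabet(probs, alphabet_ids)
    if mass <= 0:
        raise ValueError("model assigns no probability mass to the alphabet")
    entropy = 0.0
    for i in alphabet_ids:
        p = probs[i] / mass
        if p > 0:  # terms with p == 0 contribute nothing
            entropy -= p * math.log2(p)
    return entropy


def max_entropy(ell):
    """Entropy in bits of the uniform distribution over ell tokens."""
    return math.log2(ell)


# Toy vocabulary of 6 tokens; the first 4 ids form the alphabet A (ell = 4).
alphabet_ids = [0, 1, 2, 3]

untrained = [1 / 6] * 6                       # mass leaks outside A
guessing = [0.25, 0.25, 0.25, 0.25, 0.0, 0.0]  # uniform over A
memorised = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # certain about one token

for name, probs in [("untrained", untrained), ("guessing", guessing), ("memorised", memorised)]:
    print(name, mass_on_alphabet(probs, alphabet_ids), entropy_over_alphabet(probs, alphabet_ids))
```

The `guessing` distribution corresponds to the point in the caption where the entropy reaches its maximum, and the `memorised` distribution to the point where it has fallen to 0.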