Aspects of human memory and Large Language Models

Romuald A. Janik

TL;DR

The paper investigates whether Large Language Models (LLMs) exhibit memory properties akin to human memory, treating memory as a functional aspect of the probabilistic next-token model $P(\text{token}|\text{preceding text})$. The experiments adapt serial-memory tasks to LLMs (using has-a, is-a, and lives-in facts) and evaluate recall via noun probability rankings. They show human-like primacy and recency effects, memory enhancement from elaborations, and forgetting dominated by interference, along with a notable LLM-specific memory formation time. The findings suggest that these memory-like behaviors emerge from the statistics of the training data rather than from an explicit memory subsystem, supporting the view that linguistic structure and human memory imprint one another. For cognitive science and AI, this points to a close interplay between biological memory effects and statistical language structure in shaping narrative coherence and memory phenomena.

Abstract

Large Language Models (LLMs) are huge artificial neural networks which primarily serve to generate text, but also provide a very sophisticated probabilistic model of language use. Since generating a semantically consistent text requires a form of effective memory, we investigate the memory properties of LLMs and find surprising similarities with key characteristics of human memory. We argue that the human-like memory properties of the Large Language Model do not follow automatically from the LLM architecture but are rather learned from the statistics of the training textual data. These results strongly suggest that the biological features of human memory leave an imprint on the way that we structure our textual narratives.

Paper Structure

This paper contains 5 sections, 7 equations, and 9 figures.

Figures (9)

  • Figure 1: Recall accuracy for a serial memory experiment with human subjects (sample data from Glanzer and Cunitz) and for a memorization experiment of a list of 20 facts of the has-a type for the Large Language Model GPT-J, studied extensively in this paper. The observed U-shaped curves exhibit the primacy and recency effects.
  • Figure 2: Probing memory in GPT-J. The list of facts is separated from the query by some intervening text. The large language model GPT-J computes the probabilities of the tokens that could replace the X placeholder in the query. Only tokens corresponding to nouns are taken into account. The answer is judged correct if the highest-ranking noun is identical to the one given for Paul in the list of facts.
  • Figure 3: Recall accuracy as a function of position for lists of various lengths (top) and for facts of various types (bottom).
  • Figure 4: Recall accuracy as a function of position for a list of 20 facts with a has-a relationship for a variety of Pythia-family language models of different sizes.
  • Figure 5: Comparison of recall for a baseline list of facts and with elaborations added at positions marked with red arrows.
  • ...and 4 more figures
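
The recall criterion described in the Figure 2 caption (restrict the candidate tokens to nouns, take the highest-ranking one, and compare it with the noun given in the list of facts) can be illustrated with a minimal Python sketch. This is not the paper's code: the function names, the toy probabilities, and the hand-written noun set are all hypothetical. In the paper the probabilities come from GPT-J, and this excerpt does not say how nouns are identified.

```python
from typing import Dict, List, Set


def top_noun(token_probs: Dict[str, float], nouns: Set[str]) -> str:
    """Return the highest-probability candidate token that is a noun."""
    noun_probs = {tok: p for tok, p in token_probs.items() if tok in nouns}
    if not noun_probs:
        raise ValueError("no noun candidates in the distribution")
    return max(noun_probs, key=noun_probs.get)


def is_recalled(token_probs: Dict[str, float], nouns: Set[str], target: str) -> bool:
    """A fact counts as recalled if the top-ranked noun equals the target noun."""
    return top_noun(token_probs, nouns) == target


def recall_by_position(trials: List[List[bool]]) -> List[float]:
    """Average correctness at each list position over repeated trials."""
    n_positions = len(trials[0])
    return [sum(trial[i] for trial in trials) / len(trials) for i in range(n_positions)]


# Toy next-token distribution for the X placeholder (made-up numbers,
# standing in for the output of a language model).
probs = {"the": 0.40, "a": 0.25, "book": 0.20, "lamp": 0.10, "red": 0.05}
nouns = {"book", "lamp"}  # hypothetical noun vocabulary

print(top_noun(probs, nouns))             # non-noun tokens are ignored
print(is_recalled(probs, nouns, "lamp"))  # the target is not the top noun
```

Averaging `is_recalled` outcomes per list position over many randomly generated lists, as `recall_by_position` does, gives recall-versus-position curves of the kind shown in Figures 1, 3, and 4.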