
Self-Attention Limits Working Memory Capacity of Transformer-Based Models

Dongyu Gong, Hantao Zhang

TL;DR

The study offers insight into the shared role of attention in human and artificial intelligence, and the limitations of the self-attention mechanism it reveals could inform future efforts to design more powerful model architectures with enhanced working memory capacity and cognitive capabilities.

Abstract

Recent work on Transformer-based large language models (LLMs) has revealed striking limits in their working memory capacity, similar to those found in human behavioral studies. Specifically, these models' performance drops significantly on N-back tasks as N increases. However, a mechanistic explanation of why this phenomenon arises is still lacking. Inspired by the executive attention theory from the behavioral sciences, we hypothesize that the self-attention mechanism within Transformer-based models might be responsible for their working memory capacity limits. To test this hypothesis, we train vanilla decoder-only Transformers to perform N-back tasks and find that attention scores gradually aggregate at the N-back positions over training, suggesting that the model masters the task by learning to attend to the relationship between the current position and the N-back position. Critically, we find that the total entropy of the attention score matrix increases as N increases, suggesting that the dispersion of attention scores might be the cause of the capacity limit observed in N-back tasks. Our findings thus offer insights into the shared role of attention in both human and artificial intelligence. Moreover, the limitations of the self-attention mechanism revealed in the current study could inform future efforts to design more powerful model architectures with enhanced working memory capacity and cognitive capabilities.
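The two measurements described in the abstract, the attention placed on the N-back position and the total entropy of the attention score matrix, can be illustrated with a minimal NumPy sketch. It assumes a single causal attention head and takes "total entropy" to mean the sum of the Shannon entropies (in nats) of each query row's attention distribution; the paper's exact definition may differ, and the function names `causal_attention`, `n_back_scores`, and `total_entropy` are hypothetical.

```python
import numpy as np


def causal_attention(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Causal single-head attention weights softmax(Q K^T / sqrt(d)), shape (T, T)."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # A decoder-only model cannot look ahead: mask key positions j > i.
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable row-wise softmax; the diagonal is never masked,
    # so every row max is finite and masked entries become exactly 0.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def n_back_scores(attn: np.ndarray, n: int) -> np.ndarray:
    """Attention that query position i places on key position i - n, for i >= n."""
    idx = np.arange(n, attn.shape[0])
    return attn[idx, idx - n]


def total_entropy(attn: np.ndarray, eps: float = 1e-12) -> float:
    """Assumed definition: sum over query rows of each row's Shannon entropy (nats).

    Masked (zero) entries contribute 0, since 0 * log(eps) == 0.
    """
    return float(-(attn * np.log(attn + eps)).sum())
```

A perfectly focused pattern such as `np.eye(T)` has total entropy close to 0, whereas uniform causal attention over T positions reaches ln(T!), so under this definition a larger value means more dispersed attention.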

Paper Structure (11 sections, 2 equations, 12 figures, 1 table)

Figures (12)

  • Figure 1: (a): N-back task schematic. Participants (humans or LLMs) are instructed to respond (humans: press a button; LLMs: output "m") when the current letter matches the letter N steps back, and to withhold a response (humans: do nothing; LLMs: output "-") when it does not match. N is fixed within a given task sequence; $\{1, 2, 3\}$-back are shown in the same schematic for illustration only. (b): performance of GPT-3.5 and GPT-4 on this task, reproduced from results in gong2024working. Error bars represent $\pm1$ standard error of the mean.
  • Figure 2: (a): N-back task performance of Transformers with different numbers of decoder layers and attention heads per layer. (b): for the 1-layer 1-head Transformer model, task performance drops logarithmically as N increases. Error bars represent $\pm1$ standard error of the mean.
  • Figure 3: the model learns to attend to target locations over training epochs. Shown here are attention maps of a 1-layer 1-head Transformer model trained on the 3-back task; see the Appendix for attention maps in the 1-back and 2-back tasks.
  • Figure 4: (a)-(c): the relationship between test accuracy at position $i$ and the attention score at position $i - N$ for the 1-layer 1-head Transformer model, with colors indicating the training epoch each dot belongs to. (d)-(f): the same relationship for the same model, with colors instead indicating each dot's position in the task sequence.
  • Figure 5: $H_N$, the total entropy of the attention score matrix, increases with larger N while test accuracy decreases. Error bars represent $\pm1$ standard error of the mean.
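The task in Figure 1(a) can be generated programmatically. The sketch below is an illustration under stated assumptions, not the authors' data pipeline: the uppercase alphabet, the default 1/3 match rate, and the function name `make_n_back_sequence` are hypothetical choices. Only the "m"/"-" target convention and the fixed N per sequence come from the caption.

```python
import random
import string
from typing import List, Optional, Tuple


def make_n_back_sequence(
    length: int,
    n: int,
    match_rate: float = 1 / 3,  # assumed; the paper's match rate is not stated here
    letters: str = string.ascii_uppercase,  # assumed alphabet
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Return (letters, targets) for one N-back sequence with a fixed n.

    Targets follow the Figure 1 convention: "m" if the letter matches the one
    n steps back, "-" otherwise (the first n positions are always "-").
    """
    if rng is None:
        rng = random.Random()
    seq: List[str] = []
    for i in range(length):
        if i >= n and rng.random() < match_rate:
            seq.append(seq[i - n])  # planted match trial
        else:
            # Exclude the n-back letter so non-match trials never match by accident.
            pool = [c for c in letters if i < n or c != seq[i - n]]
            seq.append(rng.choice(pool))
    targets = ["m" if i >= n and seq[i] == seq[i - n] else "-" for i in range(length)]
    return seq, targets
```

Targets are recomputed from the finished sequence rather than recorded during generation, so they stay correct even if the sampling logic is changed later.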