Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training
Deven Mahesh Mistry, Anooshka Bajaj, Yash Aggarwal, Sahaj Singh Maini, Zoran Tiganj
TL;DR
The paper probes how transformer networks organize temporal information during in-context learning by applying cognitive science methods (lag-CRP analysis, induction matching scores, and controlled ablations) to GPT-2 small and medium trained on the WikiText-103 and FineWeb datasets. It reveals episodic-memory-like temporal biases in attention, including primacy, recency, and contiguity. Contiguity is driven predominantly by induction heads, which enable in-context sequence recall, and these effects weaken when the induction heads are ablated. Time constants governing temporal retrieval are typically short, clustering around 2–4 tokens, and the magnitude of positional encodings modulates the strength and shape of these effects. Collectively, the work provides a quantitative, cross-disciplinary view of how temporal context is organized during in-context learning in transformers and highlights the role of induction heads in shaping downstream recall behavior.
Abstract
We investigate in-context temporal biases in attention heads and transformer outputs. Using cognitive science methodologies, we analyze attention scores and outputs of GPT-2 models of varying sizes. Across attention heads, we observe effects characteristic of human episodic memory, including temporal contiguity, primacy, and recency. Transformer outputs demonstrate a tendency toward in-context serial recall. Importantly, this effect is eliminated after ablation of the induction heads, which are the driving force behind the contiguity effect. Our findings offer insights into how transformers organize information temporally during in-context learning, shedding light on their similarities to and differences from human memory and learning.
