Emerging Human-like Strategies for Semantic Memory Foraging in Large Language Models

Eric Lacosse, Mariana Duarte, Peter M. Todd, Daniel C. McNamee

TL;DR

The convergent and divergent search patterns that are critical to human performance on the Semantic Fluency Task (SFT) also emerge as identifiable patterns in LLMs across distinct layers, providing new insights into how LLMs may be adapted into closer cognitive alignment with humans or, alternatively, guided toward productive cognitive disalignment to enhance complementary strengths in human-AI interaction.

Abstract

Both humans and Large Language Models (LLMs) store a vast repository of semantic memories. In humans, efficient and strategic access to this memory store is a critical foundation for a variety of cognitive functions. Such access has long been a focus of psychology and the computational mechanisms behind it are now well characterized. Much of this understanding has been gleaned from a widely-used neuropsychological and cognitive science assessment called the Semantic Fluency Task (SFT), which requires the generation of as many semantically constrained concepts as possible. Our goal is to apply mechanistic interpretability techniques to bring greater rigor to the study of semantic memory foraging in LLMs. To this end, we present preliminary results examining SFT as a case study. A central focus is on convergent and divergent patterns of generative memory search, which in humans play complementary strategic roles in efficient memory foraging. We show that these same behavioral signatures, critical to human performance on the SFT, also emerge as identifiable patterns in LLMs across distinct layers. Potentially, this analysis provides new insights into how LLMs may be adapted into closer cognitive alignment with humans, or alternatively, guided toward productive cognitive disalignment to enhance complementary strengths in human-AI interaction.

Paper Structure (12 sections, 5 figures)

Figures (5)

  • Figure 1: Comparison of LLM- and human-generated SFT sequences. A. Conceptual state-transition diagram over three categories (non-human primates, insects, farm animals) illustrating how a sequence is thought to be generated. Arrows between nodes represent divergent behavior (switching, i.e., moving to a new category), whereas self-edges represent convergent behavior (clustering, i.e., generating words within the current category). B. Correlation between the human and LLM average state-transition matrices, which hold the transition probabilities between categories. C. Comparison of the between-category and within-category transition probability distributions for LLMs and humans. D. Switch ratio distributions of human and LLM sequences. Humans switch more often (mean switch ratio 0.55) than LLMs (mean switch ratio 0.4), suggesting that LLMs sample semantic clusters more exhaustively before switching (Mann-Whitney $p < 0.0001$).
  • Figure 2: Explaining switching, or convergent/divergent behavior, in LLMs. A. Schematic illustrating the analysis of a switch event. For the preceding context, the model's vocabulary is partitioned into a Within-category token set (tokens that would continue the current 'pet' cluster) and a Between-category token set (tokens that would initiate a switch to a new cluster) via animal category norms [zemla_snafu_2020]. B. Final output probabilities (z-scored) centered on the switch event (Relative Position 0). At the switch, the probability mass for Between-category tokens (orange) peaks, while the mass for Within-category tokens (blue) troughs. C. Logit lens analysis comparing probabilities during switch events (solid lines) vs. non-switch, convergent events (dashed lines). A clear separation emerges in the mid-to-late layers, beyond layer 40. Critically, during a switch event, the model's tendency to amplify Within-category probabilities is greatly attenuated (compare the solid blue line with the dashed blue line), providing a distinct internal computational signature for the divergent "switch" mechanism. D. Average probability of Within-category vs. Between-category tokens for switch vs. non-switch events in layers $> 39$, where the token probabilities of the examined categories begin to differ from 0, as shown in panel C.
  • Figure 3: Decodability of Semantic Foraging Behavior from Internal Representations. The heatmap displays the layer-wise classification performance (AUROC) of linear probes trained to distinguish switch (divergent) vs. non-switch (convergent) events across four Llama models of increasing scale (1B, 3B, 8B, 70B) and a dataset of three different prompt types: neutral (N), convergent (C), and divergent (D). A summary of classifier performance, averaged over the top-3 layers of the representation reading for each of the three datasets, shows that performance tends to improve with model size.
  • Figure S1: Investigating whether linear probing allows accurate readouts from human-generated sequences in the human dataset, using a method akin to that of Figure 3. The highest accuracy was achieved in Llama-3.1-8-A at layer 14 (AUROC $= 0.60$).
  • Figure S2: Classification of switching behavior from NLL output estimates, for all model sizes, for the actual human token sequences examined in Figure 2. Larger models demonstrate higher accuracy.
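
The switch ratio and category state-transition matrix referred to in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation. The `CATEGORY` mapping is a hypothetical toy stand-in for published animal category norms, the function names are assumptions, and each word is assigned to exactly one category, which is a simplification.

```python
from collections import Counter

# Hypothetical toy norms: each word maps to exactly one category.
# This is an illustrative assumption, not the norms used in the paper.
CATEGORY = {
    "dog": "pets", "cat": "pets", "hamster": "pets",
    "cow": "farm", "pig": "farm", "sheep": "farm",
    "ant": "insects", "bee": "insects",
}


def switch_ratio(words, category=CATEGORY):
    """Fraction of consecutive transitions that cross a category boundary."""
    labels = [category[w] for w in words]
    if len(labels) < 2:
        raise ValueError("need at least two words to define a transition")
    switches = sum(a != b for a, b in zip(labels, labels[1:]))
    return switches / (len(labels) - 1)


def transition_matrix(words, category=CATEGORY):
    """Row-normalized category-to-category transition probabilities.

    Diagonal entries correspond to convergent (within-category) steps,
    off-diagonal entries to divergent (between-category) steps.
    """
    labels = [category[w] for w in words]
    states = sorted(set(category.values()))
    counts = Counter(zip(labels, labels[1:]))
    matrix = {}
    for src in states:
        total = sum(counts[(src, dst)] for dst in states)
        # Rows with no outgoing transitions are left as all zeros.
        matrix[src] = {
            dst: (counts[(src, dst)] / total if total else 0.0)
            for dst in states
        }
    return matrix


sequence = ["dog", "cat", "cow", "pig", "sheep", "ant"]
ratio = switch_ratio(sequence)
probs = transition_matrix(sequence)
print(f"switch ratio: {ratio:.2f}")
print(probs["farm"])
```

In this toy sequence, two of the five transitions cross a category boundary. Averaging such matrices over many sequences would give one matrix per group (human or LLM), which could then be correlated entry by entry in the spirit of panel B.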