Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences

Andrew Kyle Lampinen, Martin Engelcke, Yuxuan Li, Arslan Chaudhry, James L. McClelland

TL;DR

The paper argues that the lack of latent learning (learning information that is not relevant to the current task but may be useful in a future one) is a key gap between natural and artificial intelligence. It formalizes a framework for latent learning, shows that parametric learning alone struggles to use latent information, and shows that oracle episodic retrieval can improve generalization across the simple reversals, codebooks, semantic structure, and navigation benchmarks. Across this suite of benchmarks, it highlights the importance of within-example in-context learning for using retrieved experiences effectively, and discusses how retrieval complements parametric learning. The findings connect cognitive science and neuroscience with AI practice, suggesting that retrieval and episodic memory can support more flexible, data-efficient generalization.

Abstract

When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of parametric machine learning systems is their failure to exhibit latent learning -- learning information that is not relevant to the task at hand, but that might be useful in a future task. We show how this perspective links failures ranging from the reversal curse in language modeling to new findings on agent-based navigation. We then highlight how cognitive science points to episodic memory as a potential part of the solution to these issues. Correspondingly, we show that a system with an oracle retrieval mechanism can use learning experiences more flexibly to generalize better across many of these challenges. We also identify some of the essential components for effectively using retrieval, including the importance of within-example in-context learning for acquiring the ability to use information across retrieved examples. In summary, our results illustrate one possible contributor to the relative data inefficiency of current machine learning systems compared to natural intelligence, and help to understand how retrieval methods can complement parametric learning to improve generalization. We close by discussing some of the links between these findings and prior results in cognitive science and neuroscience, and the broader implications.
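
The abstract's "oracle retrieval mechanism" can be pictured with a minimal sketch. This is not the authors' implementation: the `Episode` structure, the entity-overlap relevance rule, and the function names are all assumptions made for illustration. The only idea taken from the text is that relevant past experiences are placed in context, where the information they latently contain (here, a reversed relation) can be used through in-context learning.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One stored training experience (hypothetical structure)."""
    text: str
    entities: frozenset  # ground-truth labels the oracle uses to judge relevance


def oracle_retrieve(memory: list[Episode], query_entities: frozenset, k: int = 2) -> list[Episode]:
    """Return up to k episodes that share an entity with the query.

    "Oracle" here means relevance is read from ground-truth labels
    instead of being learned (an assumption of this sketch).
    """
    hits = [ep for ep in memory if ep.entities & query_entities]
    return hits[:k]


def build_context(memory: list[Episode], query: str, query_entities: frozenset, k: int = 2) -> str:
    """Prepend the retrieved episodes to the query so a model can use them in context."""
    retrieved = oracle_retrieve(memory, query_entities, k)
    return "\n".join([ep.text for ep in retrieved] + [query])


# Made-up entities: training stated the relation in the forward direction only.
memory = [
    Episode("Zorb is the parent of Blick.", frozenset({"Zorb", "Blick"})),
    Episode("Quax lives in Mirn.", frozenset({"Quax", "Mirn"})),
]
prompt = build_context(memory, "Who is the child of Zorb?", frozenset({"Zorb"}))
print(prompt)
```

In this sketch the reversed query arrives with the forward statement already in context, so answering it relies on in-context inference and not on how the relation was consolidated in the weights.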

Paper Structure

This paper contains 29 sections, 11 figures, 1 table.

Figures (11)

  • Figure 1: Conceptual overview of the challenges of using latent information from training experiences, and how retrieval complements parametric learning to overcome them. (Reversal panel) The reversal curse (Berglund et al., 2024) is an example of how parametric learners, such as language models, consolidate information in ways that depend on the learning task and format. Models that learn a relation in one format can answer queries compatible with the learning format, but not those that reverse the relation, even though the reversed relation is latently implied by the forward one and the models are fully capable of reversing relations to make inferences in context (Lampinen et al., 2025). (Explicit vs. latent panel) Challenges of reversal are one instance of the much broader phenomenon that what is explicitly learned may also latently convey information relevant to other tasks, e.g., multi-hop reasoning, alternative goals, or answering questions in other languages. Like the reversal curse, learning on such sequences may primarily improve performance on the explicit information or goals; however, if the sequence were in context, models would readily be able to make inferences about the latent information. (Complementary panel) Therefore, explicit retrieval of specific experiences from episodic memory complements the broader knowledge of parametric learning: it makes select, relevant experiences available in context, where the latent information they contain can be used more flexibly, in ways different from the original task setting in which they were encountered. (Performance panel) Thus, we will typically expect systems with solely parametric learning to perform well at new tests of knowledge that is explicit in learning experiences, but we expect selective performance advantages for episodic retrieval in tests of knowledge or tasks that are latent in learning experiences, as we demonstrate below. (These illustrative results are adapted from the simple reversals experiments below.)
  • Figure 2: The benchmarks we use and the key types of latent generalization that they test. (Codebooks panel) The codebooks benchmark tests the ability to use latent indices (highlighted in red), for which only the definitions have been seen in training, to complete test encoding sequences. (Simple reversals panel) The simple reversals benchmark tests the ability of models to reverse relations seen in training, a kind of reversal that the models have learned to perform in context. (Semantic structure panel) The semantic structure benchmark uses training embedded in more naturalistic text to test latent generalization types ranging from reversals to syllogisms, or more challenging category-inclusion-only holdouts. (Gridworld panel) The latent gridworld, in both its pixel-based RL and ASCII-based BC instantiations, tests the ability to navigate to objects that have never been a navigation goal in training for a particular maze, but have been frequently seen. (The same maze is shown in both pixels and ASCII; the agent's view window is shown with a dashed line for clarity.)
  • Figure 3: The inability to learn latently demonstrates the potential benefit of episodic memory: models often contain the information they need to solve a task, and can solve the task if that information is in context, but cannot put the pieces together to achieve latent learning. In each plot, the right-most bar (with close to zero performance) is the latent test; the other bars are the pieces needed to solve it: the ability to recall the relevant information (blue bars) and the ability to use the relevant information to solve the task in context (yellow bars). (Error bars are 95% CIs calculated across 4 runs.)
  • Figure 4: The benefits of oracle retrieval on the codebooks and simple reversals benchmarks (one panel each). Both baseline and retrieval models perform well on component tasks like recalling definitions, or encoding new sequences involving indices used in encoding during training (codebooks panel, center). However, performance differs dramatically on the latent encoding test (right bars on both plots), where only the model with retrieval achieves above-chance performance. (Error bars are 95% CIs calculated across 4 runs.)
  • Figure 5: Results on the semantic structure benchmark, comparing performance of a baseline model and one with oracle retrieval, in settings with and without strong associative cues. (Strong-cues panel) When strong similarity-based cues are present in the data, both models achieve relatively high performance due to the possibility of associative learning. This demonstrates how similarity-based generalization can provide an alternative route to generalization in some cases. (Reduced-cues panel) When similarity-based cues are reduced, the advantage of the retrieval model is more notable. However, in both settings the benefits of retrieval are more muted than on the other benchmarks, likely because there are not sufficient examples for the model to acquire strong ICL capabilities.
  • ...and 6 more figures
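
The captions for Figures 3 and 4 report error bars as 95% CIs calculated across 4 runs, without stating how the interval is computed. One common choice, assumed here purely for illustration, is a Student's t interval on the mean of the per-run scores. The function name and the example scores below are made up and are not results from the paper.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 3 degrees of freedom (n = 4 runs).
T_CRIT_95_DF3 = 3.182


def ci95_four_runs(scores):
    """Return (low, high) for a t-based 95% CI on the mean of exactly 4 run scores."""
    if len(scores) != 4:
        raise ValueError("this sketch hardcodes the critical value for 4 runs")
    mean = statistics.mean(scores)
    # Standard error of the mean, using the sample standard deviation (n - 1 denominator).
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    half_width = T_CRIT_95_DF3 * sem
    return mean - half_width, mean + half_width


# Hypothetical accuracies from 4 seeds, for illustration only.
low, high = ci95_four_runs([0.90, 0.92, 0.88, 0.94])
print(f"mean 0.91, 95% CI [{low:.3f}, {high:.3f}]")
```

With only 4 runs the t critical value is large, so intervals of this kind are wide; a bootstrap or a normal approximation would give different widths.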