Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
Andrew Kyle Lampinen, Martin Engelcke, Yuxuan Li, Arslan Chaudhry, James L. McClelland
TL;DR
The paper argues that latent learning (learning information that is not immediately relevant to the current task) is a key gap between natural and artificial intelligence. It formalizes a framework for latent learning, demonstrates that parametric learning alone struggles to leverage latent information, and shows that oracle episodic retrieval can improve generalization across reversal, code, semantic, and navigation tasks. Through a suite of benchmarks, it highlights the importance of within-example in-context learning for effectively using retrieved experiences, and it discusses how retrieval-based mechanisms complement parametric learning. The findings connect cognitive science and neuroscience with AI practice, suggesting that retrieval and episodic memory can support more flexible, data-efficient generalization.
Abstract
When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of parametric machine learning systems is their failure to exhibit latent learning -- learning information that is not relevant to the task at hand, but that might be useful in a future task. We show how this perspective links failures ranging from the reversal curse in language modeling to new findings on agent-based navigation. We then highlight how cognitive science points to episodic memory as a potential part of the solution to these issues. Correspondingly, we show that a system with an oracle retrieval mechanism can use learning experiences more flexibly to generalize better across many of these challenges. We also identify some of the essential components for effectively using retrieval, including the importance of within-example in-context learning for acquiring the ability to use information across retrieved examples. In summary, our results illustrate one possible contributor to the relative data inefficiency of current machine learning systems compared to natural intelligence, and help to understand how retrieval methods can complement parametric learning to improve generalization. We close by discussing some of the links between these findings and prior results in cognitive science and neuroscience, and the broader implications.
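To make the idea of oracle retrieval concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy memory of stored training episodes and an oracle that already knows which episode is relevant to a reversal-style query. The names `Episode`, `oracle_retrieve`, and `build_context`, and the example sentences, are hypothetical and chosen only for illustration. The retrieved episode is placed in context ahead of the query, so a model could answer by in-context learning rather than relying only on its parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One stored learning experience (hypothetical structure)."""
    episode_id: int
    text: str


def oracle_retrieve(memory, relevant_ids):
    """Return the stored episodes whose ids the oracle marks as relevant.

    A real system would have to learn or compute relevance; here it is
    simply given, which is what makes the retrieval an "oracle".
    """
    wanted = set(relevant_ids)
    return [ep for ep in memory if ep.episode_id in wanted]


def build_context(retrieved, query):
    """Prepend retrieved episodes to the query so they are usable in context."""
    lines = [ep.text for ep in retrieved]
    lines.append(query)
    return "\n".join(lines)


# Toy memory: each fact was seen in one direction only during training.
memory = [
    Episode(0, "Plato taught Aristotle."),
    Episode(1, "The red door leads to the kitchen."),
    Episode(2, "Socrates taught Plato."),
]

# Reversal-style query: the relation is asked in the opposite direction.
context = build_context(oracle_retrieve(memory, [0]), "Aristotle was taught by")
print(context)
```

The sketch only builds the context; whether a model can then exploit the retrieved episode depends on what it has learned to do with in-context information, which is the point the abstract makes about within-example in-context learning.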
