Neural Computation Without Slots: Steps Towards Biologically Plausible Memory and Attention in Natural and Artificial Intelligence
Shaunak Bhandarkar, James L. McClelland
TL;DR
This work argues for memory and attention without explicit slots by extending modern Hopfield networks to a fixed-capacity, sparse, distributed memory (the K-winner MHN). It then demonstrates how MHN-based components can approximate slot-based attention in a minimal transformer by storing keys and values in fast, connection-weighted memories and learning slow representations for queries and values. Across unstructured and structured memory patterns, the K-winner MHN shows enhanced retention of older memories with only modest costs to initial retrieval, and in the transformer setting the QK-MHN-Transformer and related variants can achieve perfect in-context learning on a Case Sequence Task, with semantically meaningful structuring emerging in the learned weight matrices. The results provide a principled, biologically plausible bridge between slot-based AI mechanisms and distributed, weight-based memory, with implications for memory, attention, and continual learning in both brains and AI systems.
Abstract
Many models used in artificial intelligence and cognitive science rely on multi-element patterns stored in "slots" (dedicated storage locations) in a digital computer. As biological brains likely lack slots, we consider how they might achieve similar functional outcomes without them by building on the neurally inspired modern Hopfield network (MHN; Krotov & Hopfield, 2021), which stores each pattern in the connection weights of an individual neuron. We propose extensions of this approach to increase its biological plausibility as a model of memory and to capture an important advantage of slot-based computation in contemporary language models. For memory, neuroscience research suggests that a memory is stored in the weights of overlapping sparse ensembles of neurons rather than in those of a dedicated individual neuron. We introduce the K-winner MHN, extending the approach to ensembles, and find that within a continual learning regime, the ensemble-based MHN exhibits greater retention of older memories, as measured by the graded sensitivity measure d', than a standard (one-neuron) MHN. Next, we consider the powerful use of slot-based memory in contemporary language models. These models use slots to store long sequences of past inputs and their learned encodings, supporting later predictions and allowing error signals to be transported backward in time to adjust weights underlying the learned encodings of these past inputs. Inspired by these models' successes, we show how the MHN can be extended to capture both of these important functional outcomes. Collectively, our modeling approaches constitute steps towards understanding how biologically plausible mechanisms can support computations that have enabled AI systems to capture human-like abilities that no prior models have been able to achieve.
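To make the ensemble idea concrete, the following is a minimal sketch of a weight-based memory in which each pattern is written into the incoming and outgoing weights of the K most strongly activated hidden neurons, with no dedicated slot per pattern. It is an illustration under stated assumptions, not the authors' implementation: the class name `KWinnerMemory`, the parameters `n_hidden`, `k`, and `lr`, the random initial weights, the hard top-K selection, and the interpolation-style weight update are all hypothetical choices made for this example.

```python
import numpy as np


class KWinnerMemory:
    """Toy K-winner associative memory (hypothetical sketch, not the paper's code)."""

    def __init__(self, dim, n_hidden, k, lr=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: random incoming weights with roughly unit-norm rows.
        self.w_in = rng.standard_normal((n_hidden, dim)) / np.sqrt(dim)
        # Outgoing weights start empty; they are filled in as patterns are stored.
        self.w_out = np.zeros((dim, n_hidden))
        self.k = k
        self.lr = lr  # lr=1.0 fully overwrites the winners; lr<1 blends old and new

    def winners(self, x):
        # Indices of the k hidden neurons with the largest input activation.
        h = self.w_in @ x
        return np.argpartition(h, -self.k)[-self.k:]

    def store(self, x):
        idx = self.winners(x)
        target_in = x / np.linalg.norm(x)
        # Move only the winning ensemble's weights toward the new pattern.
        self.w_in[idx] = (1 - self.lr) * self.w_in[idx] + self.lr * target_in
        self.w_out[:, idx] = (1 - self.lr) * self.w_out[:, idx] + self.lr * x[:, None]

    def retrieve(self, cue):
        # Read out the average outgoing weights of the ensemble the cue activates.
        idx = self.winners(cue)
        return self.w_out[:, idx].mean(axis=1)


# Build mutually orthogonal +/-1 patterns from a 64x64 Sylvester Hadamard matrix.
H = np.array([[1.0]])
for _ in range(6):
    H = np.kron(H, [[1, 1], [1, -1]])
patterns = H[1:5]

memory = KWinnerMemory(dim=64, n_hidden=256, k=4)
for p in patterns:
    memory.store(p)

# Corrupt the first pattern by flipping six of its 64 elements, then recall it.
cue = patterns[0].copy()
cue[:6] *= -1
recalled = memory.retrieve(cue)
print("first pattern recovered:", np.array_equal(recalled, patterns[0]))
```

Setting `k=1` reduces the sketch to one neuron per stored pattern, which corresponds to the one-neuron case that the abstract contrasts with the ensemble-based variant. In a continual learning setting, later patterns may recruit neurons already used by earlier ones, so ensembles can overlap and older memories are degraded gradually rather than erased outright; the sketch does not attempt to reproduce the d' results reported in the paper.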
