Hierarchical Associative Memory
Dmitry Krotov
TL;DR
This work generalizes Modern Hopfield Networks to fully recurrent, multi-layer architectures that may include local (convolutional) connectivity. It uses a Lagrangian formalism that yields a global Lyapunov energy $E$, which decreases along the network's trajectories. It introduces layered Hierarchical Associative Memories (HAMs) with symmetric feedforward/feedback weights. These weights enable both bottom-up and top-down information flow and ensure convergence to fixed-point attractors. Three simple architectures are worked out in detail: one hidden layer, two dense hidden layers, and two hidden layers with a convolutional first layer. Each comes with explicit dynamics and energy expressions, including simplified forms obtained in the adiabatic limit, where some layers evolve on much faster time scales than others. The approach expands the memory capacity and the range of inductive biases available to associative memory models while retaining a degree of biological plausibility. It also allows training either end-to-end or by unfolding the dynamics in time, and it points to possible extensions such as lateral connections and gated units.
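The Lagrangian construction behind the energy can be written out for a chain of layers. The block below is a sketch under stated assumptions, not a transcription of the paper's equations. It assumes layers $A = 1, \dots, N$, each with a Lagrangian $L^A$ whose Hessian $M^A$ is positive semidefinite. It also assumes no biases, a time constant $\tau_A$ per layer, and a single weight matrix $\xi^{(A+1,A)}$ per pair of adjacent layers that is used bottom-up and, transposed, top-down. Terms referring to a nonexistent layer ($A-1 = 0$ or $A+1 = N+1$) are dropped. The notation is chosen here for illustration and may differ from the paper's.

```latex
\begin{aligned}
g^A_i &= \frac{\partial L^A}{\partial x^A_i}, \qquad
M^A_{ij} = \frac{\partial^2 L^A}{\partial x^A_i\,\partial x^A_j} \succeq 0,\\
\tau_A \frac{dx^A_i}{dt} &= \sum_j \xi^{(A,A-1)}_{ij}\, g^{A-1}_j
  + \sum_k \xi^{(A+1,A)}_{ki}\, g^{A+1}_k - x^A_i,\\
E &= \sum_{A=1}^{N} \Big[\sum_i x^A_i g^A_i - L^A\Big]
  - \sum_{A=1}^{N-1} \sum_{i,j} g^{A+1}_i\, \xi^{(A+1,A)}_{ij}\, g^A_j,\\
\frac{dE}{dt} &= \sum_{A=1}^{N} \sum_{i,j} \frac{dx^A_i}{dt}\, M^A_{ij}
  \Big[x^A_j - \sum_k \xi^{(A,A-1)}_{jk}\, g^{A-1}_k - \sum_k \xi^{(A+1,A)}_{kj}\, g^{A+1}_k\Big]\\
&= -\sum_{A=1}^{N} \tau_A \sum_{i,j} \frac{dx^A_i}{dt}\, M^A_{ij}\, \frac{dx^A_j}{dt} \;\le\; 0 .
\end{aligned}
```

The bracket in the fourth line equals $-\tau_A\, dx^A_j/dt$ by the dynamics. Each summand in the last line is therefore a quadratic form in a positive semidefinite matrix, so $E$ cannot increase. If $E$ is also bounded below, the trajectories settle at fixed points. The symmetry of the bottom-up and top-down weights is what makes the interaction term's time derivative match the feedforward and feedback drives exactly.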
Abstract
Dense Associative Memories, or Modern Hopfield Networks, have many appealing properties of associative memory. They can perform pattern completion, store a large number of memories, and be described by a recurrent neural network with a degree of biological plausibility and rich feedback between the neurons. Until now, however, all models of this class have had only one hidden layer and have been formulated only with densely connected architectures, two aspects that hinder their use in machine learning applications. This paper addresses this gap by describing a fully recurrent model of associative memory with an arbitrarily large number of layers, some of which can be locally connected (convolutional), together with an energy function that decreases along the dynamical trajectory of the neurons' activations. The memories of the full network are dynamically "assembled" from primitives encoded in the synaptic weights of the lower layers, with the "assembling rules" encoded in the synaptic weights of the higher layers. In addition to the bottom-up propagation of information typical of commonly used feedforward neural networks, the model has rich top-down feedback from higher layers, which helps the lower-layer neurons decide on their response to the input stimuli.
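The claim that the energy decreases on the trajectory can be checked numerically on a toy layered network. What follows is a minimal pure-Python sketch, not the paper's code. It assumes three fully connected layers of arbitrary sizes and the same Lagrangian $L(x) = \sum_i \log\cosh x_i$ for every layer, so that $g = \tanh(x)$. It also assumes no biases or external input, $\tau = 1$, one random weight matrix per pair of adjacent layers that is shared by the bottom-up and top-down pathways, and a forward Euler discretization with a small step. The energy is $E = \sum_A [\sum_i x^A_i g^A_i - L^A] - \sum_A g^{A+1} \cdot W^{(A)} g^A$. All names (`SIZES`, `energy`, `step`, `run`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical toy setup: three fully connected layers (sizes are arbitrary).
SIZES = [5, 4, 3]


def lagrangian(x):
    # Assumed Lagrangian L(x) = sum_i log cosh(x_i) for every layer.
    return sum(math.log(math.cosh(v)) for v in x)


def activation(x):
    # g_i = dL/dx_i = tanh(x_i); its Hessian diag(1 - tanh^2) is positive definite.
    return [math.tanh(v) for v in x]


def make_weights(sizes, rng):
    # W[a][i][j] couples neuron i of layer a+1 to neuron j of layer a.
    # The same matrix drives layer a+1 bottom-up and, transposed, layer a top-down.
    return [[[rng.gauss(0.0, 1.0) for _ in range(sizes[a])]
             for _ in range(sizes[a + 1])]
            for a in range(len(sizes) - 1)]


def energy(xs, W):
    gs = [activation(x) for x in xs]
    e = 0.0
    # Per-layer term: sum_i x_i g_i - L(x).
    for x, g in zip(xs, gs):
        e += sum(xi * gi for xi, gi in zip(x, g)) - lagrangian(x)
    # Interaction term between adjacent layers: -g^{a+1} . W[a] g^a.
    for a, Wa in enumerate(W):
        for i, row in enumerate(Wa):
            for j, w in enumerate(row):
                e -= gs[a + 1][i] * w * gs[a][j]
    return e


def step(xs, W, dt=0.01, tau=1.0):
    # One forward Euler step of tau dx/dt = bottom-up + top-down - x.
    gs = [activation(x) for x in xs]
    new_xs = []
    for a, x in enumerate(xs):
        drive = [0.0] * len(x)
        if a > 0:  # bottom-up drive from layer a-1
            for i in range(len(x)):
                drive[i] += sum(w * g for w, g in zip(W[a - 1][i], gs[a - 1]))
        if a < len(xs) - 1:  # top-down feedback from layer a+1 (transposed weights)
            for i in range(len(x)):
                drive[i] += sum(W[a][k][i] * gs[a + 1][k] for k in range(len(xs[a + 1])))
        new_xs.append([xi + dt / tau * (h - xi) for xi, h in zip(x, drive)])
    return new_xs


def run(steps=2000, seed=0):
    # Random weights and a random initial state; returns the energy after every step.
    rng = random.Random(seed)
    W = make_weights(SIZES, rng)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for n in SIZES]
    energies = [energy(xs, W)]
    for _ in range(steps):
        xs = step(xs, W)
        energies.append(energy(xs, W))
    return energies
```

Calling `run()` returns the energy trace. With a small `dt` the sequence should be non-increasing up to rounding error. Monotonic decrease is strictly a property of the continuous-time dynamics, which the Euler step only approximates, so much larger step sizes can break it.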
