Modern Hopfield Networks with Continuous-Time Memories

Saul Santos, António Farinhas, Daniel C. McNamee, André F. T. Martins

TL;DR

The paper tackles scalable memory in modern Hopfield networks by introducing continuous-time memories that compress large discrete memories into smooth, continuous representations. It derives an energy-based CTM-HN whose update rule is an expectation under a Gibbs density over a continuous signal, linking attractor dynamics to neural resource allocation and continuous attention. Empirical results on synthetic tasks and video data show retrieval performance comparable to discrete HNs while using fewer memory resources, with notable gains on continuous embeddings and at large memory lengths. This approach offers a principled, resource-efficient path toward scalable memory-augmented models and broadens the connection between Hopfield dynamics, continuous attention, and transformer-inspired architectures.

Abstract

Recent research has established a connection between modern Hopfield networks (HNs) and transformer attention heads, with guarantees of exponential storage capacity. However, these models still face challenges scaling storage efficiently. Inspired by psychological theories of continuous neural resource allocation in working memory, we propose an approach that compresses large discrete Hopfield memories into smaller, continuous-time memories. Leveraging continuous attention, our new energy function modifies the update rule of HNs, replacing the traditional softmax-based probability mass function with a probability density over the continuous memory. This formulation aligns with modern perspectives on human executive function, offering a principled link between attractor dynamics in working memory and resource-efficient memory allocation. Our framework maintains competitive performance with HNs while leveraging a compressed memory, reducing computational costs across synthetic and video datasets.
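The compression step described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the Gaussian basis functions, the ridge-regression fit, the placement of memory items on $[0, 1]$, and the names `gaussian_basis` and `fit_continuous_memory` are all assumptions made for illustration. The idea shown is that $L$ discrete memory vectors are replaced by $N \ll L$ coefficient vectors $\bm{B}$, so that the continuous signal $\bm{\bar{x}}(t) = \bm{B}^\top \bm{\psi}(t)$ approximates the original memory.

```python
import numpy as np

def gaussian_basis(t, num_basis):
    """Evaluate num_basis Gaussian bumps (assumed basis) at times t.

    Returns an array of shape (len(t), num_basis).
    """
    centers = np.linspace(0.0, 1.0, num_basis)
    width = 1.0 / num_basis  # assumed bandwidth, roughly the center spacing
    t = np.atleast_1d(t)
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def fit_continuous_memory(X, num_basis, ridge=1e-3):
    """Fit B (num_basis, D) by ridge regression so that psi(t_l)^T B ~ X[l].

    The L memory rows are assumed to sit at evenly spaced times in [0, 1].
    """
    num_items = X.shape[0]
    t = np.linspace(0.0, 1.0, num_items)
    F = gaussian_basis(t, num_basis)                # (L, N)
    gram = F.T @ F + ridge * np.eye(num_basis)      # regularized normal equations
    return np.linalg.solve(gram, F.T @ X)           # (N, D)

# Toy memory: 200 two-dimensional items that vary smoothly over time.
t = np.linspace(0.0, 1.0, 200)
X = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)

B = fit_continuous_memory(X, num_basis=20)          # 20 coefficient rows instead of 200 items
X_hat = gaussian_basis(t, 20) @ B                   # reconstruction x_bar(t_l)
rmse = np.sqrt(np.mean((X_hat - X) ** 2))
print(B.shape, rmse)
```

In this toy setting the memory shrinks from 200 rows to 20 coefficient rows. How well the reconstruction holds up for real data depends on how smoothly the memory varies and on the number of basis functions, which is the quantity varied in Figures 2 and 3.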

Paper Structure

This paper contains 11 sections, 1 theorem, 18 equations, 4 figures.

Key Result

Proposition 1

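Minimizing the energy (eq:energy) using the CCCP algorithm (Yuille and Rangarajan, 2003) leads to the Gibbs expectation update, which is given by:

$$\bm{q}^{(i+1)} = \mathbb{E}_{p}\left[\bm{\bar{x}}(t)\right] = \int p(t)\, \bm{\bar{x}}(t)\, dt = \bm{B}^\top \int p(t)\, \bm{\psi}(t)\, dt,$$

where $p(t) \propto \exp(\beta s(t))$ is the Gibbs density with temperature $\beta^{-1}$, and $s(t) = (\bm{q}^{(i)})^\top \bm{\bar{x}}(t) = (\bm{q}^{(i)})^\top\bm{B}^\top \bm{\psi}(t)$ is the continuous query-key similarity.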
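A numerical sketch of one such update may help make the proposition concrete. It replaces the current query with the expectation of $\bm{\bar{x}}(t)$ under the Gibbs density $p(t) \propto \exp(\beta s(t))$. Everything beyond the symbols in the proposition is an assumption: the Gaussian basis for $\bm{\psi}(t)$, the domain $[0, 1]$, the Riemann-sum approximation of the integrals on a uniform grid, and the names `gaussian_basis` and `gibbs_expectation_update`.

```python
import numpy as np

def gaussian_basis(t, num_basis):
    """Assumed basis psi(t): Gaussian bumps on [0, 1]; shape (len(t), num_basis)."""
    centers = np.linspace(0.0, 1.0, num_basis)
    width = 1.0 / num_basis
    t = np.atleast_1d(t)
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def gibbs_expectation_update(q, B, beta, num_samples=500):
    """One update q <- E_p[x_bar(t)], with integrals approximated on a uniform grid."""
    t = np.linspace(0.0, 1.0, num_samples)
    dt = t[1] - t[0]
    x_bar = gaussian_basis(t, B.shape[0]) @ B       # (M, D): x_bar(t_m) = B^T psi(t_m)
    s = x_bar @ q                                   # similarity s(t_m) = q^T x_bar(t_m)
    w = np.exp(beta * (s - s.max()))                # subtract the max for numerical stability
    p = w / (w.sum() * dt)                          # density values; Riemann sum equals 1
    q_new = (p[:, None] * x_bar).sum(axis=0) * dt   # approximates the integral of p(t) x_bar(t)
    return q_new, p, t

rng = np.random.default_rng(0)
B = rng.normal(size=(8, 3))    # hypothetical coefficients: 8 basis functions, 3 dimensions
q = rng.normal(size=3)         # hypothetical initial query
for _ in range(5):             # iterate the update a few times
    q, p, t = gibbs_expectation_update(q, B, beta=2.0)
print(q)
```

On a uniform grid this reduces to a softmax over the sampled points, which makes the parallel with the discrete update visible. The argument `num_samples` plays the role of the number of sampling points used to approximate the integrals, the quantity varied in Figure 4.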

Figures (4)

  • Figure 1: Optimization trajectories and energy contours for Hopfield networks with discrete (top) and continuous memories (bottom). Green illustrates the continuous function shaped by discrete memory points, while darker shades of blue indicate lower energy regions.
  • Figure 2: Video retrieval performance across different numbers of basis functions. Plotted are the cosine similarity means and standard deviations across videos.
  • Figure 3: Video embedding retrieval performance across different numbers of basis functions. Plotted are the cosine similarity means and standard deviations across videos.
  • Figure 4: Performance on video and embedding data across different numbers of sampling points used to approximate the integrals of our framework.

Theorems & Definitions (1)

  • Proposition 1