Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models

Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu

TL;DR

Empirically, with real-world datasets, the proposed two-stage memory retrieval dynamics for modern Hopfield models outperforms all existing modern Hopfield models and state-of-the-art similarity measures, achieving substantial improvements in both associative memory retrieval and deep learning tasks.

Abstract

We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed $\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a learnable feature map $\Phi$ that transforms the Hopfield energy function into kernel space. This transformation ensures convergence between the local minima of the energy and the fixed points of the retrieval dynamics within the kernel space. Consequently, the kernel norm induced by $\Phi$ serves as a novel similarity measure. It uses the stored memory patterns as learning data to enhance memory capacity across all modern Hopfield models. Specifically, we construct a separation loss $\mathcal{L}_\Phi$ that separates the local minima of the kernelized energy by separating the stored memory patterns in kernel space. Methodologically, the $\mathtt{U\text{-}Hop}$ memory retrieval process consists of: (Stage I) minimizing the separation loss for a more uniform memory (local minimum) distribution, followed by (Stage II) standard Hopfield energy minimization for memory retrieval. This significantly reduces the number of possible metastable states in the Hopfield energy function, thus enhancing memory capacity by preventing memory confusion. Empirically, with real-world datasets, we demonstrate that $\mathtt{U\text{-}Hop}$ outperforms all existing modern Hopfield models and state-of-the-art similarity measures, achieving substantial improvements in both associative memory retrieval and deep learning tasks. Code is available at https://github.com/MAGICS-LAB/UHop ; future updates are on arXiv:2404.03827
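
The two stages described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). Several pieces are assumptions made for illustration: the feature map is taken to be a normalised linear map `phi(W, x)`, the separation loss is assumed to be a uniformity-style log-mean-exp over pairwise kernel-space distances with a hypothetical temperature `t`, Stage I uses finite-difference gradient descent, and Stage II uses a softmax update whose similarity scores are computed in kernel space.

```python
import math
import random

def phi(W, x):
    """Hypothetical feature map: Phi(x) = W x, rescaled to unit L2 norm."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]

def separation_loss(W, memories, t=2.0):
    """Assumed loss: log of the mean of exp(-t * ||Phi(u) - Phi(v)||^2)
    over all pairs of distinct memories. Lower means more spread out."""
    feats = [phi(W, m) for m in memories]
    terms = []
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            d2 = sum((a - b) ** 2 for a, b in zip(feats[i], feats[j]))
            terms.append(math.exp(-t * d2))
    return math.log(sum(terms) / len(terms))

def stage_one(W, memories, n_iters=20, lr=0.5, eps=1e-4):
    """Stage I: finite-difference gradient descent on the separation loss.
    A step is kept only if it lowers the loss; otherwise lr is halved."""
    loss = separation_loss(W, memories)
    for _ in range(n_iters):
        grad = [[0.0] * len(W[0]) for _ in W]
        for r in range(len(W)):
            for c in range(len(W[0])):
                old = W[r][c]
                W[r][c] = old + eps
                grad[r][c] = (separation_loss(W, memories) - loss) / eps
                W[r][c] = old  # restore the entry exactly
        cand = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, grad)]
        cand_loss = separation_loss(cand, memories)
        if cand_loss < loss:
            W, loss = cand, cand_loss
        else:
            lr *= 0.5
    return W, loss

def stage_two(W, memories, query, beta=1.0, n_steps=3):
    """Stage II: softmax retrieval with similarities taken in kernel space."""
    feats = [phi(W, m) for m in memories]
    x = list(query)
    for _ in range(n_steps):
        fx = phi(W, x)
        scores = [beta * sum(a * b for a, b in zip(f, fx)) for f in feats]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # New state: probability-weighted average of the stored patterns.
        x = [sum(p * m[k] for p, m in zip(probs, memories)) for k in range(len(x))]
    return x

random.seed(0)
dim, n_mem = 4, 5
memories = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_mem)]
W0 = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
loss_before = separation_loss(W0, memories)
W_star, loss_after = stage_one(W0, memories)
query = [v + 0.1 * random.gauss(0, 1) for v in memories[0]]
retrieved = stage_two(W_star, memories, query, beta=8.0)
print(loss_before, loss_after)
```

Because Stage I only accepts steps that lower the loss, `loss_after` never exceeds `loss_before` in this sketch. A practical version would use automatic differentiation rather than finite differences.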

Paper Structure (64 sections, 8 theorems, 46 equations, 49 figures, 8 tables, 2 algorithms)

Key Result

Theorem 2.1

With assumption asum:non-sing, the energy function $E(\mathbf{x})$ is monotonically decreased by the following retrieval dynamics (display equation not reproduced here), where $\text{Sep}_{\alpha=1}(\cdot) = \text{Softmax}(\cdot)$, $\text{Sep}_{\alpha=2}(\cdot) = \text{Sparsemax}(\cdot)$ and $\text{Sep}_{\alpha\in[1,2]}(\cdot) = \alpha\text{-EntMax}(\cdot)$.
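
To make the two endpoint cases of $\text{Sep}_\alpha$ concrete, here is a minimal sketch of softmax ($\alpha=1$) and sparsemax ($\alpha=2$) in plain Python. The function name `sep` and the example scores are hypothetical. Sparsemax is implemented with the usual sort-and-threshold projection onto the probability simplex. The general $\alpha$-EntMax case needs an iterative solver and is deliberately left out.

```python
import math

def softmax(z):
    """Dense normalisation: every entry receives strictly positive weight."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.
    Entries far below the largest scores receive weight exactly 0."""
    cumsum, tau = 0.0, 0.0
    for j, v in enumerate(sorted(z, reverse=True), start=1):
        cumsum += v
        if 1.0 + j * v > cumsum:
            # Entry j is still in the support; update the threshold.
            tau = (cumsum - 1.0) / j
    return [max(v - tau, 0.0) for v in z]

def sep(z, alpha):
    """Hypothetical dispatcher covering only the two endpoint cases."""
    if alpha == 1:
        return softmax(z)
    if alpha == 2:
        return sparsemax(z)
    raise NotImplementedError("general alpha-EntMax needs an iterative solver")

scores = [1.0, 0.8, 0.1]
print(sep(scores, alpha=1))  # all three weights are positive
print(sep(scores, alpha=2))  # the weakest score gets weight exactly 0
```

Both outputs sum to one. The difference is that sparsemax can assign zero weight to weakly matching memories, which is what makes the corresponding retrieval sparse.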

Figures (49)

  • Figure 1: Separation Loss over Memory Set vs. Retrieval Error. We perform 200 runs of memory retrieval with $\mathtt{U\text{-}Hop}$ on MNIST. The result shows a strong correlation between low separation loss and low retrieval error.
  • Figure 2: Visualization of $\mathtt{U\text{-}Hop}$: Separation Maximization First, then Memory Retrieval Dynamics. The left-hand side shows the energy landscape in the original state space, where the memories stay close to each other. Minimizing the separation loss yields a $\Phi$, parameterized by $\mathbf{W}^\star$, that relocates memory patterns to more uniform locations in the kernel space and thereby separates the local minima of $E_\mathcal{K}$.
  • Figure 3: Memory Retrieval Error Comparison (\ref{seec:exp_memory}: Memory Capacity & Noise Robustness). We conduct memory retrieval experiments on the MNIST and CIFAR10 datasets. For the "Memory Set Size vs. Error" plots, we vary the memory set size for retrieval. For the "Noise Level vs. Error" plots, we randomly sample Gaussian noise and rescale the norm of the noise according to the noise level. In all four plots, $\mathtt{U\text{-}Hop}$ retrieves patterns with significantly lower error than all existing Hopfield models across all memory set sizes and noise levels.
  • Figure 4: Model Convergence Comparison with and without $\mathtt{U\text{-}Hop}$ on CIFAR100 (\ref{sec:exp_sl}: Image Classification Task). Left to right: Training Accuracy, Test Accuracy, Training Loss and Test Loss. Yellow and green curves represent modern Hopfield + $\mathtt{U\text{-}Hop}$ and Sparse modern Hopfield + $\mathtt{U\text{-}Hop}$. Blue and red curves represent modern Hopfield and Sparse modern Hopfield. The result demonstrates that, without $\mathtt{U\text{-}Hop}$, Hopfield layers fall into the low-rank bottleneck (bhojanapalli2020low) despite the high embedding dimension. In contrast, $\mathtt{U\text{-}Hop}$ avoids this issue and thus attains better training accuracy. In generalization and convergence speed, $\mathtt{U\text{-}Hop}$ also outperforms the other baselines by a large margin. Results for other datasets and sample sizes are in \ref{sec:additional-exp}.
  • Figure 5: Retrieval Error vs. Separation-Maximization (Stage I of \ref{algorithm1}) Iteration $N$ (\ref{seec:exp_memory}). We vary the iteration number $N$ and perform memory retrieval with $\mathtt{U\text{-}Hop}$ on the modern Hopfield model. We set $\beta=1, t=2$ and report the sum-of-square pixel differences. The result shows that the retrieval error decays quickly as $N$ increases.
  • ...and 44 more figures
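
The noise protocol and error metric mentioned in the captions of Figures 3 and 5 can be sketched as follows. This is one plausible reading of those captions, not the paper's evaluation code: it assumes that "rescaling the norm of the noise" means scaling a Gaussian noise vector so that its L2 norm equals the noise level, and that the retrieval error is the plain sum of squared pixel differences. The helper names are hypothetical.

```python
import math
import random

def add_scaled_noise(pattern, noise_level, rng):
    """Add Gaussian noise whose L2 norm is rescaled to `noise_level`
    (assumed interpretation of the noise protocol)."""
    noise = [rng.gauss(0.0, 1.0) for _ in pattern]
    norm = math.sqrt(sum(n * n for n in noise))
    scale = noise_level / norm if norm > 0 else 0.0
    return [p + scale * n for p, n in zip(pattern, noise)]

def sum_of_squares_error(retrieved, target):
    """Sum of squared differences between two flattened images."""
    return sum((r - t) ** 2 for r, t in zip(retrieved, target))

rng = random.Random(0)
pattern = [rng.random() for _ in range(16)]  # stand-in for a flattened image
noisy = add_scaled_noise(pattern, noise_level=0.5, rng=rng)
error = sum_of_squares_error(noisy, pattern)
print(math.sqrt(error))  # close to the requested noise level
```

Under this reading, the square root of the error between the noisy query and the clean pattern recovers the noise level, which is a convenient sanity check before running any retrieval.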

Theorems & Definitions (24)

  • Definition 1.1: Stored and Retrieved
  • Remark 2.1
  • Theorem 2.1: Retrieval Dynamics
  • proof : Proof Sketch
  • Lemma 2.1: Convergence on retrieval dynamics $\mathcal{T}_\mathcal{K}$
  • proof : Proof Sketch
  • Definition 2.1: Pattern Stored and Retrieved
  • Definition 2.2: Average Separation Loss
  • Theorem 2.2
  • proof
  • ...and 14 more