Dense Associative Memory with Epanechnikov Energy

Benjamin Hoover, Zhaoyang Shi, Krishnakumar Balasubramanian, Dmitry Krotov, Parikshit Ram

TL;DR

This work addresses the trade-off between memorization and generalization in Dense Associative Memories by introducing a KDE-inspired energy based on the Epanechnikov kernel, dubbed log-sum-ReLU (LSR). The LSR energy $E^{\text{LSR}}_\beta(\mathbf{x};\boldsymbol{\Xi})=-\frac{1}{\beta}\log\big(\epsilon+\sum_{\mu=1}^M \operatorname{ReLU}(1-\tfrac{\beta}{2}\|\mathbf{x}-\boldsymbol{\xi}_\mu\|^2)\big)$ enables exact retrieval of exponentially many memories and, crucially, the emergence of many novel local minima (emergent memories) without sacrificing recall. The paper provides theoretical guarantees (retrieval and emergent-memory counts) and validates the approach through synthetic landscapes and real-data latent spaces, showing that emergent memories can be plausibly meaningful and diverse, with log-likelihood comparable to LSE-based methods. This points to a new class of memory-rich, generative DenseAMs with potential for large-scale storage and latent-space generation, while also outlining practical limitations and future directions such as hybrid energy formulations and kernel-family exploration.
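As a minimal sketch of the LSR energy stated above, the NumPy code below evaluates it next to an LSE baseline and compares the two gradients at a stored pattern. The LSE form (using the same squared-distance similarity), the closed-form gradient helpers, the value `eps=1e-6`, and the three toy patterns are assumptions made for illustration; none of them is taken from the paper's implementation.

```python
import numpy as np

def lsr_energy(x, patterns, beta, eps=1e-6):
    """LSR energy: -(1/beta) * log(eps + sum_mu ReLU(1 - beta/2 * ||x - xi_mu||^2))."""
    sq_dists = np.sum((x - patterns) ** 2, axis=1)
    kernel = np.maximum(0.0, 1.0 - 0.5 * beta * sq_dists)
    return -np.log(eps + kernel.sum()) / beta

def lse_energy(x, patterns, beta):
    """Assumed LSE baseline: -(1/beta) * log(sum_mu exp(-beta/2 * ||x - xi_mu||^2))."""
    sq_dists = np.sum((x - patterns) ** 2, axis=1)
    return -np.log(np.sum(np.exp(-0.5 * beta * sq_dists))) / beta

def lsr_grad(x, patterns, beta, eps=1e-6):
    """Gradient of the LSR energy (valid away from the kinks of the ReLU)."""
    diffs = x - patterns
    sq_dists = np.sum(diffs ** 2, axis=1)
    kernel = np.maximum(0.0, 1.0 - 0.5 * beta * sq_dists)
    active = kernel > 0.0  # only patterns whose support contains x contribute
    return diffs[active].sum(axis=0) / (eps + kernel.sum())

def lse_grad(x, patterns, beta):
    """Gradient of the assumed LSE energy: softmax-weighted sum of (x - xi_mu)."""
    diffs = x - patterns
    logits = -0.5 * beta * np.sum(diffs ** 2, axis=1)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return (weights[:, None] * diffs).sum(axis=0)

# Toy patterns (hypothetical): pairwise distances are at least 1, while the
# kernel support radius is sqrt(2 / beta) ~ 0.71, so no pattern lies inside
# another pattern's support.
patterns = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
beta = 4.0

print("LSR gradient at stored pattern:", lsr_grad(patterns[0], patterns, beta))
print("LSE gradient at stored pattern:", lse_grad(patterns[0], patterns, beta))
print("LSR energy outside all supports:", lsr_energy(np.array([5.0, 5.0]), patterns, beta))
```

In this toy setting the LSR gradient vanishes exactly at the stored pattern because only one kernel is active there, whereas the LSE gradient does not, since every other pattern still contributes a small pull. Outside all supports the LSR energy is flat at `-log(eps) / beta`, which is the role of the `eps` term.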

Abstract

We propose a novel energy function for Dense Associative Memory (DenseAM) networks, the log-sum-ReLU (LSR), inspired by optimal kernel density estimation. Unlike the common log-sum-exponential (LSE) function, LSR is based on the Epanechnikov kernel and enables exact memory retrieval with exponential capacity without requiring exponential separation functions. Moreover, it introduces abundant additional \emph{emergent} local minima while preserving perfect pattern recovery -- a characteristic previously unseen in DenseAM literature. Empirical results show that LSR energy has significantly more local minima (memories) that have comparable log-likelihood to LSE-based models. Analysis of LSR's emergent memories on image datasets reveals a degree of creativity and novelty, hinting at this method's potential for both large-scale memory storage and generative tasks.
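The claim about emergent local minima can be illustrated with a small 1D sketch. It assumes an LSR energy of the form $-\frac{1}{\beta}\log\big(\epsilon+\sum_\mu \operatorname{ReLU}(1-\tfrac{\beta}{2}(x-\xi_\mu)^2)\big)$ and an LSE baseline with the same squared-distance similarity; the two patterns, $\beta=8$, $\epsilon=10^{-6}$, the grid, and the helper `count_strict_local_minima` are hypothetical choices for illustration only.

```python
import numpy as np

def lsr_energy_1d(xs, patterns, beta, eps=1e-6):
    """Assumed LSR energy evaluated on a 1D grid of query points."""
    sq_dists = (xs[:, None] - patterns[None, :]) ** 2
    kernel = np.maximum(0.0, 1.0 - 0.5 * beta * sq_dists)
    return -np.log(eps + kernel.sum(axis=1)) / beta

def lse_energy_1d(xs, patterns, beta):
    """Assumed LSE energy evaluated on the same grid."""
    sq_dists = (xs[:, None] - patterns[None, :]) ** 2
    return -np.log(np.exp(-0.5 * beta * sq_dists).sum(axis=1)) / beta

def count_strict_local_minima(values):
    """Count interior grid points strictly below both neighbours.

    Flat regions (such as the LSR energy outside every support) are not counted.
    """
    interior = values[1:-1]
    return int(np.sum((interior < values[:-2]) & (interior < values[2:])))

# Two stored patterns at distance d = 0.8. With beta = 8 the kernel support
# radius is R = sqrt(2 / beta) = 0.5, so R < d < 2R: the supports overlap,
# but neither pattern lies inside the other's support.
patterns = np.array([0.0, 0.8])
beta = 8.0
xs = np.linspace(-1.0, 2.0, 3001)

n_lsr = count_strict_local_minima(lsr_energy_1d(xs, patterns, beta))
n_lse = count_strict_local_minima(lse_energy_1d(xs, patterns, beta))
print(f"LSR local minima: {n_lsr}, LSE local minima: {n_lse}, stored patterns: {len(patterns)}")
```

In the overlap region the sum of the two kernels is a concave parabola peaking at the midpoint $x=0.4$, so the LSR energy gains a third minimum there while both stored patterns remain minima. The LSE energy for the same patterns has at most one minimum per stored pattern.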

Paper Structure

This paper contains 31 sections, 5 theorems, 35 equations, 10 figures, 3 algorithms.

Key Result

Theorem 1

Let $r = \min_{\mu, \nu \in \llbracket M \rrbracket, \mu \not= \nu} \left\lVert\boldsymbol{\xi}_\mu - \boldsymbol{\xi}_\nu\right\rVert$ be the minimum Euclidean distance between any two memories. Let $S_\mu(\Delta) = \{ \mathbf{x} \in {\mathcal{X}} : \left\lVert\mathbf{x} - \boldsymbol{\xi}_\mu\righ

Figures (10)

  • Figure 1: LSR energy can create more memories than there are stored patterns under critical regimes of $\beta$. Left: 1D LSR vs LSE energy landscape. Note that LSE is never capable of having more local minima than the number of stored patterns. Right: 2D LSR energy landscape, where increasing $\beta$ creates novel local minima where basins intersect. Unsupported regions are shaded gray.
  • Figure 2: Visualizing the separation functions $F(\beta x) = \exp(\beta x)$ (LSE) and $F(\beta x) = \operatorname{ReLU}\left(1+ \beta x\right)$ (LSR) with $x = S(\mathbf{x}, \mathbf{x}')$ for varying values of $\beta$. We focus on $S(\mathbf{x}, \mathbf{x}') = -1/2 \| \mathbf{x} - \mathbf{x}' \|^2$.
  • Figure 3: (Left) Analyzing local minima in LSR energy reveals a number of novel memories several orders of magnitude larger than $M$, the number of stored patterns, at critical values of $\beta$ (note that the y-axes are log-scaled). These emergent memories occur while the stored patterns are still preserved as memories. Smaller values of $\beta$ have a larger region of support on the unit hypercube. (Right) Given samples from a known true density function (in this case, a $k=10$ mixture of 8-dim Gaussians with means drawn uniformly from the unit hypercube and $\sigma=0.1$), memories from LSR energy have a log-likelihood under the true density comparable to, and occasionally slightly higher than, that of LSE memories. Note that LSR achieves comparable log-likelihood while having more unique samples than LSE, even when both are seeded with the same $N=500$ queries. Regions of $\beta$ where LSR outperforms LSE on a metric are shaded orange. Error bars indicate the standard error across 5 different seeds for sampling stored patterns and initial queries.
  • Figure 4: LSR's emergent memories appear as novel, creative generations when the energy is applied to a semantically meaningful latent space. (Left) 24 randomly selected MNIST images are encoded into 10-dim VAE latents and stored in an LSR energy and an LSE energy using a carefully chosen $\beta$ (see \ref{alg:beta-search}). Gray boxes indicate which stored patterns were not preserved at the chosen $\beta$. (Right) 40 TinyImagenet \cite{Le2015TinyIV} images are encoded into 256-dim latents using a pretrained VAE \cite{madebyollin2023taesd} and stored in an LSR energy and an LSE energy using a carefully chosen $\beta$. Note that in this TinyImagenet example the LSR energy is, by definition, globally emergent since all stored patterns are recoverable, while in the MNIST example it is not. See experiment details in \ref{sec:details-qualitative-reconstructions}.
  • Figure 5: Different kernels used in KDE with their expression and KDE efficiency relative to the Epanechnikov kernel (higher is better, see text for details). The center of each kernel is marked with a red $\star$. To highlight the shape of the kernel, we have removed any scaling in the kernel expression. Note that all kernels shown except the Gaussian have finite support. The Epanechnikov kernel has the highest efficiency (100%). While the Gaussian kernel is extremely popular, and is more efficient (95.1%) than the Uniform kernel (92.9%), there are various other kernels with better efficiency.
  • ...and 5 more figures
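The relative efficiencies quoted in the Figure 5 caption can be reproduced numerically with a short sketch. It assumes the standard KDE efficiency definition, in which a kernel $K$ normalized to unit mass has cost $C(K)=\sqrt{\int u^2 K(u)\,du}\,\int K(u)^2\,du$ and its efficiency is $C(K_{\text{Epanechnikov}})/C(K)$. The caption defers the definition to the text, so this choice, along with the helper names and integration grids, is an assumption.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def kernel_cost(kernel, lo, hi, n=200_001):
    """Cost C(K) = sqrt(second moment) * integral of K^2, after normalizing K.

    C(K) is invariant to rescaling the kernel's width, so unscaled kernel
    shapes can be passed in directly.
    """
    u = np.linspace(lo, hi, n)
    k = kernel(u)
    k = k / trapezoid(k, u)  # normalize to integrate to 1
    second_moment = trapezoid(u ** 2 * k, u)
    roughness = trapezoid(k ** 2, u)
    return np.sqrt(second_moment) * roughness

# Unscaled kernel shapes with the interval used for numerical integration.
# The Gaussian has infinite support; [-10, 10] is a truncation for illustration.
kernels = {
    "epanechnikov": (lambda u: np.maximum(0.0, 1.0 - u ** 2), -1.0, 1.0),
    "uniform": (lambda u: np.ones_like(u), -1.0, 1.0),
    "gaussian": (lambda u: np.exp(-0.5 * u ** 2), -10.0, 10.0),
}

reference = kernel_cost(*kernels["epanechnikov"])
efficiency = {name: reference / kernel_cost(*spec) for name, spec in kernels.items()}

for name, value in efficiency.items():
    print(f"{name:>12}: {100 * value:.1f}%")
```

Under this definition the Gaussian and Uniform kernels come out close to the 95.1% and 92.9% given in the caption, with the Epanechnikov kernel at 100% by construction.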

Theorems & Definitions (10)

  • Theorem 1
  • Remark 1
  • Definition 1: Novel local minima
  • Definition 2: Global emergence
  • Proposition 1
  • Definition 3: Locally emergent memory
  • Proposition 2
  • Theorem 2
  • Proposition 3
  • Proof