On the Expressive Power of Modern Hopfield Networks

Xiaoyu Li, Yuanpeng Li, Yingyu Liang, Zhenmei Shi, Zhao Song

TL;DR

The paper analyzes the expressive power of modern and kernelized Hopfield networks through circuit complexity, proving that poly(n)-precision MHNs with a constant number of layers and O(n) hidden dimension reside in DLOGTIME-uniform TC0. Consequently, unless TC0 = NC1, such MHNs cannot solve NC1-hard problems such as undirected graph connectivity and tree isomorphism. Kernelized MHNs receive analogous bounds, indicating inherent expressivity limits despite empirical success. These results guide the design of Hopfield-based architectures by clarifying fundamental computational constraints and highlighting gaps between practical performance and worst-case complexity.

Abstract

Modern Hopfield networks (MHNs) have emerged as powerful tools in deep learning, capable of replacing components such as pooling layers, LSTMs, and attention mechanisms. Recent advancements have improved their storage capacity, retrieval speed, and error rates. However, the fundamental limits of their computational expressiveness remain unexplored. Understanding the expressive power of MHNs is crucial for optimizing their integration into deep learning architectures. In this work, we establish rigorous theoretical bounds on the computational capabilities of MHNs using circuit complexity theory. Our key contribution is to show that MHNs are in $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$. Hence, unless $\mathsf{TC}^0 = \mathsf{NC}^1$, a $\mathrm{poly}(n)$-precision modern Hopfield network with a constant number of layers and $O(n)$ hidden dimension cannot solve $\mathsf{NC}^1$-hard problems such as the undirected graph connectivity problem and the tree isomorphism problem. We also extend our results to kernelized Hopfield networks. These results demonstrate limitations in the expressive power of modern Hopfield networks. Moreover, our theoretical analysis provides insights to guide the development of new Hopfield-based architectures.

Paper Structure

This paper contains 41 sections, 26 theorems, 29 equations, 1 algorithm.

Key Result

Lemma 4.1

Let $p \leq \mathrm{poly}(n)$, $n_1, n_2 \leq \mathrm{poly}(n)$, and $d \leq n$. Let $A \in \mathbb{F}_p^{n_1 \times d}$ and $B \in \mathbb{F}_p^{d \times n_2}$. Then the matrix product $AB$ can be implemented using a $\mathsf{DLOGTIME}$-uniform threshold circuit
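
The intuition behind a statement of this kind can be illustrated with a small sketch. It is not the paper's construction and it is ordinary sequential Python, not a circuit. It only shows that every entry of $AB$ is an independent inner product, made of $d$ mutually independent multiplications followed by one $d$-term sum. The function name `matmul_entrywise` is hypothetical, and exact Python integers stand in for the entries of $\mathbb{F}_p$, which is an assumption; the sketch models neither $p$-bit arithmetic nor uniformity.

```python
def matmul_entrywise(A, B):
    """Compute AB so that the independence of the sub-computations is visible.

    A is an n1 x d matrix and B is a d x n2 matrix, both given as lists of rows.
    Plain Python integers are an assumed stand-in for the lemma's number type.
    """
    n1, d = len(A), len(A[0])
    assert len(B) == d, "inner dimensions must agree"
    n2 = len(B[0])

    C = [[0] * n2 for _ in range(n1)]
    for i in range(n1):
        for j in range(n2):
            # Step 1: d pairwise products; none depends on another,
            # so a circuit could evaluate them all in the same layer.
            products = [A[i][k] * B[k][j] for k in range(d)]
            # Step 2: a single iterated sum over the d products.
            C[i][j] = sum(products)
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_entrywise(A, B))  # [[19, 22], [43, 50]]
```

No entry of the result depends on any other entry, so the nesting of the two steps does not grow with $n_1$, $n_2$, or $d$; only the number of parallel sub-computations does.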

Theorems & Definitions (60)

  • Definition 3.1: Hopfield attention matrix, rsl+21
  • Definition 3.2: Hopfield layer, page 6 in rsl+21
  • Definition 3.3: Multi-layer Modern Hopfield Networks
  • Definition 3.4: Two-layer ReLU Feed-forward Neural Networks
  • Definition 3.5: Kernelized attention matrix
  • Definition 3.6: Single kernelized Hopfield layer
  • Definition 3.7: Kernelized Hopfield network
  • Lemma 4.1: Matrix multiplication in $\mathsf{TC}^0$, Lemma 4.2 in cll+24
  • Lemma 4.2: Computation of Hopfield attention matrix in $\mathsf{TC}^0$
  • proof
  • ...and 50 more