Dense Associative Memories with Analog Circuits

Marc Gong Bacvanski, Xincheng You, John Hopfield, Dmitry Krotov

TL;DR

The paper tackles the energy and latency bottlenecks of digital AI inference by proposing Dense Associative Memories (DenseAM) as an energy-based, continuous-time computation framework implemented on analog hardware. It introduces a full hardware design using RC circuits and resistive crossbars to realize DenseAM dynamics, achieving constant-time inference largely independent of model size. Through experiments on XOR, the Hamming (7,4) code, and an autoregressive parity task inspired by the Energy Transformer, it analyzes how inference time, energy, and hardware area scale, showing favorable linear energy scaling and practical latency bounds within CMOS technology. The work suggests a co-design path for future AI accelerators in which stable attractor dynamics and global energy minimization underpin fast, scalable inference across memory-centric and transformer-like architectures.
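
As a rough illustration of how a resistive crossbar and an RC neuron can realize one term of the DenseAM dynamics, the Python sketch below models an ideal crossbar whose crosspoint conductances are proportional to the weights ($G_{\mu i} = 1/R_{\mu i} \propto \xi_{\mu i}$, as in Figure 1). Each hidden row is driven at voltage $f_\mu$ and each column is held at virtual ground, so Ohm's law at every crosspoint plus Kirchhoff's current law on every column gives $I_i \propto \sum_\mu \xi_{\mu i} f_\mu$; a leaky RC node fed by that current relaxes to $v_i = R I_i$ with time constant $\tau = RC$. The unit conductance `g_unit`, the differential-pair treatment of negative weights, and all component values are illustrative assumptions, not the paper's circuit.

```python
import numpy as np


def crossbar_column_currents(xi, f, g_unit=1e-6):
    """Column currents of an ideal resistive crossbar with virtual-ground columns.

    xi     : (K, N) weight matrix; row mu is driven at voltage f[mu] (volts).
    g_unit : conductance in siemens representing a weight of 1 (hypothetical scale).
    Negative weights are placed on a second, subtracted crossbar (assumed
    differential scheme), since a physical conductance cannot be negative.
    """
    g_pos = g_unit * np.clip(xi, 0.0, None)   # conductances for positive weights
    g_neg = g_unit * np.clip(-xi, 0.0, None)  # conductances for negative weights
    # Ohm's law per crosspoint (I = G * V); Kirchhoff's current law sums each column.
    i_pos = (g_pos * f[:, None]).sum(axis=0)
    i_neg = (g_neg * f[:, None]).sum(axis=0)
    return i_pos - i_neg                      # equals g_unit * xi.T @ f


def rc_step(v, i_in, r_leak, c, dt):
    """One forward-Euler step of C dv/dt = i_in - v / r_leak (tau = r_leak * c)."""
    return v + dt * (i_in - v / r_leak) / c


# Usage: 5 hidden rows, 3 visible columns, random weights and row voltages.
rng = np.random.default_rng(0)
xi = rng.normal(size=(5, 3))
f = rng.random(5)
i_cols = crossbar_column_currents(xi, f)

# Leak resistance 1/g_unit = 1 MOhm with 1 pF gives tau = 1 us; simulate 20 tau.
v = np.zeros(3)
for _ in range(2000):
    v = rc_step(v, i_cols, r_leak=1e6, c=1e-12, dt=1e-8)
```

Choosing the leak resistance equal to $1/$`g_unit` makes the steady-state node voltage equal to $\sum_\mu \xi_{\mu i} f_\mu$ directly, which is the form $\tau\,\dot v_i = \sum_\mu \xi_{\mu i} f_\mu - v_i$ commonly used for DenseAM visible neurons.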

Abstract

The increasing computational demands of modern AI systems have exposed fundamental limitations of digital hardware, driving interest in alternative paradigms for efficient large-scale inference. Dense Associative Memory (DenseAM) is a family of models that offers a flexible framework for representing many contemporary neural architectures, such as transformers and diffusion models, by casting them as dynamical systems evolving on an energy landscape. In this work, we propose a general method for building analog accelerators for DenseAMs and implementing them using electronic RC circuits, crossbar arrays, and amplifiers. We find that our analog DenseAM hardware performs inference in constant time independent of model size. This result highlights an asymptotic advantage of analog DenseAMs over digital numerical solvers that scale at least linearly with the model size. We consider three settings of progressively increasing complexity: XOR, the Hamming (7,4) code, and a simple language model defined on binary variables. We propose analog implementations of these three models and analyze the scaling of inference time, energy consumption, and hardware area. Finally, we estimate lower bounds on the achievable time constants imposed by amplifier specifications, suggesting that even conservative existing analog technology can enable inference times on the order of tens to hundreds of nanoseconds. By harnessing the intrinsic parallelism and continuous-time operation of analog circuits, our DenseAM-based accelerator design offers a new avenue for fast and scalable AI hardware.
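
To make the energy-landscape picture concrete, here is a minimal Python sketch of DenseAM inference on XOR. It assumes the common DenseAM form in which hidden neurons sit at equilibrium ($h_\mu = \sum_i \xi_{\mu i} v_i$) with softmax activations, and visible neurons follow $\tau\,\dot v_i = \sum_\mu \xi_{\mu i} f_\mu - v_i$, which is gradient descent on the effective energy $E(v) = \tfrac12\lVert v\rVert^2 - \beta^{-1}\log\sum_\mu e^{\beta h_\mu}$. The four truth-table rows are the stored memories; the two inputs are clamped and only the output neuron evolves. Unlike Figure 2, which clamps inputs at 1 and 0, this sketch encodes True/False as $+1/-1$ so that a plain dot-product similarity separates all four rows; the encoding, $\beta$, and the Euler step size are assumptions.

```python
import numpy as np

# XOR truth table as stored memories: columns are (x1, x2, x1 XOR x2).
# ASSUMPTION: bipolar encoding (True = +1, False = -1) instead of Figure 2's 1/0.
XI = np.array([[-1.0, -1.0, -1.0],
               [-1.0,  1.0,  1.0],
               [ 1.0, -1.0,  1.0],
               [ 1.0,  1.0, -1.0]])


def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()


def energy(v, xi, beta):
    """Effective energy 0.5*|v|^2 - (1/beta) * logsumexp(beta * xi @ v)."""
    h = xi @ v                   # hidden neurons assumed at equilibrium
    m = h.max()
    lse = m + np.log(np.sum(np.exp(beta * (h - m)))) / beta
    return 0.5 * v @ v - lse


def run_xor(x1, x2, beta=5.0, dt=0.01, steps=2000):
    """Clamp inputs, start the output at 0 (undecided), Euler-integrate the
    assumed dynamics tau dv/dt = xi^T softmax(beta xi v) - v with tau = 1."""
    v = np.array([x1, x2, 0.0])
    free = np.array([False, False, True])    # only the output neuron evolves
    energies = [energy(v, XI, beta)]
    for _ in range(steps):
        dv = XI.T @ softmax(beta * (XI @ v)) - v   # equals -dE/dv
        v = v + dt * np.where(free, dv, 0.0)
        energies.append(energy(v, XI, beta))
    return v[2], np.array(energies)


for x1, x2 in [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]:
    out, E = run_xor(x1, x2)
    print(f"x1={x1:+.0f} x2={x2:+.0f} -> v3={out:+.3f}")
```

With these settings the output settles close to $-1$ when the inputs agree and close to $+1$ when they differ, and the recorded energy does not increase (up to rounding), mirroring the monotone descent described for Figure 2.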

Paper Structure

This paper contains 68 sections, 93 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Top left: Bipartite neural network formulation, where hidden neurons $h_\mu$ and visible neurons $v_i$ are connected via symmetric synaptic weights $\boldsymbol \xi$. Top right: Circuit realization of symmetric weight matrix via resistive crossbar array. Each crosspoint encodes a weight $\xi_{\mu i}$ by its resistance $R_{\mu i}=1/\xi_{\mu i}$. Lower right: Circuit schematic of a single hidden neuron. It drives its row of the crossbar array with a voltage according to its activation $f_\mu$, and its internal dynamics are driven by the incoming current flowing into it from the crossbar array. Lower left: Softmax activation function built from bipolar junction transistors (some components not shown).
  • Figure 2: Solving XOR with a DenseAM. Visible neuron $g_3=v_3$ serves as the output, while the two input neurons (unlabeled, thin lines) are clamped at $1$ for True and $0$ for False. The output $v_3$ is initialized at $0.5$ and converges to a positive prediction of $1$. The hidden neuron $f_3$, corresponding to the truth-table row (1, 0, 1), becomes highly activated, while the others (fine lines) are suppressed by the softmax. The energy, given by the global energy function or, equivalently, by the effective energy function, decreases monotonically along the inference trajectory.
  • Figure 3: XOR energy landscape of neuron $v_3$ under different settings of visible input neurons $v_1$ and $v_2$. Minima in the energy function correspond to stationary points of the dynamics. Gradient flow dynamics bring $v_3$ to these attractor points, resulting in correct XOR outputs.
  • Figure 4: Correcting a bit error in a Hamming (7,4) code. Visible neuron $g_5$ flips, indicating that the bit-flip error occurred on the 5th codeword bit. $f_7$ is the hidden neuron corresponding to the memory of the correct codeword. Thin lines correspond to the other neuron activations.
  • Figure 5: Analog ET circuit demonstrating the autoregressive inference procedure. A newly inferred token is decoded, sampled, and re-embedded to obtain the weight vector $\boldsymbol\xi_{L+1}^\text{attn}$, which is set as the weight vector for a new hidden neuron $h_{L+1}^\text{attn}$ in the energy attention block (light gray on right). For this layout, the crossbar array is flipped so that indices $A$ and $\mu$ run horizontally and index $i$ runs vertically.
  • ...and 5 more figures
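
The error-correction experiment of Figure 4 can be approximated with the same assumed dynamics: store all 16 codewords of a Hamming (7,4) code as memories, start the seven visible neurons at a received word with one flipped bit, and let them relax. Since distinct codewords differ in at least three positions, the overlap $\sum_i \xi_{\mu i} v_i = 7 - 2d$ is 5 for the correct codeword ($d = 1$) and at most 3 for every other one, so the softmax concentrates on the correct memory. The generator matrix and bit ordering below are one standard systematic choice and may not match the paper's; the $\pm 1$ encoding, $\beta$, and step size are likewise assumptions.

```python
import itertools
import numpy as np

# One systematic generator matrix G = [I_4 | P] for a Hamming (7,4) code.
# ASSUMPTION: the paper's bit ordering (e.g. which bit is "the 5th") may differ.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])


def codebook():
    """All 16 codewords in bipolar form (0 -> -1, 1 -> +1), shape (16, 7)."""
    data = np.array(list(itertools.product([0, 1], repeat=4)))
    words = (data @ G) % 2
    return 2.0 * words - 1.0


def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()


def correct(received, xi, beta=5.0, dt=0.01, steps=500):
    """Relax all visible neurons under the assumed dynamics
    tau dv/dt = xi^T softmax(beta xi v) - v (tau = 1), then read out signs."""
    v = np.asarray(received, dtype=float).copy()
    for _ in range(steps):
        v = v + dt * (xi.T @ softmax(beta * (xi @ v)) - v)
    return np.sign(v)


xi = codebook()
sent = xi[11]                    # an arbitrary stored codeword
received = sent.copy()
received[4] *= -1                # flip one bit (index 4, the 5th position)
decoded = correct(received, xi)
```

Under these settings, flipping any single bit of any codeword and relaxing recovers the original codeword; a two-bit error would instead be pulled toward a different codeword, as expected for a code with minimum distance 3.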