Multi-Function Multi-Way Analog Technology for Sustainable Machine Intelligence Computation

Vassilis Kalantzis, Mark S. Squillante, Shashanka Ubaru, Tayfun Gokmen, Chai Wah Wu, Anshul Gupta, Haim Avron, Tomasz Nowicki, Malte Rasch, Murat Onen, Vanessa Lopez Marrero, Effendi Leobandung, Yasuteru Kohda, Wilfried Haensch, Lior Horesh

TL;DR

The paper tackles the rising energy and speed bottlenecks in AI numerical computation by deploying multi-function multi-way analog (MFMWA) memristor crossbar arrays for in-memory matrix-vector (MV) products and outer-product (OP) updates, paired with randomized numerical linear algebra (RNLA) to tolerate analog noise. It develops a general RNLA framework for MFMWA, including randomized ordinary least squares (OLS) solvers and principal component analysis (PCA) via Johnson-Lindenstrauss and $\epsilon$-subspace embeddings, and validates the approach through physical MFMWA experiments and large-scale simulations. Key results show orders-of-magnitude reductions in both computation and energy with accuracy comparable to digital baselines, and demonstrate effective handling of streaming and evolving data. The work indicates substantial practical impact for sustainable machine intelligence (MI) and AI acceleration, particularly in scenarios with large, incremental data inputs.

Abstract

Numerical computation is essential to many areas of artificial intelligence (AI), whose computing demands continue to grow dramatically, yet their continued scaling is jeopardized by the slowdown in Moore's law. Multi-function multi-way analog (MFMWA) technology, a computing architecture comprising arrays of memristors supporting in-memory computation of matrix operations, can offer tremendous improvements in computation and energy, but at the expense of inherent unpredictability and noise. We devise novel randomized algorithms tailored to MFMWA architectures that mitigate the detrimental impact of imperfect analog computations while realizing their potential benefits across various areas of AI, such as applications in computer vision. Through analysis, measurements from analog devices, and simulations of larger systems, we demonstrate orders of magnitude reduction in both computation and energy with accuracy similar to digital computers.

Paper Structure

This paper contains 25 sections, 2 theorems, 29 equations, 6 figures, 7 tables, 2 algorithms.

Key Result

Lemma 3.1

Given $0 <\epsilon < 1/2$, a set ${\cal A}$ of $m$ points in $\mathbb{R}^n$ and a parameter $\ell \geq 8\ln(m)/\epsilon^2$, there exists a map $f: \mathbb{R}^n\rightarrow \mathbb{R}^\ell$ such that for all $u,v\in {\cal A}$.
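The lemma statement above stops before its guarantee. In the standard form of the Johnson-Lindenstrauss lemma the map satisfies $(1-\epsilon)\|u-v\|_2^2 \leq \|f(u)-f(v)\|_2^2 \leq (1+\epsilon)\|u-v\|_2^2$; whether the paper uses exactly this form is an assumption here. The sketch below checks that bound empirically for one candidate map, a scaled Gaussian matrix. That choice of $f$, and the helper names `jl_dimension`, `gaussian_map`, and `distortion_ratios`, are illustrative and do not come from the paper.

```python
import math
import random
from itertools import combinations


def jl_dimension(m, eps):
    """Smallest integer ell with ell >= 8 ln(m) / eps^2, as in the lemma."""
    return math.ceil(8.0 * math.log(m) / eps ** 2)


def gaussian_map(n, ell, rng):
    """Random ell x n matrix with i.i.d. N(0, 1/ell) entries (an assumed choice of f)."""
    scale = 1.0 / math.sqrt(ell)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(ell)]


def apply_map(F, x):
    """Compute f(x) = F x for a dense matrix stored as a list of rows."""
    return [sum(f_ij * x_j for f_ij, x_j in zip(row, x)) for row in F]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def distortion_ratios(points, F):
    """Ratios ||f(u)-f(v)||^2 / ||u-v||^2 over all pairs of distinct points."""
    images = [apply_map(F, p) for p in points]
    return [
        sq_dist(images[i], images[j]) / sq_dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    m, n, eps = 20, 200, 0.4
    ell = jl_dimension(m, eps)
    points = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    ratios = distortion_ratios(points, gaussian_map(n, ell, rng))
    inside = sum(1 - eps <= r <= 1 + eps for r in ratios)
    print(f"ell = {ell}; {inside} of {len(ratios)} pairs within [1 - eps, 1 + eps]")
```

The lemma asserts only that a suitable map exists. A single random draw satisfies the bound for every pair with some probability rather than with certainty, so the script reports how many pairs fall inside the interval instead of assuming that all of them do.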

Figures (6)

  • Figure 1: (a) Multi-Function Multi-Way Analog array technology. Center: MV product. The output voltages $V_{out,k}$ consist of the integral of the currents $I_k$ over time divided by the capacitance $C$ per $V_{out,k} = \frac{1}{C} \int_0^{T} I_k(\tau) d \tau$. At each cross point, an element with conductance $g_{kl}$ interacts with the input signal $V_{in,j}(t)$ per Ohm's law and determines the current signal $I_k(t)$. Left: OP update alters the conductance of each cross point based on coincidence of the input pulse signals from both sides. Right: zoom-in view over the pulse coincidence update mechanism. The conductance changes whenever the coincidence of the input pulses occurs. (b) Hybrid architecture consisting of a digital chip including a (multi-core) CPU and RAM system memory, connected to one or more MFMWA arrays through a dedicated system bus. The CPU is focused on executing low-complexity operations as well as transferring data to the MFMWA array(s) through the dedicated system bus.
  • Figure 2: Left: Classification error rate achieved by the baseline and randomized streaming approaches on a purely digital computational environment versus that of randomized streaming implemented through an MFMWA array. Right: Cosine similarity between the ideal solution and the regressors (solutions) returned by simulation and actual hardware. In the latter case, the number of pulses was varied as 15, 31, and 63 (corresponding to 4-bit, 5-bit, and 6-bit resolution, respectively).
  • Figure 3: Simulated wall-clock times and estimated energy consumption of analog-digital vs. digital (without sketching). The number of samples $m$ and sketching dimension $\ell$ vary. The number of features of each sample was set to $n=2048$ (left) and $n=4096$ (right).
  • Figure 4: Video Background Subtraction (Lobby dataset). Four sample frames from the video (leftmost column); Background subtraction obtained using hybrid PCA (center column) and digital PCA (rightmost column).
  • Figure 5: Dataset used for the physical experiment. Left: the dataset of three-dimensional coordinates. Right: two-dimensional orthogonal projection along the two leading Cartesian coordinates.
  • ...and 1 more figure
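The MV product described in the Figure 1 caption (Ohm's law at each cross point, currents summed along an output line, then integrated and divided by $C$) can be sketched as a small discrete-time simulation. The input encoding is an assumption: each input $x_l \in [0, 1]$ becomes a pulse train with `round(x_l * n_slots)` active slots of amplitude `V0`, where `n_slots = 2**bits - 1` mirrors the 15, 31, and 63 pulses that Figure 2 associates with 4-bit, 5-bit, and 6-bit resolution. The function names, the normalized units, and the optional `noise_std` term are hypothetical and are not taken from the paper.

```python
import random


def pulses_for_bits(bits):
    """Number of pulse slots for a given input resolution: 2**bits - 1."""
    return 2 ** bits - 1


def crossbar_mv(G, x, bits=4, C=1.0, V0=1.0, T=1.0, noise_std=0.0, rng=None):
    """Simulate V_out,k = (1/C) * integral_0^T I_k(t) dt on a crossbar.

    G[k][l] is the conductance at cross point (k, l). Each x[l] in [0, 1] is
    encoded (assumption) as round(x[l] * n_slots) active slots of amplitude V0.
    noise_std adds hypothetical Gaussian noise to every current sample.
    """
    rng = rng or random.Random(0)
    n_slots = pulses_for_bits(bits)
    dt = T / n_slots
    active = [round(x_l * n_slots) for x_l in x]  # quantized inputs
    charge = [0.0] * len(G)
    for slot in range(n_slots):
        # Input voltage on each line during this time slot
        v_in = [V0 if slot < a else 0.0 for a in active]
        for k, row in enumerate(G):
            # Ohm's law per cross point, currents summed along output line k
            i_k = sum(g * v for g, v in zip(row, v_in))
            if noise_std > 0.0:
                i_k += rng.gauss(0.0, noise_std)
            charge[k] += i_k * dt  # Riemann sum for the time integral
    return [q / C for q in charge]


if __name__ == "__main__":
    G = [[1.0, 2.0, 0.5], [0.0, 1.0, 3.0]]
    x = [0.0, 0.2, 1.0]
    print("noise-free:", crossbar_mv(G, x, bits=4))
    print("noisy:     ", crossbar_mv(G, x, bits=4, noise_std=0.05))
```

With `noise_std=0.0` and inputs that are exact multiples of `1 / n_slots`, the result matches the ordinary matrix-vector product up to floating-point rounding. Raising `noise_std` or lowering `bits` gives a rough feel for the imperfections the randomized algorithms are meant to tolerate.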

Theorems & Definitions (3)

  • Lemma 3.1: Johnson-Lindenstrauss Lemma
  • Definition 3.1: $\epsilon$-subspace embedding ($L_2$ norm)
  • Lemma 3.2
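An $\epsilon$-subspace embedding is the property that makes sketch-and-solve least squares work: if a sketching matrix $S$ approximately preserves $\|Ax\|_2$ for every $x$, then solving the much smaller problem $\min_x \|SAx - Sb\|_2$ gives a near-optimal regressor for the original problem. The sketch below illustrates the idea with a dense Gaussian $S$ and a normal-equations solver. Both are assumptions chosen for brevity rather than the paper's stated construction, and all function names are hypothetical.

```python
import math
import random


def gaussian_sketch(ell, m, rng):
    """ell x m matrix with i.i.d. N(0, 1/ell) entries (assumed sketch type)."""
    scale = 1.0 / math.sqrt(ell)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(m)] for _ in range(ell)]


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def solve_least_squares(A, b):
    """Solve min_x ||A x - b||_2 via normal equations and Gaussian elimination.

    Adequate for small, well-conditioned problems; a QR-based solver is
    preferable in practice.
    """
    n = len(A[0])
    # Augmented system [A^T A | A^T b]
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(n)]
        + [sum(r[i] * b_r for r, b_r in zip(A, b))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting, then eliminate entries below the pivot
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def sketched_ols(A, b, ell, rng):
    """Sketch-and-solve: replace (A, b) by (S A, S b) and solve the small problem."""
    S = gaussian_sketch(ell, len(A), rng)
    SA = matmul(S, A)
    Sb = [row[0] for row in matmul(S, [[b_i] for b_i in b])]
    return solve_least_squares(SA, Sb)


if __name__ == "__main__":
    rng = random.Random(0)
    m, n, ell = 300, 3, 30
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    x_true = [1.0, -2.0, 0.5]
    b = [sum(a * x for a, x in zip(row, x_true)) + rng.gauss(0.0, 0.01) for row in A]
    print("sketched:", sketched_ols(A, b, ell, rng))
    print("full:    ", solve_least_squares(A, b))
```

The expensive steps are the products $SA$ and $Sb$, which are the kind of matrix operation the abstract describes performing in memory, while the small $\ell \times n$ solve is cheap enough to stay on the digital side. How the paper actually splits the work between the two is not shown in this section.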