Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

Zihao Chen, Faiek Ahsan, Johannes Leugering, Gert Cauwenberghs, Shantanu Chakrabartty

TL;DR

The analysis captures the out-of-equilibrium thermodynamics of learning. The resulting energy-efficiency estimates are model-agnostic: they depend only on the number of model-update operations (OPS), the model size in terms of the number of parameters, the speed of convergence, and the precision of the solution.

Abstract

Neuromorphic or neurally-inspired optimizers rely on local but parallel parameter updates to solve problems that range from quadratic programming to Ising machines. An ideal realization of such an optimizer not only uses a compute-in-memory (CIM) paradigm to address the so-called memory-wall (i.e., the energy dissipated due to repeated memory read accesses), but also uses a learning-in-memory (LIM) paradigm to address two further energy bottlenecks: repeated memory writes at the precision required for optimization (the update-wall), and the repeated transfer of information between short-term and long-term memories (the consolidation-wall). In this paper, we derive theoretical estimates for the energy-to-solution metric that can be achieved by this ideal neuromorphic optimizer, which is realized by modulating the energy barrier of the physical memories such that the dynamics of memory updates and memory consolidation match the optimization or annealing dynamics. The analysis captures the out-of-equilibrium thermodynamics of learning, and the resulting energy-efficiency estimates are model-agnostic: they depend only on the number of model-update operations (OPS), the model size in terms of the number of parameters, the speed of convergence, and the precision of the solution. To show the practical applicability of our results, we apply our analysis to estimate lower bounds on the energy-to-solution metrics for large-scale AI workloads.
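As a rough illustration of what modulating the energy barrier of a memory can do to its update rate, the sketch below uses a textbook Arrhenius two-state hopping model. This model is an assumption made here for illustration and is not the paper's derivation; the names `net_update_rate`, `barrier_kt`, `delta_e_kt`, and `j_max` are hypothetical, and all energies are expressed in units of $kT$.

```python
import math


def net_update_rate(barrier_kt: float, delta_e_kt: float, j_max: float = 1.0) -> float:
    """Net hopping rate across a barrier in an ASSUMED Arrhenius two-state model.

    The energy difference delta_e_kt tilts the barrier symmetrically:
      forward rate  = j_max * exp(-(barrier_kt - delta_e_kt / 2))
      backward rate = j_max * exp(-(barrier_kt + delta_e_kt / 2))
    Energies are in units of kT. Illustrative only, not the paper's equations.
    """
    forward = j_max * math.exp(-(barrier_kt - delta_e_kt / 2.0))
    backward = j_max * math.exp(-(barrier_kt + delta_e_kt / 2.0))
    return forward - backward


# Raising the barrier while holding the energy difference fixed slows the net update.
barriers_kt = (1.0, 2.0, 4.0, 8.0)
rates = [net_update_rate(b, delta_e_kt=1.0) for b in barriers_kt]

for b, r in zip(barriers_kt, rates):
    print(f"barrier = {b:4.1f} kT -> net rate = {r:.3e} (units of j_max)")
```

In this toy model the net rate vanishes when the energy difference is zero and falls exponentially as the barrier rises, which is the qualitative behavior an annealing-like barrier schedule would exploit.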

Paper Structure

This paper contains 8 sections, 34 equations, and 6 figures.

Figures (6)

  • Figure 1: Abstraction of an ideal neuromorphic machine that addresses learning/training bottlenecks due to memory accesses and updates: (A) Memory-wall, which arises due to fetching data from memory units that are physically separated from the compute units; (B) Update-wall, which arises due to the frequency and precision of memory writes; and (C) Consolidation-wall, which arises due to limited memory capacity and fetching data across memory hierarchies. (D) Compute-in-memory paradigm to address the memory-wall by co-locating memory and compute functions; (E)-(F) Learning-in-memory paradigms to address the update-wall and consolidation-wall, where learning is also integrated within the memory, driven either by an external energy source (E) or by a non-equilibrium process of memory erasure (F).
  • Figure 2: Abstraction of the memory model for LIM: (A) Equivalent circuit where a capacitor represents both an energy-storage element and a memory element, whereas the compute function controls the rate of leakage through the variable resistor $R_t$; (B) Continuous-time memory decay characteristics as the value of $R_t$ is modulated; (C) A discrete-time model of the memory decay based on the modulation of the energy barrier; (D) An equivalent circuit model that allows bidirectional memory updates of a parameter stored as the differential voltage $W_n = W^+_n-W^-_n$, where the differential nodes are connected to a thermal reservoir to allow current transmission; (E) The energy-band diagram representation of the differential memory model in the equilibrium state, where no meaningful updates of the memory are made; (F) The net memory update rate $J_n$ dissipates power when an external perturbation $\Delta E_n$ is injected into the circuit to break the equilibrium.
  • Figure 3: The energy-barrier versus energy-difference (stored parameter), or $E^0_n$-vs-$\Delta E_n$, plot: (A) Retention (energy-barrier height) versus stored parameter for different normalized update rates $r_n = \frac{J_n}{J_{\text{max}}}$ that can be achieved for a given $E^0$ and $\Delta E$. The shaded area indicates the region of inadmissible update rates. (B) Different LIM algorithms where memory elements follow different trajectories to the final steady-state solutions, denoted as $\Delta E_S$ and $\Delta E_T$ respectively. As the energy gradient is minimized while the retention increases, both factors contribute to the asymptotically decreasing update rate $J_n$.
  • Figure 4: Illustration of LIM for a two-parameter model: (A) With a time-varying energy barrier ($E^0_n$) of the memory system, the energy dynamics of the system vary with time $T_n$, where the initial system energy $E(W, T_1)$ corresponds to the learning objective/loss function $L(\mathbf{W})$. For a two-parameter LIM system, $W_{x,n}$ and $W_{y,n}$ evolve according to the instantaneous energy contour of the system $E(W,T_n)$ at time $T_n$. The barrier modulation dynamics are designed such that the learned parameters converge to the optimal solution $\mathbf{W}^*$ and avoid the trivial solution $\mathbf{W}^{\text{trivial}}$ for the learning objective $L(\mathbf{W})$. (B) The parameters $W_x$ and $W_y$ overcome the dynamically modulated energy barrier in the LIM memory in discrete time steps $n$.
  • Figure 5: Empirical data used for estimating agnostic energy-efficiency bounds: Trends showing the growth of computational and energy needs for training AI models [Giattino2023aiData, grattafiori2024llama3herdmodels], which have been used to predict the $10^{28}$ FLOPs and $10^{17}$ J of energy that would be needed to train a brain-scale AI model.
  • ...and 1 more figures
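
The projection quoted in the Figure 5 caption, read here as $10^{28}$ FLOPs and $10^{17}$ J, implies an average energy per operation. The sketch below does that arithmetic and compares the result with the Landauer limit $kT\ln 2$ for erasing one bit. The temperature of 300 K and the use of Landauer's limit as a reference point are assumptions made here, not values taken from the paper, and the variable names are illustrative.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum energy to erase one bit at the given temperature: k * T * ln(2)."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2.0)


# Projected training cost quoted in the Figure 5 caption.
total_flops = 1e28
total_energy_j = 1e17

# Average energy spent per floating-point operation under that projection.
energy_per_flop_j = total_energy_j / total_flops

# Reference point: Landauer limit at an ASSUMED temperature of 300 K.
landauer_j = landauer_limit_joules(300.0)
ratio = energy_per_flop_j / landauer_j

print(f"energy per FLOP      : {energy_per_flop_j:.2e} J")
print(f"Landauer limit, 300 K: {landauer_j:.2e} J per bit")
print(f"ratio                : {ratio:.2e}")
```

Under these assumptions the projected cost is about $10^{-11}$ J per FLOP, several billion times the per-bit Landauer limit at room temperature. The comparison is only indicative, since a FLOP is not a single bit erasure.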