Modeling Memristor-Based Neural Networks with Manhattan Update: Trade-offs in Learning Performance and Energy Consumption

Walter Quiñonez, María José Sánchez, Diego Rubi

TL;DR

This work tackles online training of memristor-based neural networks using the hardware-friendly Manhattan update, under realistic device non-idealities such as nonlinear potentiation/depression (P/D) curves, finite conductance windows, and limited multilevel resolution. By simulating SP and DNN architectures trained on MNIST, the authors quantify how a non-linearity index (NLI), conductance range, and level count $L$ affect convergence and accuracy, finding SP tolerates NLI $\leq 10^{-2}$ and DNN tolerates NLI $\leq 10^{-3}$, with accuracy improving as $L$ increases. A key contribution is the G_fix strategy, fixing one memristor in each differential pair to cut training energy by up to about 45% in DNN (and ~20% in SP) with minimal accuracy loss, demonstrating effective device–algorithm co-design. Overall, the results show that Manhattan-rule-based memristive learning can achieve scalable, low-power online training suitable for edge AI, provided careful control of non-idealities and conductance budgets.

Abstract

We present a systematic study of memristor-based neural networks trained with the hardware-friendly Manhattan update rule, focusing on the trade-offs between learning performance and energy consumption. Using realistic models of potentiation/depression (P/D) curves, we evaluate the impact of nonlinearity (NLI), conductance range, and number of accessible levels on both a single perceptron (SP) and a deep neural network (DNN) trained on the MNIST dataset. Our results show that SPs tolerate P/D nonlinearity up to NLI $\leq 0.01$, while DNNs require the stricter condition NLI $\leq 0.001$ to preserve accuracy. Increasing the number of discrete conductance states improves convergence, effectively acting as a finer learning rate. We further propose a strategy where one memristor of each differential pair is fixed, reducing redundant memristor conductance updates. This approach lowers training energy by nearly 50% in DNN with little to no loss in accuracy. Our findings highlight the importance of device-algorithm co-design in enabling scalable, low-power neuromorphic hardware for edge AI applications.
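To make the update scheme in the abstract concrete, here is a minimal sketch of one Manhattan step on differential pairs, where each weight is proportional to the difference between a "plus" and a "minus" conductance level. It is written under stated assumptions and is not the authors' simulation code: the function name `manhattan_step`, the integer level indexing, the ideal linear level spacing (no P/D nonlinearity), and the use of a pulse count as a crude stand-in for energy are all hypothetical choices made for illustration.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def manhattan_step(lv_plus, lv_minus, grads, n_levels, fix_minus=False):
    """One Manhattan update on differential pairs (illustrative sketch).

    lv_plus, lv_minus: integer conductance level indices in [0, n_levels - 1].
    grads: loss gradients w.r.t. each weight; only their sign is used.
    fix_minus: if True, the "minus" device is never pulsed (a G_fix-like
               variant); otherwise both devices move in opposite directions.
    Returns the new level lists and the number of devices actually pulsed.
    """
    top = n_levels - 1
    new_plus, new_minus, pulses = [], [], 0
    for p, m, g in zip(lv_plus, lv_minus, grads):
        d = -sign(g)  # move one level against the gradient sign
        p_new = min(top, max(0, p + d))
        m_new = m if fix_minus else min(top, max(0, m - d))
        pulses += (p_new != p) + (m_new != m)  # assumed energy proxy
        new_plus.append(p_new)
        new_minus.append(m_new)
    return new_plus, new_minus, pulses


# Three synapses starting mid-window, with 200 accessible levels.
plus, minus = [100, 100, 100], [100, 100, 100]
grads = [0.2, -0.1, 0.0]

free_p, free_m, free_pulses = manhattan_step(plus, minus, grads, 200)
fix_p, fix_m, fix_pulses = manhattan_step(plus, minus, grads, 200, fix_minus=True)

print(free_pulses, fix_pulses)  # 4 2
```

In this toy setting the fixed-device variant issues half as many pulses per step, at the cost of a weight change of one level instead of two. How that translates into actual energy and accuracy depends on the device model and network, which this sketch does not attempt to reproduce.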

Paper Structure

This paper contains 4 sections, 1 equation, 6 figures.

Figures (6)

  • Figure 1: (a) Schematic representation of an ANN with a single hidden layer. (b) Electrical scheme of a crossbar array architecture for the hardware implementation of the ANN shown in (a). (c) Synthetic potentiation (red region) and depression (blue region) curves obtained for different parameters $\alpha$ and fixed $\#$L. (d) Normalized P/D curves as a function of normalized L. To compute NLI, we consider d to be the length of the ideal (linear) segment connecting the first and last points of the P/D curves, while L$_p$ and L$_d$ represent the lengths of the potentiation and depression curves, respectively.
  • Figure 2: (a) Heatmap corresponding to NLI as a function of parameters $\alpha$ and $\#L$. White lines represent contour lines of the function. Only the heatmap for the potentiation is presented (depression presents the same heatmap). The extracted P/D curves for each combination of parameters (colored points) on the contour lines for NLI = 0.01 and NLI = 0.2 are shown in panels (b) and (c) respectively, for normalized $G^{*}$ and $\#L^{*}$. For points on the same contour line, the shapes of the normalized P/D curves are similar to each other. Heatmaps are similar across different conductance ranges.
  • Figure 3: Accuracy vs. NLI for (a) SP and (b) DNN and $\#$L spanning values of $50, 100, 200, 500$ and $1000$. The accuracy obtained remains almost constant for NLI $< 10^{-2}$ in the SP case and NLI $< 10^{-3}$ in the DNN case. For NLI values higher than these thresholds, the accuracy starts to decrease (green region). The same results were obtained for different ranges of conductance. (c) Average energy consumption per epoch (right scale) for different conductance ranges (left scale) for SP (blue dots) and DNN (red dots). The parameters of the P/D curves used to calculate the power consumption were $\alpha$ = 9190.7 and $\#$L = 200. (d) Scaling of the average energy consumption per epoch as a function of the number of neurons in the hidden layer.
  • Figure 4: ((a), first row) Synaptic weight map for the first output neuron of SP ($w_{0j}$), using both $G_{free}$ (first column) and $G_{fix}$ (second column) update methods. ((a), middle row) Conductance map for $G_{0j}^+$ memristors. Each column shows results for the same neurons and layers as in the top row. ((a), bottom row) Conductance map for the $G_{0j}^-$ memristors; again, each column corresponds to the same neurons and layers as in the top row. (b) Accuracy evolution as a function of epochs. The solid line represents the mean values and the shaded band indicates the standard deviations. The blue line corresponds to the $G_{free}$ update method, while the orange line represents $G_{fix}$. (c) Training loss for both methods, with the validation loss shown in the inset. In both cases, convergence was achieved. A total of 200 training realizations were performed.
  • Figure 5: ((a), first row) Synaptic weight ($w1_{0j}$) map for the first neuron of the hidden layer of DNN, using the $G_{free}$ (first column) and $G_{fix}$ (second column) update methods. ((a), middle row) Conductance map for the $G1_{0j}^+$ memristors; each column shows results for the same neurons and layers as in the top row. ((a), bottom row) Conductance map for the $G1_{0j}^-$ memristors; again, each column corresponds to the same neurons and layers as in the top row. (b) Accuracy convergence as a function of epochs. The solid line represents the mean, and the shaded band indicates the standard deviation. The blue line corresponds to the $G_{free}$ update method, while the orange line represents $G_{fix}$. (c) Training loss for both methods, with the validation loss shown in the inset. In both cases, convergence was achieved. A total of 200 training realizations were performed.
  • ...and 1 more figures
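
The geometric quantities named in the Figure 1(d) caption, the chord length d and the curve lengths L$_p$ and L$_d$, can be computed numerically from a sampled normalized P/D curve. The sketch below does only that. The exact expression combining them into NLI is not given in this summary, so the `excess_length` ratio shown here is an illustrative stand-in and not the paper's NLI. The saturating-exponential curve model and its curvature parameter `k` are also assumptions (and `k` is not the paper's $\alpha$).

```python
import math


def potentiation_curve(k, n_levels):
    """Hypothetical saturating potentiation curve on normalized axes.

    Returns (xs, gs) with xs = normalized level and gs = normalized
    conductance, both running from 0 to 1. k = 0 gives a straight line.
    """
    xs = [i / (n_levels - 1) for i in range(n_levels)]
    if k == 0:
        return xs, list(xs)
    norm = 1.0 - math.exp(-k)
    gs = [(1.0 - math.exp(-k * x)) / norm for x in xs]
    return xs, gs


def curve_length(xs, ys):
    """Length of the polyline through the sampled points (L_p or L_d)."""
    return sum(
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i])
        for i in range(len(xs) - 1)
    )


def chord_length(xs, ys):
    """Length d of the straight segment joining first and last points."""
    return math.hypot(xs[-1] - xs[0], ys[-1] - ys[0])


def excess_length(xs, ys):
    """Assumed illustrative measure: relative excess of curve over chord."""
    return curve_length(xs, ys) / chord_length(xs, ys) - 1.0


linear = potentiation_curve(0, 200)
mild = potentiation_curve(1.0, 200)
strong = potentiation_curve(5.0, 200)

for name, (xs, gs) in [("linear", linear), ("mild", mild), ("strong", strong)]:
    print(name, round(chord_length(xs, gs), 4), round(excess_length(xs, gs), 4))
```

A perfectly linear curve has zero excess length, and the ratio grows as the curve bends away from the chord, which is the qualitative behavior the caption describes for a length-based nonlinearity measure.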