Harnessing Nonidealities in Analog In-Memory Computing Circuits: A Physical Modeling Approach for Neuromorphic Systems

Yusuke Sakemi, Yuji Okamoto, Takashi Morie, Sou Nobukawa, Takeo Hosomi, Kazuyuki Aihara

TL;DR

The paper tackles the energy efficiency barrier of large-scale deep learning by adopting a bottom-up, physics-aware approach that directly models IMC nonidealities as ODE-based physical neural networks (PNNs). It introduces differentiable spike-time discretization (DSTD) to enable scalable training of these IMC-aware networks while capturing reversal-potential dynamics, and demonstrates that such nonidealities can be harnessed to improve learning rather than merely degrade performance. Through Fashion-MNIST and CIFAR-10 experiments and post-layout sky130 SPICE validation, the authors show that reversal potentials can be exploited when incorporated into training, and that hardware-aware PNN models closely match circuit dynamics, reducing modeling error by at least an order of magnitude compared with top-down mappings. The work offers a pathway to energy-efficient neuromorphic computing by integrating IMC nonidealities into the learning process, with DSTD cutting the cost of training ODE-based PNNs by up to 20 times in speed and 100 times in memory.

Abstract

Large-scale deep learning models are increasingly constrained by their immense energy consumption, limiting their scalability and applicability for edge intelligence. In-memory computing (IMC) offers a promising solution by addressing the von Neumann bottleneck inherent in traditional deep learning accelerators, significantly reducing energy consumption. However, the analog nature of IMC introduces hardware nonidealities that degrade model performance and reliability. This paper presents a novel approach to directly train physical models of IMC, formulated as ordinary-differential-equation (ODE)-based physical neural networks (PNNs). To enable the training of large-scale networks, we propose a technique called differentiable spike-time discretization (DSTD), which reduces the computational cost of ODE-based PNNs by up to 20 times in speed and 100 times in memory. We demonstrate that such large-scale networks enhance the learning performance by exploiting hardware nonidealities on the CIFAR-10 dataset. The proposed bottom-up methodology is validated through the post-layout SPICE simulations on the IMC circuit with nonideal characteristics using the sky130 process. The proposed PNN approach reduces the discrepancy between the model behavior and circuit dynamics by at least an order of magnitude. This work paves the way for leveraging nonideal physical devices, such as non-volatile resistive memories, for energy-efficient deep learning applications.

Paper Structure

This paper contains 28 sections, 93 equations, 19 figures, 3 tables, 6 algorithms.

Figures (19)

  • Figure 1: Comparison of the dynamics between IMC circuits and biological neurons. a. Schematic of charge-domain IMC circuits. Input signals are delivered through horizontal lines in the form of spikes. Upon receiving these spikes, synaptic currents are induced along the vertical lines due to interactions between the spike signals and the memory elements, denoted as 'W' in the figure. The currents are integrated and converted into voltages at capacitors. Circuit examples within the regions outlined by dashed blue lines are illustrated in panels b and c. b. Dynamics of IMC circuits with resistors and transistor switches. At time $t_1$, a transistor switch is activated, allowing current to flow through a resistor with conductance $\sigma_1$. Subsequently, at time $t_2$, another transistor switch is turned ON, similarly permitting current to pass through a resistor of conductance $\sigma_2$. The resulting current induces a change in the voltage across the capacitor, $v(t)$. According to Ohm's law, the current magnitude depends on the capacitor voltage $v(t)$. c. Dynamics of IMC circuits with current sources and transistor switches. While an ideal current source provides a constant current that is independent of the capacitor voltage $v(t)$, real-world current sources exhibit a behavior in which the current varies linearly with the capacitor voltage. This nonideal behavior can be characterized by the parameter $\lambda^\pm$. d. Dynamics of membrane potentials in IMC-aware neuron models. The net activity of receptors for excitatory current $p^+(t)$ and inhibitory current $p^-(t)$ exhibits stepwise changes in response to incoming spikes. Synaptic currents are governed by Ohm's law, incorporating reversal potentials $E_\text{rev}^+$ and $E_{\text{rev}}^-$. This neuron model, referred to as the IMC-aware neuron model, captures the dynamics observed in the IMC circuits described in a and b.
  • Figure 2: Computation with differentiable spike-time discretization (DSTD). a. Illustration of calculating the firing time of a neuron in the $l$th layer when spikes are input from the $j$th and $k$th neurons in the preceding $l-1$th layer at times $t_j^{(l-1)}$ and $t_k^{(l-1)}$, respectively, using DSTD. First, the discrete time points $T^{(l)}_m$ are determined based on an offset time $t_\text{offset}^{(l)}$ and a time interval $\Delta _\tau$. The discrete spike variables $s_{im}^{(l)}$ at these time points are calculated from the continuous-time spikes using DSTD. Using these discrete-time spikes, the membrane potential at each discrete time point is computed via an analytical solution. Based on this membrane potential, the firing time of the neuron is computed. b. The computational method using DSTD for a multilayer model. The output of each layer is a continuous-time spike signal, which is then converted into a discrete-time spike signal by DSTD. This discrete spike signal serves as input to the next layer. c. Examples of the time evolution of the membrane potentials in a single-layer network without learning. The input consists of 1000 random spikes generated from a uniform distribution over the interval $[0, 1]$. The weights are in a random initial state. The results for three different reversal potentials ($|E_\text{rev}^\pm|=10,~2,$ and $1$) are shown in separate graphs. In each figure, the solid line represents the exact trajectory of the membrane potential, while the dashed line and plotted symbols represent the approximate membrane potential calculated using DSTD with steps $M=4$ when $t_\text{offset}$ is set to 0. d. Same as c, but the membrane potentials are plotted for different values of $M$ in different graphs. The $|E_\text{rev}^\pm|$ is set to 1.
  • Figure 3: Basic properties of DSTD. a, b. Errors between the membrane potential values at time 1 when using the exact solution and the approximated solution with DSTD. The membrane potential is obtained from a single-layer, untrained network consisting of 10 neurons, with spike inputs. The data consist of 1000 samples, with each sample containing 1000 input spikes. The input spike times are uniformly distributed over the interval $[0,1]$, and the synaptic weights are randomly assigned. In a, a double logarithmic plot illustrates how the error decreases as the number of DSTD steps $M$ increases, for various values of $|E_\text{rev}^\pm|$ (with $E_\text{rev}^+ = - E_\text{rev}^-$). In b, the error is plotted as a function of $|E_\text{rev}^\pm|$ (with $E_\text{rev}^+ = - E_\text{rev}^-$), for different numbers of DSTD steps $M$. In both a and b, dashed lines denote experimental results, while solid lines denote theoretical predictions. c, d. These panels depict the computational efficiency for the case of RC-Spike models (c) and TTFS-SNN models (d) when the number of input spikes is varied for a single-layer network consisting of 1000 neurons. The dataset consists of 1000 samples, with a batch size of 100. The input spike times for each sample are uniformly distributed over the interval $[0,1]$.
  • Figure 4: Learning results for RC-Spike models using DSTD. a. The experimental setup. Each data point of a dataset is represented as a three-dimensional tensor $x_{ijk}$. These features are converted into single input spikes, $t_{ijk}^{(0)}=1-x_{ijk}$, which are processed by a PNN that incorporates the IMC nonidealities $E_\text{rev}^\pm$. We employed the RC-Spike model as a PNN. The RC-Spike model operates in two phases: the accumulation phase and the firing phase. During the accumulation phase, the neuron membrane potential evolves over time as it receives spikes from the preceding layer. In this example, three input spikes arrive at times $t_1$, $t_2$, and $t_3$. Between receiving consecutive spikes, the membrane potential follows an exponential decay curve (RC-decay), influenced by the reversal potential. In the firing phase, the membrane potential increases linearly, and a spike is generated when it exceeds the firing threshold. Similar experiments are conducted for the case of TTFS-SNN models, and the results are summarized in SI.C (Simulation results for TTFS-SNNs). b-d. Experimental results for fully connected RC-Spike models (784-400-400-10) on the Fashion-MNIST dataset. In b, learning curves for $|E_\text{rev}^\pm| = 4$ (upper) and $|E_\text{rev}^\pm| = 1$ (lower) are shown. The error bars represent the standard deviation across five networks with different initial weights. Each panel shows the results for different values of DSTD steps $M$. Additionally, we show the cases where the offset time $t_\text{offset}$ was randomized (random offset) and fixed (fixed offset) for each mini-batch. In c, accuracies obtained from training the model with different DSTD steps $M$ are shown. For each $M$, results for $|E_\text{rev}^\pm|=4,~2,$ and $1$ are presented from top to bottom. Each panel compares the model performance with and without the introduction of a random offset. Each data point represents the mean and standard deviation of the training results from five models, each initialized with different random weights. In d, recognition accuracies of models trained with 15 DSTD steps (upper) and 3 DSTD steps (lower) are shown as a function of $|E_\text{rev}^\pm|$. In each panel, the effects of introducing a random offset are compared. The 95% confidence intervals are estimated using Gaussian process regression. The green line represents the performance of a model trained with $|E_\text{rev}^\pm|=100$ (equivalent to an ANN). The green dashed line represents the performance of the ANN case, but the positive and negative weights are optimally scaled for specific $|E_\text{rev}^\pm|$ values in the test phase. e-g. Experimental results for convolutional RC-Spike models on the CIFAR-10 dataset. The experimental setups of e, f, and g are the same as those of b, c, and d, respectively.
  • Figure 5: Circuit overview. a. Circuit layout of the RC-Spike circuit designed using the sky130 PDK. This circuit forms a two-layer network, with each layer composed of five neurons. The inset shows a schematic overview of the layout. One synapse circuit in the second layer and one neuron circuit in the first layer are enclosed by orange dotted and dashed lines, respectively. b. Timing diagram. Each layer of the RC-Spike circuit operates through three phases: the reset phase, accumulation phase, and firing phase. These phases are controlled by two digital signals: the phase signal and reset signal. During the accumulation phase, spike signals are received by each layer, while in the firing phase, the spike signals are transmitted. c. Circuit diagram of the synapse circuit. The diagram illustrates the case of two inputs and two outputs for simplicity. The circuit employs a paired configuration of MOSFETs for weights and selectors, commonly referred to as a 1T1R topology. Two types of pairs exist: one composed of N-type MOSFETs (NMOS) for positive weights (denoted as NM$_{ij}$ and NS$_{ij}$) and the other composed of P-type MOSFETs (PMOS) for negative weights (denoted as PM$_{ij}$ and PS$_{ij}$). Note that the membrane potential in the circuit is inverted in polarity. The MOSFETs responsible for the weights (NM$_{ij}$ and PM$_{ij}$) regulate the current, with the amount controlled by the bias voltage $V_{ij}^{N(P)}$. In the simulation, the bias voltage is externally supplied. The selector MOSFETs turn ON when a spike signal $V_i^\text{spike}$ arrives, allowing current to flow into the dendrites. All MOSFETs used in the synapse circuit have a gate length of 250 nm and a gate width of 1 $\mu\text{m}$. d. Circuit diagram of the neuron circuit. The neuron circuit consists of two components: the input component and output component. The input component accumulates current from the synapse circuit during the accumulation phase. The output component generates a spike during the firing phase. During the firing phase, the discharger circuit decreases the membrane potential at a constant rate, and the sensing inverter triggers a spike when the membrane potential falls below a threshold value, by inverting its output.
  • ...and 14 more figures
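
The accumulation-phase dynamics described in the captions of Figures 1d and 4a can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation and not DSTD itself. The assumed ODE is $dv/dt = p^+(t)(E_\text{rev}^+ - v) + p^-(t)(E_\text{rev}^- - v)$ with the capacitance normalized to 1, where each arriving spike raises $p^+$ or $p^-$ by the magnitude of its weight. Because $p^\pm$ are constant between spikes, the potential relaxes exponentially toward a weighted average of the reversal potentials and can be advanced exactly from one spike to the next. The function names `encode_spike_times`, `membrane_potential`, and `_relax` are hypothetical.

```python
import math


def encode_spike_times(features):
    """Input encoding from the Figure 4 caption: t = 1 - x for x in [0, 1]."""
    return [1.0 - x for x in features]


def _relax(v, dt, p_pos, p_neg, e_pos, e_neg):
    """Advance v by dt while p_pos and p_neg stay constant (exact solution)."""
    g = p_pos + p_neg  # total conductance (assumed unit capacitance)
    if g == 0.0 or dt <= 0.0:
        return v  # no active synapse, or no time elapsed
    v_inf = (p_pos * e_pos + p_neg * e_neg) / g  # steady-state potential
    return v_inf + (v - v_inf) * math.exp(-g * dt)


def membrane_potential(spike_times, weights, e_pos, e_neg, t_end=1.0, v0=0.0):
    """Membrane potential at t_end under the assumed ODE stated above."""
    v, t, p_pos, p_neg = v0, 0.0, 0.0, 0.0
    for t_spike, w in sorted(zip(spike_times, weights)):
        if t_spike > t_end:
            break
        v = _relax(v, t_spike - t, p_pos, p_neg, e_pos, e_neg)
        t = t_spike
        # A spike switches its synapse ON: stepwise change of p+ or p-.
        if w >= 0.0:
            p_pos += w
        else:
            p_neg += -w
    return _relax(v, t_end - t, p_pos, p_neg, e_pos, e_neg)


if __name__ == "__main__":
    times = encode_spike_times([0.9, 0.5, 0.2])
    for e_rev in (10.0, 2.0, 1.0):
        v = membrane_potential(times, [1.0, -0.5, 0.8], e_rev, -e_rev)
        print(f"|E_rev| = {e_rev}: v(1) = {v:.4f}")
```

In this sketch the potential always stays between $E_\text{rev}^-$ and $E_\text{rev}^+$. If the weights are divided by $|E_\text{rev}^\pm|$ and $|E_\text{rev}^\pm|$ is made large, the result approaches a plain weighted sum of the elapsed times since each spike, which is consistent with the caption's remark that a large $|E_\text{rev}^\pm|$ behaves like an ANN.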