Bruno: Backpropagation Running Undersampled for Novel device Optimization

Luca Fehlings, Bojian Zhang, Paolo Gibertini, Martin A. Nicholson, Erika Covi, Fernando M. Quintana

TL;DR

This work tackles efficient hardware-aware training for neuromorphic systems built from FeCAP-based FeLIF neurons and RRAM synapses. It introduces BRUNO, a dual-timescale training method in which the forward pass operates at a fine $1\,\mu\text{s}$ timescale while backpropagation proceeds at a coarser $1\,\text{ms}$ timescale, reducing the size of the unrolled graph and the memory it requires. BRUNO is instantiated with a physics-based FeLIF neuron and 3-bit quantised RRAM synapses and is validated against BPTT. It reduces peak memory by 97–99% and training time by 50–60%, with comparable or better accuracy on spatio-temporal tasks such as Bach chorales and Braille recognition. The results indicate that hardware-accurate learning with quantised synapses is practical and scalable, enabling efficient neuromorphic training directly aligned with device physics.
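
As a rough illustration of why undersampling shrinks the unrolled graph, the sketch below steps a generic leaky integrator at the fine time step and keeps a snapshot only every 1000 steps (1 µs versus 1 ms). The neuron update, the `decay` constant, and the helper name `run_forward` are assumptions made for illustration. This is not the authors' implementation, and it shows only the state bookkeeping, not the gradient computation.

```python
import numpy as np

def run_forward(inputs, decay, store_every):
    """Integrate at the fine step; keep one snapshot every `store_every` steps.

    Hypothetical helper: a plain leaky integrator stands in for the neuron model.
    """
    v = 0.0
    snapshots = []
    for t, x in enumerate(inputs, start=1):
        v = decay * v + x              # fine-timescale state update
        if t % store_every == 0:       # only these states would be kept for the backward pass
            snapshots.append(v)
    return np.array(snapshots)

fine_steps = 10_000                    # 10 ms of simulated time at a 1 µs step
rng = np.random.default_rng(0)
inputs = rng.random(fine_steps)

dense = run_forward(inputs, decay=0.999, store_every=1)      # every fine step stored
sparse = run_forward(inputs, decay=0.999, store_every=1000)  # one state per 1 ms

print(len(dense), len(sparse))
```

The forward dynamics are identical in both runs; only the number of stored time points differs, by the ratio of the two time steps.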

Abstract

Recent efforts to improve the efficiency of neuromorphic and machine learning systems have centred on developing specialised hardware for neural networks. These systems typically feature architectures that go beyond the von Neumann model employed in general-purpose hardware such as GPUs, offering potential efficiency and performance gains. However, neural networks developed for specialised hardware must take its specific characteristics into account. This requires novel training algorithms and accurate hardware models, since such hardware cannot be abstracted as a general-purpose computing platform. In this work, we present a bottom-up approach to training neural networks for hardware-based spiking neurons and synapses, built using ferroelectric capacitors (FeCAPs) and resistive random-access memories (RRAMs), respectively. Unlike the common approach of designing hardware to fit abstract neuron or synapse models, we start with compact models of the physical devices to model the computational primitives. Based on these models, we have developed a training algorithm (BRUNO) that can reliably train the networks, even under hardware limitations such as stochasticity or low bit precision. We analyse BRUNO, compare it with Backpropagation Through Time, and test it on two spatio-temporal datasets. The first is a music prediction dataset, where a network composed of ferroelectric leaky integrate-and-fire (FeLIF) neurons is used to predict, at each time step, the next musical note that should be played. The second consists of classifying Braille letters using a network composed of quantised RRAM synapses and FeLIF neurons. The performance of this network is then compared with that of networks composed of LIF neurons. Experimental results show the potential advantages of BRUNO, which reduces the time and memory required to detect spatio-temporal patterns with quantised synapses.
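
To make "low bit precision" and "stochasticity" concrete, the following minimal sketch snaps weights to 8 levels (3 bits) and then perturbs the programmed values. The evenly spaced, signed levels, the multiplicative Gaussian spread, and the names `quantise_weights` and `noisy_read` are all assumptions for illustration; the paper derives its quantised states from a device model, which this sketch does not reproduce.

```python
import numpy as np

def quantise_weights(w, n_bits=3, w_max=1.0):
    """Snap each weight to the nearest of 2**n_bits evenly spaced levels.

    Assumption: uniform levels in [-w_max, w_max]; real device states need not be uniform.
    """
    levels = np.linspace(-w_max, w_max, 2 ** n_bits)
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)  # index of the nearest level
    return levels[idx]

def noisy_read(w_q, rel_sigma, rng):
    """Apply a multiplicative Gaussian spread to programmed values (assumed noise model)."""
    return w_q * (1.0 + rel_sigma * rng.standard_normal(w_q.shape))

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=(4, 5))   # full-precision weights
w_q = quantise_weights(w)                 # at most 8 distinct values
w_read = noisy_read(w_q, rel_sigma=0.05, rng=rng)

print(np.unique(w_q).size)
```

With uniform levels, the quantisation error of any in-range weight is bounded by half the level spacing.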

Paper Structure

This paper contains 16 sections, 6 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: (a) Operating principle of the FeLIF neuron. The membrane potential $V$ is charged by the input current $I_{\mathrm{syn}}$ via the leaky integrator (LI); once it reaches the coercive voltage $V_{\mathrm{c}}$, the current partially flows to the integrator (I). When the integrator state $P$ reaches the saturation value $P_{\mathrm{s}}$, the current flows back to the leaky integrator until the membrane potential reaches the firing threshold and the neuron emits a spike. (b) Polarization and voltage of the neuron for a constant input current. As described in the operating principle, the input current is integrated into the polarization as long as $V_{\mathrm{mem}} > V_{\mathrm{c}}$ and $P < P_{\mathrm{s}}$.
  • Figure 2: Schematic of the algorithm for computing the forward and backward passes with different time steps. During the forward pass (blue box), the simulation operates on a 1 µs timescale. In the backward pass (green box), the gradient is calculated on a 1 ms timescale.
  • Figure 3: Comparison of the membrane potential transient between the SPICE simulation of the neuron circuit and the model used in the network simulations, with a DC input current of 308 pA. Except for the spike amplitude, the polarization (a) and the voltage (b) match between the Python and SPICE simulations. The spike amplitude does not matter in the network simulation, as a spike is registered as an event in any case.
  • Figure 4: RRAM state programming and device conductance. (a) Device conductance based on the programming current applied, for a mean filament size of 0.4 nm length and 45 nm filament radius, an upper limit of 0.36 nm length and 49 nm filament radius, and a lower limit of 0.44 nm length and 41 nm filament radius. (b) Distribution of the different 3-bit quantised states.
  • Figure 5: Comparison of forward and gradient calculations of BPTT, checkpointing and BRUNO. The green nodes represent points saved in memory and the blue nodes represent those that have to be recomputed during the backward pass with checkpointing. In BPTT, the internal states of the model are stored at each time step, which increases memory consumption. Checkpointing provides a trade-off between time and memory by storing key state points; however, it must recompute the intermediate states during backpropagation, which increases computation time. In BRUNO, internal states at key times are stored for the backward pass, as with checkpointing. However, gradient computation does not recompute all intermediate states, because it occurs at a different timescale. This reduces training time and memory consumption compared to BPTT and checkpointing.
  • ...and 5 more figures
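
The operating principle in the Figure 1 caption can be caricatured as a two-state update: input current charges the membrane potential until it crosses the coercive voltage, part of the current is then diverted into the polarization until it saturates, and only afterwards does the membrane potential climb to threshold. The sketch below is a toy discrete-time version in dimensionless units. Every constant, the fixed `split` fraction, the reset of both states after a spike, and the function names are assumptions for illustration; it is not the physics-based compact model used in the paper.

```python
def felif_step(v, p, i_syn, *, v_c=0.5, p_s=1.0, v_th=1.0, leak=0.01, split=0.8):
    """One toy update; returns (v, p, spiked). All constants are hypothetical."""
    if v >= v_c and p < p_s:
        # Above the coercive voltage and below saturation:
        # a fraction of the input current charges the polarization.
        p = min(p + split * i_syn, p_s)
        i_mem = (1.0 - split) * i_syn
    else:
        # Otherwise the whole input current charges the leaky integrator.
        i_mem = i_syn
    v = (1.0 - leak) * v + i_mem
    if v >= v_th:
        return 0.0, 0.0, True          # assumed: both states reset after a spike
    return v, p, False

def first_spike_step(i_syn, max_steps=10_000, **kwargs):
    """Number of steps until the first spike for a constant input, or None."""
    v, p = 0.0, 0.0
    for t in range(1, max_steps + 1):
        v, p, spiked = felif_step(v, p, i_syn, **kwargs)
        if spiked:
            return t
    return None

t_felif = first_spike_step(0.02)             # with the polarization stage
t_lif = first_spike_step(0.02, split=0.0)    # split=0 reduces the toy to a plain leaky integrator
print(t_felif, t_lif)
```

In this toy setting, the polarization stage delays the first spike relative to the plain leaky integrator driven by the same constant current.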