Domain wall and Magnetic Tunnel Junction Hybrid for on-chip Learning in UNet architecture
Venkatesh Vadde, Bhaskaran Muralidharan, Abhishek Sharma
TL;DR
The paper tackles energy-efficient on-chip semantic segmentation by implementing UNet entirely in spintronic hardware, using domain-wall MTJ synapses for convolution and orthogonal-current-injected SHE-MTJs for ReLU and max pooling. It introduces a hybrid simulation workflow that couples micromagnetic dynamics, NEGF transport, and circuit models to capture spin transport, magnetization dynamics, and CMOS behavior. On the CamVid dataset, the hardware-accelerated UNet achieves about $83.71\%$ test accuracy with an on-chip training energy of around $85.79\ \mathrm{mJ}$ (for $\Delta = 4.58$) and a per-image testing energy of about $1.55\ \mu\mathrm{J}$, closely matching software performance while delivering substantial energy reductions. The work suggests a practical path toward scalable spintronic neuromorphic hardware for complex vision tasks, including a nearly $10\times$ energy improvement obtained by exploiting unstable ferromagnets without sacrificing accuracy.
Abstract
We present a spintronic-device-based hardware implementation of UNet for segmentation tasks. Our approach involves designing hardware for the convolution, deconvolution, rectified linear unit (ReLU) activation, and max pooling layers of the UNet architecture. We design the convolution and deconvolution layers of the network using the synaptic behavior of the domain wall MTJ, and we construct the ReLU and max pooling functions using the spin Hall driven, orthogonal-current-injected MTJ. To incorporate the diverse physics of spin transport, magnetization dynamics, and CMOS elements in our UNet design, we employ a hybrid simulation setup that couples micromagnetic simulation, the non-equilibrium Green's function formalism, and SPICE simulation with the network implementation. We evaluate our UNet design on the CamVid dataset and achieve a segmentation accuracy of $83.71\%$ on test data, on par with the software implementation, with 821 mJ of energy consumption for on-chip training over 150 epochs. We further demonstrate an improvement of nearly one order of magnitude ($10\times$) in the energy requirement of the network by using an unstable ferromagnet ($\Delta = 4.58$) in place of the stable ferromagnet ($\Delta = 45$) in the ReLU and max pooling functions, while maintaining similar accuracy. The hybrid architecture comprising the domain wall MTJ and the unstable-FM-based MTJ leads to an on-chip energy consumption of 85.79 mJ during training, with a testing energy cost of $1.55\ \mu\mathrm{J}$.
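As a quick consistency check on the energy figures quoted above, the short sketch below computes the ratio between the two training energies and the average energy per epoch. It assumes that the 821 mJ figure corresponds to the stable-ferromagnet ($\Delta = 45$) design, which the abstract implies but does not state outright; the variable and function names are illustrative only.

```python
# Energy figures quoted in the abstract, in millijoules.
# Assumption: 821 mJ is the training energy of the stable-FM (Delta = 45) design.
E_TRAIN_STABLE_MJ = 821.0
# Training energy of the hybrid design with the unstable FM (Delta = 4.58).
E_TRAIN_UNSTABLE_MJ = 85.79
# Number of training epochs stated for the 821 mJ figure.
EPOCHS = 150


def energy_ratio(baseline_mj: float, improved_mj: float) -> float:
    """Return how many times smaller the improved energy is than the baseline."""
    return baseline_mj / improved_mj


def per_epoch_mj(total_mj: float, epochs: int) -> float:
    """Return the average training energy per epoch in millijoules."""
    return total_mj / epochs


ratio = energy_ratio(E_TRAIN_STABLE_MJ, E_TRAIN_UNSTABLE_MJ)
stable_per_epoch = per_epoch_mj(E_TRAIN_STABLE_MJ, EPOCHS)

print(f"Training-energy ratio (stable / unstable): {ratio:.2f}x")
print(f"Average energy per epoch (821 mJ case): {stable_per_epoch:.2f} mJ")
```

Under that assumption the ratio comes out at roughly 9.6, which is consistent with the "nearly one order of magnitude" wording.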
