Domain wall and Magnetic Tunnel Junction Hybrid for on-chip Learning in UNet architecture

Venkatesh Vadde; Bhaskaran Muralidharan; Abhishek Sharma

TL;DR

The paper tackles energy-efficient on-chip semantic segmentation by implementing UNet entirely in spintronic hardware, using domain-wall MTJ synapses for convolution and spin Hall effect (SHE) driven, orthogonal-current-injected MTJs for ReLU and max pooling. It introduces a hybrid simulation workflow that couples micromagnetic dynamics, NEGF transport, and circuit models to capture spin transport, magnetization dynamics, and CMOS behavior together. On the CamVid dataset, the hardware-accelerated UNet achieves about $83.71\%$ test accuracy with an on-chip training energy of around $85.79\ \mathrm{mJ}$ (for $\Delta = 4.58$) and a per-image testing energy of about $1.55\ \mu\mathrm{J}$, closely matching software performance while substantially reducing energy. The work suggests a practical path toward scalable spintronic neuromorphic hardware for complex vision tasks, including a nearly $10\times$ energy reduction obtained by exploiting unstable ferromagnets without sacrificing accuracy.

Abstract

We present a spintronic-device-based hardware implementation of UNet for segmentation tasks. Our approach involves designing hardware for the convolution, deconvolution, rectified linear unit (ReLU) activation, and max pooling layers of the UNet architecture. We design the convolution and deconvolution layers of the network using the synaptic behavior of the domain wall MTJ, and we construct the ReLU and max pooling functions using the spin Hall driven, orthogonal current injected MTJ. To incorporate the diverse physics of spin transport, magnetization dynamics, and CMOS elements in our UNet design, we employ a hybrid simulation setup that couples micromagnetic simulation, the non-equilibrium Green's function (NEGF) formalism, and SPICE simulation with the network implementation. We evaluate our UNet design on the CamVid dataset and achieve a segmentation accuracy of $83.71\%$ on test data, on par with the software implementation, with 821 mJ of energy consumption for on-chip training over 150 epochs. We further demonstrate a nearly one order of magnitude ($10\times$) improvement in the energy requirement of the network by using unstable-ferromagnet ($\Delta = 4.58$) rather than stable-ferromagnet ($\Delta = 45$) based ReLU and max pooling functions, while maintaining similar accuracy. The hybrid architecture comprising the domain wall MTJ and the unstable-ferromagnet (FM) based MTJ leads to an on-chip energy consumption of 85.79 mJ during training, with a testing energy cost of $1.55\ \mu\mathrm{J}$.
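As a quick check of the "nearly one order" claim, the two training energies quoted in the abstract can be compared directly. This is a minimal sketch, assuming the 821 mJ figure belongs to the stable-ferromagnet ($\Delta = 45$) design, which the abstract implies but does not state outright; the constant and function names are illustrative, not taken from the paper.

```python
# Training energies quoted in the abstract, in millijoules.
# Assumption: 821 mJ is the stable-FM (Delta = 45) design and
# 85.79 mJ is the unstable-FM (Delta = 4.58) design.
E_TRAIN_STABLE_MJ = 821.0
E_TRAIN_UNSTABLE_MJ = 85.79


def energy_ratio(stable_mj: float, unstable_mj: float) -> float:
    """Return how many times less energy the unstable-FM design uses."""
    return stable_mj / unstable_mj


ratio = energy_ratio(E_TRAIN_STABLE_MJ, E_TRAIN_UNSTABLE_MJ)
print(f"energy reduction: {ratio:.2f}x")
```

Under that assumption the ratio comes out a little below 10, consistent with the abstract's wording of "nearly" one order of magnitude.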

Paper Structure (18 sections, 11 equations, 11 figures, 1 table)

This paper contains 18 sections, 11 equations, 11 figures, 1 table.

Introduction
Architecture for Segmentation
Convolution
Deconvolution
ReLU and Max-pooling
Simulation method
Quantum transport: NEGF
Magnetization dynamics
SHE layer
Domain wall Synapse
Device parameters
Results
ReLU and Max pooling
Device parameters
Results
...and 3 more sections

Figures (11)

Figure 1: The UNet structure is illustrated, where the feature map is represented by blue boxes with the number of channels indicated at the top and the size displayed on the left edge. White boxes signify copied features from previous stages, and arrows indicate various operations. An example input image and its corresponding output are also depicted.
Figure 2: The convolution operation using DW-based cross-bar array. The vertical lines symbolize the convolution kernels, and the input is applied to the horizontal lines. The DW device along with parallel conductance is used to store the kernel values. The resulting current output from the vertical lines (kernel output) is connected to the ReLU/ReLU+Max pooling devices.
Figure 3: The deconvolution operation as a combination of zero-insertion and convolution operation.
Figure 4: Overview of the simulation setup. (a) The domain wall is simulated micromagnetically in mumax3, and the resulting magnetization is translated to MTJ conductance using the parallel and anti-parallel conductances obtained from NEGF simulation. (b) Hybrid NEGF-CMOS simulation setup for the ReLU and ReLU-max pooling circuits. The NEGF is coupled to a voltage divider circuit in a self-consistent manner to compute the MTJ resistance. This resistance is then integrated into the HSPICE circuit simulation using Verilog-A. The LLGS equation is coupled to the other circuit components to compute the ReLU and ReLU-max pooling functions. (c) The characteristics of the DW synapse, ReLU circuit, and ReLU-max pooling network are incorporated into the TensorFlow package to implement the UNet architecture, which is used for semantic image segmentation.
Figure 5: (a) Schematic of the domain-wall based synapse. $I_{write}$ denotes the write current passing through terminals T1 and T3, while $I_{read}$ represents the read current flowing through terminals T2 and T3. (b) The conductance of the DW-MTJ device with respect to the input current pulse ($I_{write}$). (c) The velocity of the domain wall with varying input current density.
...and 6 more figures
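
The captions of Figures 2 and 3 describe convolution as a crossbar operation (inputs on the rows, one kernel per column, column currents as outputs) and deconvolution as zero-insertion followed by an ordinary convolution. The NumPy sketch below illustrates both ideas in an idealized form. It is not the authors' implementation: the function names are hypothetical, kernel values are treated as signed numbers rather than physical conductances, and the parallel conductance mentioned in Figure 2 is not modeled.

```python
import numpy as np


def im2col(x, k):
    """Unroll every k x k patch of a 2-D input into one row (stride 1, no padding)."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    patches = np.empty((out_h * out_w, k * k))
    for i in range(out_h):
        for j in range(out_w):
            patches[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return patches, (out_h, out_w)


def crossbar_conv2d(x, kernels):
    """Convolution as a crossbar matrix product.

    kernels has shape (n_kernels, k, k). Each kernel fills one crossbar
    column and each patch element drives one row, so every column output
    is the weighted sum of the row inputs (idealized: no wire resistance,
    signed weights allowed).
    """
    n, k, _ = kernels.shape
    patches, (out_h, out_w) = im2col(x, k)
    g = kernels.reshape(n, k * k).T      # rows = patch elements, columns = kernels
    currents = patches @ g               # one row of column outputs per patch
    return currents.T.reshape(n, out_h, out_w)


def zero_insert(x, stride):
    """Place stride - 1 zeros between neighbouring input elements."""
    h, w = x.shape
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    up[::stride, ::stride] = x
    return up


def deconv2d(x, kernel, stride=2):
    """Deconvolution as zero-insertion, zero-padding, then ordinary convolution.

    The kernel is flipped so that the result matches the usual
    scatter-style definition of a transposed convolution.
    """
    k = kernel.shape[0]
    up = np.pad(zero_insert(x, stride), k - 1)
    flipped = kernel[::-1, ::-1]
    return crossbar_conv2d(up, flipped[None])[0]


x = np.arange(6.0).reshape(2, 3)
kern = np.array([[1.0, 2.0], [3.0, 4.0]])
y = deconv2d(x, kern, stride=2)
print(y.shape)
```

Because the deconvolution reuses `crossbar_conv2d`, the same crossbar can in principle serve both layer types once the input has been zero-inserted. In hardware, signed weights would need an extra mechanism such as a reference or differential conductance, which this sketch leaves out.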
