Voltage-Controlled Magnetic Tunnel Junction based ADC-less Global Shutter Processing-in-Pixel for Extreme-Edge Intelligence

Md Abdullah-Al Kaiser, Gourav Datta, Jordan Athas, Christian Duffee, Ajey P. Jacob, Pedram Khalili Amiri, Peter A. Beerel, Akhilesh R. Jaiswal

TL;DR

This paper proposes an ADC-less processing-in-pixel architecture that implements an optimized binary activation neural network trained using a Hoyer regularizer for high accuracy on complex vision tasks. It also develops an algorithmic framework that incorporates device and circuit constraints, based on state-of-the-art fabricated VC-MTJ characteristics and extensive circuit simulations using commercial GlobalFoundries 22nm FDX technology.

Abstract

The vast amount of data generated by camera sensors has prompted the exploration of energy-efficient processing solutions for deploying computer vision tasks on edge devices. Among the various approaches studied, processing-in-pixel integrates massively parallel analog computational capabilities at the extreme edge, i.e., within the pixel array, and exhibits enhanced energy and bandwidth efficiency by generating the output activations of the first neural network layer rather than the raw sensory data. In this article, we propose an energy- and bandwidth-efficient ADC-less processing-in-pixel architecture. This architecture implements an optimized binary activation neural network trained using a Hoyer regularizer for high accuracy on complex vision tasks. In addition, we introduce a global shutter burst memory read scheme that uses a fast and disturb-free read operation, leveraging an innovative use of nanoscale voltage-controlled magnetic tunnel junctions (VC-MTJs). Moreover, we develop an algorithmic framework incorporating device and circuit constraints (characteristic device switching behavior and circuit non-linearity) based on state-of-the-art fabricated VC-MTJ characteristics and extensive circuit simulations using commercial GlobalFoundries 22nm FDX technology. Finally, we evaluate the proposed system's performance on two complex datasets, CIFAR10 and ImageNet, showing improvements in front-end and communication energy efficiency of 8.2x and 8.5x, respectively, and a 6x reduction in bandwidth compared to traditional computer vision systems, without any significant drop in test accuracy.
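
To make the "binary activation" and "Hoyer regularizer" terms concrete, the following is a minimal sketch rather than the authors' training code. It assumes the Hoyer-square sparsity measure, (sum |x|)^2 / sum x^2, which is one common form of Hoyer regularizer; the abstract does not state which form is used. The function names and the fixed threshold are hypothetical, chosen only for illustration.

```python
def hoyer_square(values):
    """Hoyer-square sparsity measure: (sum |x|)^2 / sum x^2.

    Assumption: this is one common form of Hoyer regularizer; the
    abstract does not say which form the paper uses.
    For n non-zero values it ranges from 1 (a single non-zero entry)
    to n (all entries of equal magnitude), so a lower value means a
    sparser vector.
    """
    l1 = sum(abs(v) for v in values)
    l2_squared = sum(v * v for v in values)
    if l2_squared == 0.0:
        return 0.0  # all-zero input: define the measure as 0 to avoid 0/0
    return (l1 * l1) / l2_squared


def binary_activation(pre_activations, threshold):
    """Map each pre-activation to 1 if it reaches the threshold, else 0.

    The threshold is a hypothetical parameter of this sketch.
    """
    return [1 if v >= threshold else 0 for v in pre_activations]


if __name__ == "__main__":
    dense = [1.0, 1.0, 1.0, 1.0]
    sparse = [0.0, 0.0, 3.0, 0.0]
    print("Hoyer-square (dense): ", hoyer_square(dense))
    print("Hoyer-square (sparse):", hoyer_square(sparse))
    print("Binary activations:   ", binary_activation([0.2, 0.9, 0.5], 0.5))
```

Adding a term like `hoyer_square` to a training loss pushes the pre-activations toward sparsity, which is one plausible reason such a regularizer pairs well with 1-bit outputs; the exact loss formulation is in the full paper, not in this abstract.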

Paper Structure

This paper contains 20 sections, 3 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: (a) The stack of the MTJs used in this work. Each MTJ is patterned as a circular pillar with a diameter of approximately 70 nm. (b) Dependence of the parallel (R_P) and antiparallel (R_AP) state resistances on voltage applied across the device. A TMR in excess of 150% is found at near-zero voltages.
  • Figure 2: Switching probability as a function of pulse width at various applied voltages (ranging from 0.7 V to 0.9 V), with the initial resistance state being (a) parallel (R_P) and (b) anti-parallel (R_AP). For VC-MTJs to function effectively as binary thresholding neurons, they are expected to switch with high probability at higher voltages. Therefore, the R_AP state is used as the reset state for the binary neurons, as it demonstrates a higher switching probability across a wide range of pulse widths.
  • Figure 3: (a) Representative illustration of the heterogeneous 3D integration of our proposed processing-in-pixel solution utilizing Cu-Cu hybrid bonding, where the top die comprises the pixel sensor array and the bottom die consists of the processing blocks. Circuit implementation of (b) weight-augmented pixel circuit, (c) analog subtractor to compute the final convolution output, (d) unity-gain buffer to drive the VC-MTJ neurons, (e) multiple VC-MTJ neurons, (f) MUX for selective read and reset, (g) comparator-based read circuits, (h) reset pulse generator, (i) control pulses for burst mode write and read operation.
  • Figure 4: (a) A scatter plot comparing the normalized weight-augmented pixel output voltages to the ideal normalized products of various weights and input activations (Normalized W × I). (b) Computation of the final convolution output and burst-write driving pulses for binary activations of VC-MTJ neurons.
  • Figure 5: Final activation analysis for multiple VC-MTJs at various experimentally measured single-VC-MTJ anti-parallel to parallel switching probabilities of (a) 6.2%, (b) 92.4%, and (c) 97.17%, respectively. By employing a majority threshold operation with multiple VC-MTJs, the error rate for non-switching at 0.7 V and switching at 0.8 V and 0.9 V decreases to below 0.1%. Note that we use a 0.8 V threshold in our hardware, which ensures high-confidence binary activation for our neural network.
  • ...and 4 more figures
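
The majority threshold operation described in the Figure 5 caption can be illustrated with a short binomial calculation. This is a sketch under stated assumptions, not the paper's analysis: it assumes each VC-MTJ switches independently with the single-device probability quoted in the caption, and it assumes a strict-majority rule (more than half of the devices must switch). The caption gives neither the number of devices per neuron nor the exact voting rule, so `n_devices` and the rule itself are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def majority_activation_prob(n_devices, p_switch):
    """Probability that a strict majority of n_devices VC-MTJs switch.

    Assumptions (not from the source): devices switch independently,
    and the neuron fires when more than half of them have switched.
    """
    k = n_devices // 2 + 1
    return prob_at_least(k, n_devices, p_switch)


if __name__ == "__main__":
    # Single-device switching probabilities quoted in the Figure 5 caption.
    measured = {"0.7 V": 0.062, "0.8 V": 0.924, "0.9 V": 0.9717}
    for n_devices in (1, 3, 5, 7, 9):  # hypothetical device counts
        for voltage, p in measured.items():
            fire = majority_activation_prob(n_devices, p)
            # At 0.7 V the neuron should stay off, so firing is the error;
            # at 0.8 V and 0.9 V it should fire, so not firing is the error.
            error = fire if voltage == "0.7 V" else 1.0 - fire
            print(f"n={n_devices} {voltage}: error rate = {error:.6f}")
```

Under these assumptions, the error rate falls quickly as devices are added, which matches the qualitative trend in the caption. The device count needed to reach the quoted sub-0.1% figure depends on the actual voting rule used in the hardware.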