Dynamic neural network with memristive CIM and CAM for 2D and 3D vision

Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

TL;DR

This work proposes a hardware-software co-design: a semantic memory-based dynamic neural network (DNN) using memristors. The network associates incoming data with past experience stored as semantic vectors and reduces energy consumption by 77.6% and 93.3% on ResNet and PointNet++, respectively.

Abstract

The brain is dynamic, associative and efficient. It reconfigures by associating inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design: a semantic memory-based dynamic neural network (DNN) using memristors. The network associates incoming data with past experience stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. Using a 40 nm memristor macro, we validate the co-design on ResNet and PointNet++ for classifying images from the MNIST dataset and 3D points from the ModelNet dataset. It achieves accuracy on par with software while reducing the computational budget by 48.1% and 15.9%, respectively. Moreover, it reduces energy consumption by 77.6% and 93.3%, respectively.
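
The association step can be illustrated with a minimal sketch, based on the figure captions: each layer's feature map is pooled into a semantic vector, matched against cached per-class semantic centers by cosine similarity, and the remaining layers are skipped once the match is good enough. Everything named here is an assumption for illustration and not the authors' implementation: the function names, the ternarization threshold, the exit threshold of 0.9, and the fallback of returning the last layer's best match when no layer exits early. In the paper the network runs on CIM and the lookup on CAM hardware; this is plain software.

```python
import math

def ternarize(vec, threshold=0.05):
    """Map each value to -1, 0 or +1 (assumed threshold rule, not from the paper)."""
    return [0 if abs(v) <= threshold else (1 if v > 0 else -1) for v in vec]

def global_average_pool(feature_map):
    """feature_map: list of channels, each a flat list of activations."""
    return [sum(channel) / len(channel) for channel in feature_map]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def early_exit_inference(x, layers, semantic_memory, exit_threshold=0.9):
    """Run layers one by one and stop at the first confident match.

    layers: list of callables mapping a feature map to a feature map.
    semantic_memory: one dict {class_label: ternary semantic center} per layer.
    Returns (predicted_label, number_of_layers_used).
    """
    best_label = None
    for depth, (layer, centers) in enumerate(zip(layers, semantic_memory), start=1):
        x = layer(x)
        query = ternarize(global_average_pool(x))
        # Software stand-in for the CAM search: pick the most similar center.
        best_label, best_sim = max(
            ((label, cosine_similarity(query, center)) for label, center in centers.items()),
            key=lambda item: item[1],
        )
        if best_sim >= exit_threshold:
            return best_label, depth  # easy sample: skip the remaining layers
    # Hard sample: every layer was used (simplified fallback, see lead-in).
    return best_label, len(layers)
```

With two toy layers and the same two centers cached for each, an input whose first-layer semantic vector already matches a center exits after one layer, while an ambiguous input is passed on to the second layer.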

Paper Structure (20 sections, 10 equations, 6 figures)

Figures (6)

  • Figure 1: Brain-inspired dynamic neural network with memristors. a, Comparison of the computing model of the brain, static network and dynamic network. b, Comparison of the associative memory mechanism in the brain, digital hardware and memristor-based CAM. c, Comparison of the computing architecture of the brain, digital hardware, and memristor-based CIM.
  • Figure 2: Hardware-software co-design: Semantic memory-based dynamic neural network using CIM and CAM. a, The proposed architecture consists of a ternary quantized neural network implemented on memristor-based CIM for feature extraction, and an associative memory on memristor-based CAM. Using global average pooling (GAP), the network encodes extracted feature maps into low-dimensional ternary semantic centers in memory. When a new sample is subsequently queried, the network calculates a semantic vector from each layer’s output feature map and matches it with the cached semantic centers of n classes in memory. The search vector finds the semantic center stored in CAM with maximum cosine similarity, which can be used to predict the class of the query. Once well matched, the network skips the rest of the layers and directly outputs the final results. In this example, an easy sample of a cat can be well classified through early layers with few computing resources, and the subsequent layers do not participate in the inference. b, If the input is a hard sample and the early layers fail to provide reliable predictions, classification of this sample requires a deeper network with more computation. c, The feature extraction using a memristor-based CIM circuit. d, Optical photo of the memristor chip. e, A cross-sectional transmission electron micrograph showcases the memristor crossbar array, fabricated by the backend-of-line process on a 40 nm technology node tape-out. f, The cross-sectional transmission electron micrograph reveals a single nanoscale memristor, operating on the formation and rupture of conducting filaments.
  • Figure 3: Hardware-software co-designed dynamic ResNet for MNIST dataset classification. a, Schematic representation of CIM and CAM based dynamic neural network on ResNet. The network extracts feature maps of each residual block, then the feature maps are encoded into semantic vectors (svs) through Global Average Pooling (GAP). Semantic vectors work as keys to look up the corresponding semantic center (sc) by measuring cosine similarities (sim). b-d, Visualized distribution of semantic vectors (smaller numbers) and semantic centers (bigger numbers) for the second, fifth and ninth residual block. e, The accuracy and budget drop of different model-quantization-noise combinations for ablation study, including static full-precision (SFP), ternary quantization (Qun), early exit (EE), early exit with ternary quantization (EE.Qun), early exit with ternary quantization and noise (EE.Qun+noise), and memristor-based hardware (Mem) experiment. f, Normalized confusion matrix of memristor-based hardware experiment. g, Operations (OPS) per layer and the probability of an input passing through each layer. h, Comparison of the inference energy consumption across GPU and memristor-based CIM and CAM in a hybrid analogue–digital system.
  • Figure 4: The intrinsic physical noise of memristor and the mitigation using ternary quantization network. a, Conductance variance and statistical histogram of 5 randomly selected memristors with 10,000 read cycles. b-c, Map of the mean and standard deviation of conductance for 8,930 memristors with 10,000 read cycles. d, Scatter plot of the mean and standard deviation of the conductance values of 8,930 memristors. e, The histogram of mean conductance in b. f, Noisy CIM results plotted against noise-free CIM results. The red line represents the ideal case with matched experimental values, while the blue points represent the observed results. g, Write noise map for the value stored in CAM. h, Comparison of accuracy with write noise between the non-quantized network and the ternary quantized network. i, Comparison of accuracy with write and read noise between the non-quantized network and the ternary quantized network.
  • Figure 5: CIM and CAM based DNN for ModelNet classification using PointNet++. a, Schematic of CIM and CAM based DNN for PointNet++. The input is a 3D point cloud of a chair, the network extracts a 3D feature map of each set abstraction layer, then the feature maps are encoded into semantic vectors (svs) through Global Average Pooling (GAP). Semantic vectors work as keys to look up the corresponding semantic center (sc) by measuring cosine similarities (sim). b-d, Visualized distribution of semantic vectors (smaller numbers) and semantic centers (bigger numbers) for the second, fourth and sixth layer. e, The accuracy and budget drop of different model-quantization-noise combinations for ablation study, including static full-precision (SFP), ternary quantization (Qun), early exit (EE), early exit with ternary quantization (EE.Qun), early exit with ternary quantization and noise (EE.Qun+noise). f, Normalized confusion matrix of memristor-based hardware experiment. g, Operations (OPS) per layer and the probability of an input passing through each layer. h, Comparison of the inference energy consumption across GPU and memristor-based CIM and CAM in a hybrid analogue–digital system.
  • ...and 1 more figures
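
Panels g of Figures 3 and 5 pair the operations per layer with the probability of an input passing through each layer. One plausible way to turn those two quantities into a computational-budget reduction is sketched below. This is an assumption about the bookkeeping, not the authors' stated formula: the function names are hypothetical, the overhead of pooling and CAM search is ignored, and the numbers in the usage note are toy values, not results from the paper.

```python
def expected_ops(ops_per_layer, reach_prob):
    """Expected operations per sample: sum over layers of OPS * P(input reaches the layer)."""
    if len(ops_per_layer) != len(reach_prob):
        raise ValueError("need one probability per layer")
    return sum(ops * p for ops, p in zip(ops_per_layer, reach_prob))

def budget_reduction(ops_per_layer, reach_prob):
    """Fraction of compute saved relative to a static network that always runs every layer."""
    static_ops = sum(ops_per_layer)
    return 1.0 - expected_ops(ops_per_layer, reach_prob) / static_ops
```

For a toy three-layer network with 100, 100 and 200 operations per layer, reached by 100%, 50% and 25% of inputs, `budget_reduction([100, 100, 200], [1.0, 0.5, 0.25])` gives 0.5, i.e. half of the static budget is saved.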