Efficient and accurate neural field reconstruction using resistive memory

Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

TL;DR

The system is demonstrated on a 40nm 256Kb resistive memory-based in-memory computing macro, achieving large improvements in energy efficiency and parallelism without compromising reconstruction quality in tasks such as 3D CT sparse reconstruction, novel view synthesis, and novel view synthesis for dynamic scenes.

Abstract

Human beings construct their perception of space by integrating sparse observations through massively interconnected synapses and neurons, which offers superior parallelism and efficiency. Replicating this capability in AI has wide applications in medical imaging, AR/VR, and embodied AI, where input data is often sparse and computing resources are limited. However, traditional signal reconstruction methods on digital computers face both software and hardware challenges. On the software front, difficulties arise from the storage inefficiency of conventional explicit signal representations. Hardware obstacles include the von Neumann bottleneck, which limits data transfer between the CPU and memory, and the limitations of CMOS circuits in supporting parallel processing. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. Software-wise, we employ neural fields to implicitly represent signals via neural networks, which are further compressed using low-rank decomposition and structured pruning. Hardware-wise, we design a resistive memory-based computing-in-memory (CIM) platform, featuring a Gaussian Encoder (GE) and an MLP Processing Engine (PE). The GE harnesses the intrinsic stochasticity of resistive memory for efficient input encoding, while the PE achieves precise weight mapping through a Hardware-Aware Quantization (HAQ) circuit. We demonstrate the system's efficacy on a 40nm 256Kb resistive memory-based in-memory computing macro, achieving large improvements in energy efficiency and parallelism without compromising reconstruction quality in tasks such as 3D CT sparse reconstruction, novel view synthesis, and novel view synthesis for dynamic scenes. This work advances AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
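
The input encoding performed by the GE can be illustrated in software. The sketch below assumes the encoding follows the standard Gaussian random Fourier feature mapping, in which a coordinate vector v is projected by a fixed random matrix B with normally distributed entries and passed through cosine and sine. A software pseudo-random generator stands in for the resistive memory's intrinsic stochasticity, and the function names, `sigma`, and the feature count are illustrative assumptions rather than values from the paper.

```python
import math
import random


def make_gaussian_matrix(num_features, in_dim, sigma=1.0, seed=0):
    """Draw a fixed (num_features x in_dim) matrix with N(0, sigma^2) entries.

    Assumption: a software PRNG replaces the device-level randomness
    that the hardware obtains from resistive memory cells.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
            for _ in range(num_features)]


def gaussian_encode(coords, matrix):
    """Map coordinates v to the features [cos(2*pi*Bv), sin(2*pi*Bv)]."""
    # Matrix-vector product Bv, scaled by 2*pi (done in-memory on hardware).
    projections = [2.0 * math.pi * sum(b * v for b, v in zip(row, coords))
                   for row in matrix]
    # Cosine features first, then sine features.
    return ([math.cos(p) for p in projections]
            + [math.sin(p) for p in projections])


# Encode one 3D coordinate with 8 random projections (hypothetical sizes).
B = make_gaussian_matrix(num_features=8, in_dim=3, sigma=2.0)
features = gaussian_encode([0.1, 0.5, -0.3], B)
print(len(features))  # 16
```

The encoded vector has twice as many entries as B has rows and would be fed to the MLP in place of the raw coordinates. A larger `sigma` lets the network represent higher-frequency detail, at the risk of noisier reconstructions.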

Paper Structure (25 sections, 8 equations, 6 figures)

This paper contains 25 sections, 8 equations, 6 figures.

Figures (6)

  • Figure 1: Cross-level co-optimizations for our system. a, The human brain's capability of reconstructing experiences from sparse observations. b, Our brain-inspired system for efficient signal reconstruction from sparse inputs. c, Real-world applications of signal reconstruction from sparse inputs. d, Challenges faced by traditional approaches at different levels of software and hardware. From left to right: at the representation level, traditional explicit representation methods face low storage efficiency, limited flexibility in storage formats, and inadequate scalability for resolution switching. At the algorithm level, uncompressed AI models are unsuitable for edge deployment. At the architecture level, the von Neumann architecture incurs data transfer overhead due to its separate processing and memory units. At the circuit level, the frequently used pseudo-random number generators and multiply-accumulators (MACs) are sequential. e, Our approach's innovations across different levels. From left to right: at the representation level, we use neural fields to represent data, with signals expressed as functions of space and time coordinates and embodied in a neural network. At the algorithm level, we use low-rank decomposition and structured pruning to reduce the number of parameters that need to be mapped onto hardware. At the architecture level, we develop a hybrid analog-digital system in which the resistive memory-based analog core collocates memory and processing. At the circuit level, we generate true random numbers in parallel using resistive memory's intrinsic randomness for Gaussian encoding, and perform MAC operations using parallel and precise hardware-aware quantization circuits.
  • Figure 2: Hardware-aware quantization (HAQ) for accurate in-memory matrix multiplications. a, The write noise of resistive memory: the distribution of conductance values read from 10,000 cells subjected to the same set operation. b, The read noise of resistive memory: the distribution of conductance values obtained from 8 randomly selected devices during 20,000 read operations. c, The quantization of a weight value using the traditional post-training asymmetric uniform quantization (PTQ) method. d, The flowchart of the HAQ method. Each cell is iteratively determined to be written as HRS or LRS until the specified bit width is reached. e, The quantization of the same weight value using the HAQ method. The final programmed value closely approximates the original value. f,g, The experimental error in matrix multiplication when quantizing with PTQ and HAQ on resistive memory, respectively. HAQ clearly reduces the error. h,i, The stability of matrix multiplication using PTQ and HAQ on resistive memory, respectively. Both methods are robust to temporal conductance fluctuations.
  • Figure 3: Architecture and circuits of our hardware co-design. a, Architecture of the Gaussian Encoder (GE), consisting of a resistive memory in-memory computing block with random weights and a digital CORDIC block. b, Architecture of the MLP Processing Engine (PE), consisting of a resistive memory in-memory computing block with a Variable Current Multiplicative Amplification Circuit (VCMAC) block. c, Circuit diagram of the VCMAC block. d, Current multiplicative amplification results with VCMAC, where $S$ represents different significance ratios. The input and amplified currents are indicated in Fig. 3c. e, Optical image of the in-memory computing chip, consisting of a 512 $\times$ 512 resistive memory in-memory computing macro using the 40 nm technology node, with cross-section transmission electron microscopy (TEM) images of the 1T1R array and an individual resistive memory cell.
  • Figure 4: 3D CT reconstruction. a, Schematic of reconstructing complete 3D CT images from sparse samplings. b, The impact of different encoding methods on reconstruction quality. c, The normalized resistive memory conductance distribution of the model's output layer weights mapped onto the MLP PE through HAQ. d, The normalized resistive memory conductance distribution of the random Gaussian encoding matrix of the GE. e, The reconstruction quality for different numbers of sparse samplings of CT projections. f, The comparison of reconstructed results from dense observations (complete 40 CT slices) and ground truth. g, The quantitative comparison of dense reconstruction quality using software, our system with HAQ, and our system with PTQ. h, The comparison of reconstructed results from sparse observations (20 CT slices) and ground truth. i, The quantitative comparison of sparse reconstruction quality using software, our system with HAQ, and our system with PTQ. j, The comparison of CT reconstruction energy efficiency of GPU, NPU, and our system. k, The comparison of CT reconstruction area efficiency of GPU, NPU, and our system.
  • Figure 5: Novel view synthesis. a, Schematic of the novel view synthesis flow. b, The impact of different encoding methods on novel view synthesis quality. c, The resistive memory conductance distribution of the random Gaussian encoding matrix in the GE, along with the input layer, colour output layer, and density output layer of the model mapped onto the MLP PE through HAQ. d, The reduction of parameter size using our proposed low-rank decomposition and structured pruning. e, The comparison of synthesized novel views using our system, together with ground truth. f, The quantitative novel view synthesis results using our co-design, comparable to the software baseline. g, The comparison of novel view synthesis energy efficiency of GPU, NPU, and our system. h, The comparison of novel view synthesis area efficiency of GPU, NPU, and our system.
  • ...and 1 more figures
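
The PTQ baseline named in Figure 2c, post-training asymmetric uniform quantization, can be sketched as follows. This is a minimal, idealized illustration of the generic technique: it ignores the write and read noise of resistive memory and does not reproduce the HAQ procedure, whose details are not given in this summary. The function names, bit width, and sample weights are hypothetical.

```python
def asymmetric_uniform_quantize(weights, num_bits=4):
    """Quantize floats to integer codes in [0, 2**num_bits - 1].

    Asymmetric: the grid spans [min(weights), max(weights)] rather than
    being centred on zero. Returns (codes, scale, offset).
    """
    w_min, w_max = min(weights), max(weights)
    levels = (1 << num_bits) - 1
    # Step size between adjacent grid points; guard the constant-weight case.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, offset):
    """Map integer codes back to approximate float weights."""
    return [offset + c * scale for c in codes]


# Hypothetical weights quantized to 4 bits (16 levels).
weights = [-0.8, -0.1, 0.0, 0.35, 1.2]
codes, scale, offset = asymmetric_uniform_quantize(weights, num_bits=4)
restored = dequantize(codes, scale, offset)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error <= scale / 2 + 1e-12)
```

In this noise-free setting the rounding error of each weight is at most half a quantization step. On real devices the programmed conductance also deviates from its target, which is the gap between ideal PTQ and hardware that the caption attributes HAQ to closing.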