FPCA: Field-Programmable Pixel Convolutional Array for Extreme-Edge Intelligence

Zihan Yin, Akhilesh Jaiswal

TL;DR

The paper tackles energy and bandwidth bottlenecks in vision sensors by introducing FPCA, a field-programmable pixel array that brings in-pixel convolution with a dual-die 3D architecture. It achieves dynamic reconfigurability of weight values, kernel sizes, channel counts, and strides, plus region skipping, by placing the weight storage on a separate die and connecting via TSV or Cu-Cu bonding. A novel bucket-select curvefit model is proposed to accurately capture the nonlinearity of analog convolution and to enable ML-friendly training. Simulation results on a 28nm process demonstrate dot-product capabilities, adjustable kernel/channel/stride configurations, and favorable energy/latency/bandwidth tradeoffs, highlighting the potential for extreme-edge CV tasks with high density and adaptability.

Abstract

The rapid advancement of neural network applications necessitates hardware that not only accelerates computation but also adapts efficiently to dynamic processing requirements. While processing-in-pixel has emerged as a promising solution to overcome the bottlenecks of traditional architectures at the extreme edge, existing implementations face limitations in reconfigurability and scalability due to their static nature and inefficient area usage. Addressing these challenges, we present a novel architecture that significantly enhances the capabilities of processing-in-pixel for convolutional neural networks. Our design integrates non-volatile memory (NVM) with a novel unit pixel circuit design, enabling dynamic reconfiguration of synaptic weights, kernel size, channel size, and stride size, thus offering unprecedented flexibility and adaptability. By placing the pixel circuit and the synaptic weight storage on separate dies, our circuit achieves a substantial reduction in the required area per pixel, thereby increasing the density and scalability of the pixel array. Simulation results demonstrate the dot product operations of the circuit and the non-linearity of its analog output, and a novel bucket-select curvefit model is proposed to capture this non-linearity. This work not only addresses the limitations of current in-pixel computing approaches but also opens new avenues for developing more efficient, flexible, and scalable neural network hardware, paving the way for advanced AI applications.

Paper Structure

This paper contains 19 sections, 5 equations, 9 figures.

Figures (9)

  • Figure 1: Proposed circuit and overall architecture of the FPCA, where (a) is the novel 4T APS schematic, (b) is the switch matrix that connects the pixel array to the shared weight block in a 3D-integrated weight die, (c) is an example diagram of the shared weight block, (d) is the peripheral ADC, and (e) is the connection between the two dies using either Through-Silicon Vias (TSV) or Copper-Copper (Cu-Cu) bonding.
  • Figure 2: Detailed schematic of the multi-channel weight block (shared weight block) with positive and negative kernels. The top part of the figure shows the proposed method of storing the kernel weights using one positive and one negative kernel. The bottom part shows the two cycles of the multi-channel weight block, highlighting the activated part of the circuit in each cycle.
  • Figure 3: Detailed architecture of the column design of the FPCA pixel array and the multi-channel weight block, where (a) is the pixel column design, (b) shows the connection pattern to different columns of the multi-channel weight block and the control signal $ColP$ (column pattern select line), (c) is an example of an m-channel block with a maximum kernel size of $3\times 3$, and (d) is the pixel circuit design, in which SW is the column control line, RS is the row control line, and the pixel input line SM is connected to one node of the switch matrix.
  • Figure 4: Reconfigurability in weight value, kernel size, and channel size. The center of the figure represents $m$ channels of $k\times k$ kernels, the left part shows reconfigured weights in $i$ channels of $k\times k$ kernels, and the right part shows $j$ reconfigured channels of smaller $3\times 3$ kernels.
  • Figure 5: Vertical and horizontal striding with stride $s = 1$ within the FPCA array.
  • ...and 4 more figures
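
The captions for Figures 2 and 5 describe storing each kernel as one positive and one negative kernel, evaluating them in two cycles, and sliding the kernel across the array with a configurable stride. The following is a minimal sketch of that idea in plain Python. It is an idealized, linear, digital model, not the analog circuit itself: the function names are hypothetical, the subtraction of the two cycle outputs is an assumption about how the results are combined, and the analog non-linearity that the bucket-select curvefit model is meant to capture is ignored.

```python
def split_signed_kernel(kernel):
    """Split a signed kernel into non-negative positive and negative parts."""
    pos = [[max(w, 0) for w in row] for row in kernel]
    neg = [[max(-w, 0) for w in row] for row in kernel]
    return pos, neg


def strided_dot_products(image, kernel, stride=1):
    """Slide a square kernel over the image and return one dot product per position."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - k + 1, stride):
        out_row = []
        for c in range(0, cols - k + 1, stride):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


def two_cycle_convolution(image, kernel, stride=1):
    """Assumed scheme: cycle 1 uses the positive kernel, cycle 2 the negative one."""
    pos, neg = split_signed_kernel(kernel)
    cycle_pos = strided_dot_products(image, pos, stride)
    cycle_neg = strided_dot_products(image, neg, stride)
    # Assumption: the signed result is the difference of the two cycle outputs.
    return [[p - n for p, n in zip(p_row, n_row)]
            for p_row, n_row in zip(cycle_pos, cycle_neg)]


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    kernel = [[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]]
    print(two_cycle_convolution(image, kernel, stride=1))
    print(two_cycle_convolution(image, kernel, stride=2))
```

In this model the two-cycle result equals an ordinary signed convolution, because every weight satisfies w = max(w, 0) - max(-w, 0). Changing `stride` or passing a smaller kernel mirrors, in software only, the kind of reconfiguration the figures illustrate.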