FPCA: Field-Programmable Pixel Convolutional Array for Extreme-Edge Intelligence
Zihan Yin, Akhilesh Jaiswal
TL;DR
The paper tackles energy and bandwidth bottlenecks in vision sensors by introducing FPCA, a field-programmable pixel array that performs in-pixel convolution using a dual-die 3D architecture. It achieves dynamic reconfigurability of weight values, kernel sizes, channel counts, and strides, plus region skipping, by placing the weight storage on a separate die and connecting it through TSVs or Cu-Cu bonding. A novel bucket-select curvefit model is proposed to capture the nonlinearity of analog convolution accurately and to enable ML-friendly training. Simulation results on a 28 nm process demonstrate dot-product capabilities, adjustable kernel/channel/stride configurations, and favorable energy/latency/bandwidth tradeoffs, highlighting the potential for extreme-edge CV tasks that need high density and adaptability.
Abstract
The rapid advancement of neural network applications necessitates hardware that not only accelerates computation but also adapts efficiently to dynamic processing requirements. While processing-in-pixel has emerged as a promising solution to overcome the bottlenecks of traditional architectures at the extreme edge, existing implementations face limitations in reconfigurability and scalability due to their static nature and inefficient area usage. Addressing these challenges, we present a novel architecture that significantly enhances the capabilities of processing-in-pixel for convolutional neural networks. Our design integrates non-volatile memory (NVM) with a novel unit pixel circuit, enabling dynamic reconfiguration of synaptic weights, kernel size, channel size, and stride size, and thus offering greater flexibility and adaptability than static in-pixel designs. By placing the pixel circuit and the synaptic-weight storage on separate dies, our circuit achieves a substantial reduction in the required area per pixel, thereby increasing the density and scalability of the pixel array. Simulation results demonstrate the dot-product operation of the circuit and the non-linearity of its analog output, and we propose a novel bucket-select curvefit model to capture that non-linearity. This work not only addresses the limitations of current in-pixel computing approaches but also opens new avenues for developing more efficient, flexible, and scalable neural network hardware, paving the way for advanced AI applications.
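To make the reconfigurable parameters concrete, the following is a minimal, idealized software sketch of a convolution whose kernel size, channel count, and stride are chosen at run time, with optional region skipping. It is a digital reference only and does not model the analog pixel circuit, its non-linearity, or the bucket-select curvefit model. The function names `conv2d_configurable` and `conv_layer` and the `skip_mask` parameter are hypothetical and are not taken from the paper.

```python
def conv2d_configurable(image, kernel, stride=1, skip_mask=None):
    """Ideal (linear) 2D dot-product convolution without padding.

    image     : list of rows of pixel values (H x W)
    kernel    : square list of rows of weights (k x k); k is inferred
    stride    : step between neighbouring kernel positions
    skip_mask : optional (out_h x out_w) list of rows of booleans;
                True marks an output position to skip (assumed
                representation of region skipping, stored as None)
    """
    k = len(kernel)
    h, w = len(image), len(image[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            if skip_mask is not None and skip_mask[i][j]:
                row.append(None)  # skipped region: no dot product computed
                continue
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def conv_layer(image, kernels, stride=1, skip_mask=None):
    """Apply one kernel per output channel; the channel count is len(kernels)."""
    return [conv2d_configurable(image, kern, stride, skip_mask) for kern in kernels]


# 4 x 4 test image holding the values 1..16 in row-major order.
image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
ones_2x2 = [[1, 1], [1, 1]]
center_3x3 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

strided = conv2d_configurable(image, ones_2x2, stride=2)
skipped = conv2d_configurable(image, ones_2x2, stride=2,
                              skip_mask=[[False, True], [False, False]])
larger_kernel = conv2d_configurable(image, center_3x3, stride=1)
two_channels = conv_layer(image, [ones_2x2, [[1, 0], [0, 0]]], stride=2)
```

Here the same input is processed with a 2 x 2 kernel at stride 2, with one output position skipped, with a 3 x 3 kernel at stride 1, and with two output channels, mirroring the kinds of settings the abstract lists as reconfigurable.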
