TRIP: Trainable Region-of-Interest Prediction for Hardware-Efficient Neuromorphic Processing on Event-based Vision
Cina Arjmand, Yingfu Xu, Kevin Shidqi, Alexandra F. Dobrita, Kanishkan Vadivel, Paul Detterer, Manolis Sifalakis, Amirreza Yousefzadeh, Guangzhi Tang
TL;DR
TRIP introduces a trainable ROI prediction framework for hardware-efficient event-based vision on the SENECA neuromorphic processor. It combines end-to-end differentiable ROI generation, using an $N\times N$ grid of differentiable truncated Gaussian kernels, with a hardware-aware replacement for ROI generation (dynamic average pooling) and sparsity- and quantization-aware training to minimize on-chip computation. Across DvsGesture, Marshalling Signals, and synthetic N-MNIST, TRIP achieves state-of-the-art or near-state-of-the-art accuracy while delivering substantial efficiency gains (e.g., 46× less computation on DvsGesture and more than 2× improvements in latency and energy on SENECA). The hardware deployment demonstrates scalable, low-power processing of high-resolution event-based input, highlighting the potential for near-sensor ROI acceleration in edge sensing systems.
Abstract
Neuromorphic processors are well-suited for efficiently handling sparse events from event-based cameras. However, their computing demand and hardware cost grow significantly as the input resolution increases. This paper proposes Trainable Region-of-Interest Prediction (TRIP), the first hardware-efficient hard attention framework for event-based vision processing on a neuromorphic processor. Our TRIP framework actively produces low-resolution Regions-of-Interest (ROIs) for efficient and accurate classification. The framework exploits the inherently low information density of sparse events to reduce the overhead of ROI prediction. We introduce extensive hardware-aware optimizations for TRIP and implement the hardware-optimized algorithm on the SENECA neuromorphic processor. We evaluate on multiple event-based classification datasets. Our approach achieves state-of-the-art accuracy on all datasets and produces reasonable ROIs with varying locations and sizes. On the DvsGesture dataset, our solution requires 46× less computation than the state-of-the-art while achieving higher accuracy. Furthermore, TRIP enables more than 2× improvements in latency and energy on the SENECA neuromorphic processor compared to the conventional solution.
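
To make the ROI-generation idea from the TL;DR more concrete, the following is a minimal sketch of how an $N\times N$ grid of truncated Gaussian kernels could resample a high-resolution event frame into a low-resolution ROI. It is not the authors' implementation: the separable filterbank formulation, the choice of sigma as half the grid spacing, the truncation radius, and all function and parameter names (`gaussian_filterbank`, `extract_roi`, `trunc`) are assumptions made for illustration. It is written in plain Python for readability; in training, the same arithmetic would presumably be expressed with autodiff tensors so that gradients reach the predicted ROI center and size.

```python
import math


def gaussian_filterbank(center, span, n_out, n_in, trunc=2.0):
    """Build an n_out x n_in matrix of 1-D truncated Gaussian weights.

    Row i is a Gaussian centered on the i-th of n_out evenly spaced grid
    points covering [center - span/2, center + span/2]. Weights farther
    than trunc * sigma from the grid point are zeroed (the truncation),
    and every non-empty row is normalized to sum to 1.
    """
    stride = span / n_out
    sigma = stride / 2.0  # assumption: kernel width tied to grid spacing
    bank = []
    for i in range(n_out):
        mu = center - span / 2.0 + (i + 0.5) * stride
        row = []
        for x in range(n_in):
            d = x - mu
            inside = abs(d) <= trunc * sigma
            row.append(math.exp(-0.5 * (d / sigma) ** 2) if inside else 0.0)
        total = sum(row)
        bank.append([w / total if total > 0 else 0.0 for w in row])
    return bank


def extract_roi(frame, cx, cy, width, height, n):
    """Resample `frame` (a list of rows) into an n x n ROI.

    Computes Fy @ frame @ Fx^T, where Fy and Fx are the vertical and
    horizontal filterbanks for an ROI centered at (cx, cy).
    """
    h, w = len(frame), len(frame[0])
    fy = gaussian_filterbank(cy, height, n, h)
    fx = gaussian_filterbank(cx, width, n, w)
    # tmp = Fy @ frame, shape n x w
    tmp = [[sum(fy[i][r] * frame[r][c] for r in range(h)) for c in range(w)]
           for i in range(n)]
    # roi = tmp @ Fx^T, shape n x n
    return [[sum(tmp[i][c] * fx[j][c] for c in range(w)) for j in range(n)]
            for i in range(n)]


# Usage: a 16 x 16 frame holding a single event at row 7, column 9.
frame = [[0.0] * 16 for _ in range(16)]
frame[7][9] = 1.0
roi = extract_roi(frame, cx=8.0, cy=8.0, width=8.0, height=8.0, n=4)
```

Because of the truncation, each ROI pixel depends only on a small neighborhood of input pixels, so the cost per output pixel stays bounded regardless of the sensor resolution. The dynamic average pooling mentioned in the TL;DR replaces this step for deployment; its details are not given here, so it is not sketched.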
