
TRIP: Trainable Region-of-Interest Prediction for Hardware-Efficient Neuromorphic Processing on Event-based Vision

Cina Arjmand, Yingfu Xu, Kevin Shidqi, Alexandra F. Dobrita, Kanishkan Vadivel, Paul Detterer, Manolis Sifalakis, Amirreza Yousefzadeh, Guangzhi Tang

TL;DR

TRIP introduces a trainable ROI prediction framework for hardware-efficient event-based vision on the SENECA neuromorphic processor. It combines end-to-end differentiable ROI generation, using an $N\times N$ grid of differentiable truncated Gaussian kernels, with a hardware-aware replacement for ROI generation (dynamic average pooling) and sparsity/quantization-aware training to minimize on-chip computation. Across DvsGesture, Marshalling Signals, and synthetic N-MNIST, TRIP achieves state-of-the-art or near-state-of-the-art accuracy while delivering substantial efficiency gains (e.g., 46× lower computation on DvsGesture and more than 2× improvements in latency and energy on SENECA). The hardware deployment demonstrates scalable, low-power, high-resolution event-based processing, highlighting the potential of near-sensor ROI acceleration in edge sensing systems.
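The differentiable ROI generation described above can be illustrated with a small sketch. This summary does not give TRIP's exact parameterization, so the code assumes a DRAW-style attention window: the predictor outputs a centre `(cx, cy)`, a spacing `stride` between kernel centres, and a width `sigma`, and each axis of the $N\times N$ grid uses 1-D Gaussians whose tails beyond `trunc * sigma` are zeroed. The names `gaussian_filterbank` and `gaussian_roi` and the truncation rule are hypothetical. NumPy is used for readability; written with an autodiff library such as JAX, the same arithmetic is differentiable almost everywhere with respect to the window parameters, which is what makes end-to-end training possible.

```python
import numpy as np


def gaussian_filterbank(center, stride, sigma, n, size, trunc=2.0):
    """Return an (n, size) bank of 1-D truncated Gaussian kernels (hypothetical helper).

    Kernel i is centred at center + (i - (n - 1) / 2) * stride, so the n
    centres form an evenly spaced grid around `center`. Weights farther than
    trunc * sigma from their kernel centre are set to zero, and each row is
    normalised to sum to 1 (rows lying entirely outside the frame stay zero).
    """
    mu = center + (np.arange(n) - (n - 1) / 2.0) * stride  # kernel centres
    dist = np.arange(size)[None, :] - mu[:, None]          # (n, size) offsets
    w = np.exp(-0.5 * (dist / sigma) ** 2)
    w = np.where(np.abs(dist) <= trunc * sigma, w, 0.0)    # truncate the tails
    return w / np.maximum(w.sum(axis=1, keepdims=True), 1e-8)


def gaussian_roi(frame, cx, cy, stride, sigma, n, trunc=2.0):
    """Read an n x n ROI from an (H, W) frame as F_y @ frame @ F_x^T."""
    h, w = frame.shape
    fy = gaussian_filterbank(cy, stride, sigma, n, h, trunc)  # (n, H)
    fx = gaussian_filterbank(cx, stride, sigma, n, w, trunc)  # (n, W)
    return fy @ frame @ fx.T
```

For example, `gaussian_roi(frame, cx=80, cy=50, stride=2.0, sigma=1.0, n=16)` reads a 16×16 ROI covering roughly a 32×32 patch of a 128×128 frame. Because every kernel row sums to one, a uniform frame maps to a uniform ROI.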

Abstract

Neuromorphic processors are well-suited for efficiently handling sparse events from event-based cameras. However, they face significant challenges from the growth in computing demand and hardware costs as the input resolution increases. This paper proposes the Trainable Region-of-Interest Prediction (TRIP), the first hardware-efficient hard attention framework for event-based vision processing on a neuromorphic processor. Our TRIP framework actively produces low-resolution Regions of Interest (ROIs) for efficient and accurate classification. The framework exploits the inherently low information density of sparse events to reduce the overhead of ROI prediction. We introduced extensive hardware-aware optimizations for TRIP and implemented the hardware-optimized algorithm on the SENECA neuromorphic processor. We evaluated TRIP on multiple event-based classification datasets. Our approach achieves state-of-the-art accuracies on all datasets and produces reasonable ROIs with varying locations and sizes. On the DvsGesture dataset, our solution requires 46x less computation than the state-of-the-art while achieving higher accuracy. Furthermore, TRIP enables more than 2x latency and energy improvements on the SENECA neuromorphic processor compared to the conventional solution.
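For deployment, the TL;DR names dynamic average pooling as the hardware-aware replacement for the Gaussian kernels. A minimal sketch, assuming the predictor outputs a centre and an integer pooling factor `k`, cuts an $(nk)\times(nk)$ window and averages each $k\times k$ block into a fixed $n\times n$ ROI. The classifier therefore always sees the same low resolution while the ROI size varies with `k`. The function name `dynamic_avg_pool_roi` and the zero padding at the frame border are assumptions, not details taken from the paper.

```python
import numpy as np


def dynamic_avg_pool_roi(frame, cx, cy, k, n):
    """Hypothetical dynamic-average-pooling ROI.

    Cuts an (n*k) x (n*k) window centred near (cx, cy) from an (H, W) frame
    and averages each k x k block, giving a fixed n x n ROI for any k.
    Pixels outside the frame are treated as zeros; (cx, cy) is assumed to
    lie inside the frame.
    """
    side = n * k
    y0 = int(round(cy - side / 2))   # top-left corner of the window
    x0 = int(round(cx - side / 2))
    padded = np.pad(frame, side)     # zero border of width `side`
    win = padded[y0 + side : y0 + 2 * side, x0 + side : x0 + 2 * side]
    # Split rows and columns into (block, offset) pairs and average each block.
    return win.reshape(n, k, n, k).mean(axis=(1, 3))
```

Box averaging needs only additions and one scaling by $1/k^2$ (a shift when $k$ is a power of two), which is a plausible reason it suits a neuromorphic core better than multiplying by Gaussian weights.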

Paper Structure

This paper contains 42 sections, 9 equations, 10 figures, and 17 tables.

Figures (10)

  • Figure 1: Overview of TRIP performing event-based vision classification on the SENECA neuromorphic processor.
  • Figure 2: Processing pipeline of TRIP for the event-based gesture recognition task. The downsampled events are fed into the ROI prediction event-based CNN to predict the ROI parameters. The ROI generation module uses the parameters to create the ROI fed into the classification event-based CNN. $H_t$ is the output of the ReLU recurrent unit, and $P_t$ is the output for processing events in timebin $t$.
  • Figure 3: Visualization of the ROI's receptive fields for different gestures in the DvsGesture dataset. The receptive fields include the pixels involved in ROI generation. They are superimposed on the timebinned event streams as yellow rectangles.
  • Figure 4: Visualization of ROI's receptive fields for gestures performed at different distances in the Marshalling Signals dataset.
  • Figure 5: Examples of synthetic N-MNIST samples (from left to right: digits 7, 3, and 0), showing the ROIs generated by the network.
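The Figure 2 pipeline, in which downsampled events feed a recurrent ROI predictor whose output $P_t$ is produced once per timebin, can be sketched end to end. The sketch makes several simplifying assumptions: events are `(x, y, t, polarity)` tuples accumulated into per-timebin count frames, downsampling is integer sum-pooling, the ROI prediction event-based CNN is replaced by a single dense layer `w_in`, and the ReLU recurrent unit is taken to be $H_t = \mathrm{ReLU}(W_\text{in} x_t + W_\text{rec} H_{t-1})$ with a linear read-out $P_t = W_\text{out} H_t$. All function and weight names are hypothetical.

```python
import numpy as np


def bin_events(events, h, w, t_bins, t_max):
    """Accumulate (x, y, t, polarity) events into count frames of shape
    (t_bins, 2, h, w), with one channel per polarity (0 or 1)."""
    frames = np.zeros((t_bins, 2, h, w))
    for x, y, t, p in events:
        b = min(int(t / t_max * t_bins), t_bins - 1)  # clamp t == t_max
        frames[b, int(p), int(y), int(x)] += 1
    return frames


def downsample(frame, f):
    """Sum-pool a (C, H, W) frame by an integer factor f (H, W divisible by f)."""
    c, h, w = frame.shape
    return frame.reshape(c, h // f, f, w // f, f).sum(axis=(2, 4))


def relu_rnn_step(x, h_prev, w_in, w_rec):
    """Assumed ReLU recurrent unit: H_t = ReLU(W_in x_t + W_rec H_{t-1})."""
    return np.maximum(0.0, w_in @ x + w_rec @ h_prev)


def run_roi_predictor(frames, f, w_in, w_rec, w_out):
    """Process timebins in order and return P_t = W_out H_t for every t."""
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for frame in frames:
        x = downsample(frame, f).ravel()  # low-resolution input for this timebin
        h = relu_rnn_step(x, h, w_in, w_rec)
        outputs.append(w_out @ h)
    return np.stack(outputs)
```

With 128×128 DvsGesture-sized frames and `f=4`, each timebin reaches the predictor as a 2×32×32 input; in TRIP, the predicted parameters would then drive ROI generation for the classification network.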