Region Masking to Accelerate Video Processing on Neuromorphic Hardware

Sreetama Sarkar, Sumit Bam Shrestha, Yue Che, Leobardo Campos-Macias, Gourav Datta, Peter A. Beerel

TL;DR

This work addresses the energy and latency challenges of on-chip video inference with spiking neural networks (SNNs) on neuromorphic hardware by introducing a region-masking strategy that prunes insignificant input regions. A static mask derived from training data and a dynamic mask predicted by a lightweight MGNet transformer are combined into a union mask, which drives masked, event-aware inference with sigma-delta encoding on Loihi 2. The key contributions are the static+dynamic region masking approach, the MGNet-based dynamic mask, and the demonstration across multiple datasets that masking yields up to 1.65× reduction in energy-delay product (EDP) at ~60% region sparsity, with modest mAP degradation (~1.09% on KITTI). This approach enables more energy-efficient, low-latency edge video processing on neuromorphic hardware and lays the groundwork for further reductions in inter-chip communication bottlenecks and readout energy.

Abstract

The rapidly growing demand for on-chip edge intelligence on resource-constrained devices has motivated approaches to reduce the energy and latency of deep learning models. Spiking neural networks (SNNs) have gained particular interest due to their promise to reduce energy consumption using event-based processing. We assert that while sigma-delta encoding in SNNs can take advantage of the temporal redundancy across video frames, it still involves a significant amount of redundant computation due to processing insignificant events. In this paper, we propose a region masking strategy that identifies regions of interest at the input of the SNN, thereby eliminating computation and data movement for events arising from unimportant regions. Our approach demonstrates that masking regions at the input not only significantly reduces the overall spiking activity of the network, but also provides significant improvement in throughput and latency. We apply region masking during video object detection on Loihi 2, demonstrating that masking approximately 60% of input regions can reduce the energy-delay product by 1.65× over a baseline sigma-delta network, with a degradation in mAP@0.5 of 1.09%.
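
As a rough illustration of the masking idea, the sketch below combines a static and a dynamic patch-level mask by union and applies the result to a frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the convention that `True` means "keep this patch", the choice to zero out masked pixels, and the toy 4×4 frame with 2×2 patches are all hypothetical.

```python
def union_mask(static_mask, dynamic_mask):
    """Keep a patch if either mask keeps it (assumed convention: True = keep)."""
    return [
        [s or d for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(static_mask, dynamic_mask)
    ]


def apply_patch_mask(frame, mask, patch):
    """Zero every pixel whose enclosing patch is not kept (assumed masking rule)."""
    return [
        [px if mask[r // patch][c // patch] else 0 for c, px in enumerate(row)]
        for r, row in enumerate(frame)
    ]


def region_sparsity(mask):
    """Fraction of patches that are masked out."""
    cells = [keep for row in mask for keep in row]
    return 1 - sum(cells) / len(cells)


# Toy single-channel 4x4 frame split into 2x2 patches (illustrative values only).
frame = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
static_mask = [[True, False], [False, False]]   # e.g. derived offline from training data
dynamic_mask = [[False, False], [False, True]]  # e.g. predicted per frame by a mask network

combined = union_mask(static_mask, dynamic_mask)
masked = apply_patch_mask(frame, combined, patch=2)
sparsity = region_sparsity(combined)
```

Because masked regions stay constant from frame to frame under this rule, they produce no changes for a delta encoder to transmit, which is the intuition behind the reduced spiking activity.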

Paper Structure

This paper contains 10 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Inference pipeline on Loihi 2 using region masking. The Region Masking module applies masking to the RGB frames generated by the Data Generator and transmits the masked frame to the Quantizer module for further processing.
  • Figure 2: Sparse compression of a temporally redundant signal using delta encoding and its corresponding reconstruction using sigma accumulation.
  • Figure 3: Layer-wise event rate at different input mask sparsity levels on ImageNet-VID using Tiny-YOLO. This demonstrates how sparsity induced at the input propagates to the intermediate layers.
  • Figure 4: (From left) The input frame before masking, input frame after masking and the delta-encoded frames for static (first row), dynamic (second row) and combined (third row) masking. Dynamic masking shows a few additional patches in the delta-encoded frame (second row right image), which leads to increased computational cost.
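
The delta encoding and sigma accumulation named in the Figure 2 caption can be sketched for a one-dimensional signal as follows. This is a generic thresholded delta encoder written for illustration, not the Loihi 2 or paper implementation; the function names, the threshold value, and the sample signal are assumptions.

```python
def delta_encode(signal, threshold):
    """Send a change only when it differs from the last sent state by >= threshold."""
    reference = 0.0  # value the receiver has reconstructed so far
    events = []
    for x in signal:
        diff = x - reference
        if abs(diff) >= threshold:
            events.append(diff)   # transmit the change
            reference += diff     # receiver will now hold x
        else:
            events.append(0.0)    # no event: nothing is transmitted
    return events


def sigma_decode(events):
    """Reconstruct the signal by accumulating the received deltas."""
    total = 0.0
    reconstructed = []
    for e in events:
        total += e
        reconstructed.append(total)
    return reconstructed


# Temporally redundant toy signal: long constant stretches, few real changes.
signal = [0.0, 0.0, 1.0, 1.0, 1.0, 1.05, 3.0, 3.0]
events = delta_encode(signal, threshold=0.1)
reconstructed = sigma_decode(events)
event_sparsity = events.count(0.0) / len(events)
```

In this toy run only two of the eight time steps produce an event, and the reconstruction stays within the threshold of the original signal at every step.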