Enhanced Neuromorphic Semantic Segmentation Latency through Stream Event

D. Hareb, J. Martinet, B. Miramond

TL;DR

This work addresses real-time semantic segmentation for resource-constrained environments by leveraging event streams from neuromorphic cameras and a Spiking Neural Network. A dynamic, region-based strategy compares each region's mean event count between successive frames to a threshold $\theta$ and processes only the regions with significant changes using a lightweight SegSNNnet, while reusing the keyframe segmentation for regions with little change. The SegSNNnet backbone, featuring Spike-Element-Wise blocks and LIF neurons trained with surrogate gradients, achieves low energy consumption and is suited for neuromorphic hardware such as Loihi and SPLEAT. On the DSEC-semantic dataset, the method delivers substantial throughput gains (up to $5\times$–$10\times$ FPS) with modest MIoU losses of about $2$–$3\%$, demonstrating a practical balance between latency, accuracy, and energy efficiency for dynamic, embedded perception tasks.

Abstract

Achieving optimal semantic segmentation with frame-based vision sensors poses significant challenges for real-time systems like UAVs and self-driving cars, which require rapid and precise processing. Traditional frame-based methods often struggle to balance latency, accuracy, and energy efficiency. To address these challenges, we leverage event streams from event-based cameras, bio-inspired sensors that trigger events in response to changes in the scene. Specifically, we analyze the number of events triggered between successive frames, with a high number indicating significant changes and a low number indicating minimal changes. We exploit this event information to solve the semantic segmentation task by employing a Spiking Neural Network (SNN), a bio-inspired computing paradigm known for its low energy consumption. Our experiments on the DSEC dataset show that our approach significantly reduces latency with only a limited drop in accuracy. Additionally, by using SNNs, we achieve low power consumption, making our method suitable for energy-constrained real-time applications. To the best of our knowledge, our approach is the first to effectively balance reduced latency, minimal accuracy loss, and energy efficiency using event streams to enhance semantic segmentation in dynamic and resource-limited environments.

Paper Structure

This paper contains 11 sections, 1 equation, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Network overview: The diagram illustrates the process in $3$ steps from left to right: 1) The image is divided into three overlapping regions, with the yellow, green, and red rectangles representing individual regions and their intersections representing the overlapping areas. 2) The mean number of events triggered within the time interval $[t-1, t]$ is compared against a predefined threshold $\theta$. 3) SegSNNnet processes the left region as its number of events exceeds the threshold. For the other two regions, the semantic segmentation results from the keyframe generated at time $t-1$ are transferred to the current frame for reuse.
  • Figure 2: Qualitative results on the DSEC dataset. From left to right, we visualize the RGB images, accumulated events over a $100\,\mathrm{ms}$ interval before the RGB image was captured, ground truth, predictions using the baseline approach, predictions after dividing the image into $3$ regions and processing each with SegSNNnet, and predictions using our method. "CP" indicates the reuse of the keyframe's region by copy-pasting, and "SNN" refers to the processing of the current frame's region by SegSNNnet.
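
The three steps in the Figure 1 caption (split into overlapping regions, compare each region's mean event count to $\theta$, then either run the network or copy the keyframe result) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the vertical-strip layout, the `overlap` width, the per-pixel averaging, the rule that later regions overwrite earlier ones where they overlap, and the names `split_into_regions`, `update_segmentation`, and `segment_fn` (a stand-in for SegSNNnet) are all assumptions made for the example.

```python
import numpy as np


def split_into_regions(width, n_regions=3, overlap=8):
    """Return (start, end) column bounds of overlapping vertical strips.

    Assumption: regions are vertical strips widened by overlap/2 on each
    interior side; the paper only states that the regions overlap.
    """
    base = width // n_regions
    bounds = []
    for i in range(n_regions):
        start = max(0, i * base - overlap // 2)
        if i == n_regions - 1:
            end = width
        else:
            end = min(width, (i + 1) * base + overlap // 2)
        bounds.append((start, end))
    return bounds


def update_segmentation(event_count, keyframe_seg, segment_fn, theta,
                        n_regions=3, overlap=8):
    """Gate per-region processing on the mean event count.

    event_count  : (H, W) array, events per pixel accumulated over [t-1, t]
    keyframe_seg : (H, W) label map produced at time t-1
    segment_fn   : hypothetical stand-in for SegSNNnet; maps a region of
                   event counts to a label map of the same shape
    theta        : threshold on the mean event count of a region
    """
    seg = keyframe_seg.copy()  # default: reuse the keyframe labels ("CP")
    decisions = []
    for start, end in split_into_regions(event_count.shape[1],
                                         n_regions, overlap):
        region = event_count[:, start:end]
        if region.mean() > theta:
            # Significant change: recompute this region ("SNN").
            # In overlaps, a later region overwrites an earlier one.
            seg[:, start:end] = segment_fn(region)
            decisions.append("SNN")
        else:
            decisions.append("CP")
    return seg, decisions


# Usage: activity only in the left part of a 4 x 96 frame.
events = np.zeros((4, 96))
events[:, :16] = 5.0
keyframe = np.zeros((4, 96), dtype=np.int64)
dummy_net = lambda region: np.ones(region.shape, dtype=np.int64)

seg, decisions = update_segmentation(events, keyframe, dummy_net, theta=1.0)
print(decisions)  # ['SNN', 'CP', 'CP']
```

In this toy run only the left strip exceeds the threshold, so the stand-in network is called once instead of three times; the throughput gain of such a scheme depends on how often regions fall below $\theta$.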