
EvSegSNN: Neuromorphic Semantic Segmentation for Event Data

Dalia Hareb, Jean Martinet

TL;DR

EvSegSNN is a biologically plausible encoder-decoder U-shaped architecture that relies on Parametric Leaky Integrate-and-Fire neurons to trade off resource usage against performance.

Abstract

Semantic segmentation is an important computer vision task, particularly for scene understanding and navigation of autonomous vehicles and UAVs. Several variations of deep neural network architectures have been designed to tackle this task. However, because of their high computational cost and memory consumption, these models are poorly suited to deployment on resource-constrained systems. To address this limitation, we introduce an end-to-end biologically inspired semantic segmentation approach that combines Spiking Neural Networks (SNNs, a low-power alternative to classical neural networks) with event cameras, whose output can be fed directly to the network inputs. We have designed EvSegSNN, a biologically plausible encoder-decoder U-shaped architecture relying on Parametric Leaky Integrate-and-Fire neurons, with the aim of trading off resource usage against performance. Experiments conducted on DDD17 demonstrate that EvSegSNN outperforms the closest state-of-the-art model in terms of MIoU while reducing the number of parameters by a factor of $1.6$ and dispensing with a batch normalization stage.
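The abstract names Parametric Leaky Integrate-and-Fire (PLIF) neurons without defining them. As a rough illustration, the sketch below follows the commonly used discrete-time PLIF formulation, in which the leak factor $1/\tau$ is the sigmoid of a learnable scalar. It is not taken from the paper: the function names `plif_step` and `run_plif`, the threshold of 1.0, the reset value of 0.0, and the hard reset are all assumptions made for this example.

```python
import math

def plif_step(v, x, w, v_threshold=1.0, v_reset=0.0):
    """One discrete-time update of a single PLIF neuron (illustrative only).

    v: membrane potential before the update
    x: input current at this timestep
    w: learnable scalar; the leak factor is 1/tau = sigmoid(w)
    Returns (spike, new_v).
    """
    k = 1.0 / (1.0 + math.exp(-w))        # 1/tau, constrained to (0, 1)
    h = v + k * (x - (v - v_reset))       # leaky integration of the input
    spike = 1 if h >= v_threshold else 0  # fire once the threshold is reached
    new_v = v_reset if spike else h       # hard reset after a spike (assumed)
    return spike, new_v

def run_plif(inputs, w=0.0):
    """Feed a sequence of input currents to one neuron and collect its spikes."""
    v, spikes = 0.0, []
    for x in inputs:
        s, v = plif_step(v, x, w)
        spikes.append(s)
    return spikes
```

With `w = 0.0` the leak factor is 0.5, so a constant input of 1.5 charges the membrane to 0.75 and then to 1.125, which crosses the threshold: `run_plif([1.5] * 4)` yields `[0, 1, 0, 1]`. In a trained network `w` would be updated by gradient descent together with the synaptic weights, which is what distinguishes a PLIF neuron from a LIF neuron with a fixed time constant.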

Paper Structure

This paper contains 11 sections, 11 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: LIF neuron (Image taken from lee2020enabling).
  • Figure 2: Semantic segmentation results (left) of accumulated events within an interval of $50\,\mathrm{ms}$ (middle). Grayscale images represent the scenes captured by the event camera (right). Image from alonso2019ev.
  • Figure 3: Computational graph of the encoder and decoder parts unrolled over multiple timesteps. Adapted from kim2022beyond.
  • Figure 4: A: The proposed EvSegSNN, obtained by reducing the size of the U-Net and removing one depth level, corresponding to the layers within the pink box. B: The original U-Net model ronneberger2015u. The pink and yellow boxes correspond to the two depth levels removed to obtain two lighter U-Net models.
  • Figure 5: MIoU and accuracy of EvSegSNN and the baseline (Kim et al. kim2022beyond) with and without BNTT w.r.t. the number of parameters.
  • ...and 2 more figures