Table of Contents
Fetching ...

SDformerFlow: Spatiotemporal swin spikeformer for event-based optical flow estimation

Yi Tian, Juan Andrade-Cetto

TL;DR

This work is the first to make use of spikeformers for dense optical flow estimation and yield state-of-the-art performance among SNN-based event optical flow methods on both the DSEC and MVSEC datasets, and show significant reduction in power consumption compared to the equivalent ANNs.

Abstract

Event cameras generate asynchronous and sparse event streams capturing changes in light intensity. They offer significant advantages over conventional frame-based cameras, such as a higher dynamic range and an extremely faster data rate, making them particularly useful in scenarios involving fast motion or challenging lighting conditions. Spiking neural networks (SNNs) share similar asynchronous and sparse characteristics and are well-suited for processing data from event cameras. Inspired by the potential of transformers and spike-driven transformers (spikeformers) in other computer vision tasks, we propose two solutions for fast and robust optical flow estimation for event cameras: STTFlowNet and SDformerFlow. STTFlowNet adopts a U-shaped artificial neural network (ANN) architecture with spatiotemporal shifted window self-attention (swin) transformer encoders, while SDformerFlow presents its fully spiking counterpart, incorporating swin spikeformer encoders. Furthermore, we present two variants of the spiking version with different neuron models. Our work is the first to make use of spikeformers for dense optical flow estimation. We conduct end-to-end training for all models using supervised learning. Our results yield state-of-the-art performance among SNN-based event optical flow methods on both the DSEC and MVSEC datasets, and show significant reduction in power consumption compared to the equivalent ANNs.

SDformerFlow: Spatiotemporal swin spikeformer for event-based optical flow estimation

TL;DR

This work is the first to make use of spikeformers for dense optical flow estimation and yield state-of-the-art performance among SNN-based event optical flow methods on both the DSEC and MVSEC datasets, and show significant reduction in power consumption compared to the equivalent ANNs.

Abstract

Event cameras generate asynchronous and sparse event streams capturing changes in light intensity. They offer significant advantages over conventional frame-based cameras, such as a higher dynamic range and an extremely faster data rate, making them particularly useful in scenarios involving fast motion or challenging lighting conditions. Spiking neural networks (SNNs) share similar asynchronous and sparse characteristics and are well-suited for processing data from event cameras. Inspired by the potential of transformers and spike-driven transformers (spikeformers) in other computer vision tasks, we propose two solutions for fast and robust optical flow estimation for event cameras: STTFlowNet and SDformerFlow. STTFlowNet adopts a U-shaped artificial neural network (ANN) architecture with spatiotemporal shifted window self-attention (swin) transformer encoders, while SDformerFlow presents its fully spiking counterpart, incorporating swin spikeformer encoders. Furthermore, we present two variants of the spiking version with different neuron models. Our work is the first to make use of spikeformers for dense optical flow estimation. We conduct end-to-end training for all models using supervised learning. Our results yield state-of-the-art performance among SNN-based event optical flow methods on both the DSEC and MVSEC datasets, and show significant reduction in power consumption compared to the equivalent ANNs.
Paper Structure (30 sections, 13 equations, 9 figures, 5 tables)

This paper contains 30 sections, 13 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Event input representation for SDformerFlow.
  • Figure 2: SDformerFlowNet-v2 architecture.
  • Figure 3: Spatio-temporal swin transformer blocks in STTFlowNet.
  • Figure 4: Shortcuts in SNN. The left and middle blocks illustrate the vanilla and spike-element-wise (SEW) shortcuts commonly used in other architectures zhou2023spikformericlr. The right block shows the membrane potential (MS) shortcut used in our SDformerFlow.
  • Figure 5: Spike-driven self-attention (SDSA) block with spiking dot product attention utilized in SDformerFlow-v1.
  • ...and 4 more figures