ST-FlowNet: An Efficient Spiking Neural Network for Event-Based Optical Flow Estimation

Hongze Sun, Jun Wang, Wuque Cai, Duo Chen, Qianqian Liao, Jiayi He, Yan Cui, Dezhong Yao, Daqing Guo

TL;DR

ST-FlowNet introduces a ConvGRU-enhanced spiking architecture tailored for event-based optical flow, achieving state-of-the-art performance while benefiting from neuromorphic efficiency. It couples a semi-pyramidal encoder-decoder with ConvGRU-based spatio-temporal augmentation and alignment, and enables SNN deployment via ANN-to-SNN conversion and a parameter-free BISNN training method. The approach delivers superior accuracy on MVSEC, ECD, and HQF benchmarks and demonstrates substantial energy savings relative to frame-based ANN equivalents. Together, ST-FlowNet advances practical neuromorphic vision for robust, low-power optical flow estimation in dynamic real-world scenes.

Abstract

Spiking Neural Networks (SNNs) have emerged as a promising tool for event-based optical flow estimation tasks due to their ability to leverage spatio-temporal information and low-power capabilities. However, the performance of SNN models is often constrained, limiting their application in real-world scenarios. In this work, we address this gap by proposing a novel neural network architecture, ST-FlowNet, specifically tailored for optical flow estimation from event-based data. The ST-FlowNet architecture integrates ConvGRU modules to facilitate cross-modal feature augmentation and temporal alignment of the predicted optical flow, improving the network's ability to capture complex motion dynamics. Additionally, to overcome the challenges associated with training SNNs, we introduce a novel approach to derive SNN models from pre-trained artificial neural networks (ANNs) through ANN-to-SNN conversion or our proposed BISNN method. Notably, the BISNN method alleviates the complexities involved in biological parameter selection, further enhancing the robustness of SNNs in optical flow estimation tasks. Extensive evaluations on three benchmark event-based datasets demonstrate that the SNN-based ST-FlowNet model outperforms state-of-the-art methods, delivering superior performance in accurate optical flow estimation across a diverse range of dynamic visual scenes. Furthermore, the inherent energy efficiency of SNN models is highlighted, establishing a compelling advantage for their practical deployment. Overall, our work presents a novel framework for optical flow estimation using SNNs and event-based data, contributing to the advancement of neuromorphic vision applications.
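The abstract names ANN-to-SNN conversion without giving details in this section. As a rough intuition for why conversion can work at all, the sketch below shows the textbook observation that an integrate-and-fire neuron driven by a constant input fires at a rate that approximates a clipped ReLU of that input. It is not the authors' conversion procedure and it is not BISNN; the function names `if_neuron_rate` and `clipped_relu`, the soft-reset rule, and the values of `threshold` and `steps` are all assumptions chosen for illustration.

```python
def if_neuron_rate(x, threshold=1.0, steps=100):
    """Rate-coded output of an integrate-and-fire neuron with constant input x.

    Hypothetical helper for illustration only (not from the paper).
    """
    v = 0.0      # membrane potential, starts at rest
    spikes = 0
    for _ in range(steps):
        v += x                   # integrate the input current
        if v >= threshold:
            v -= threshold       # soft reset: subtract the threshold
            spikes += 1
    # Each spike carries `threshold` units; average over the time window.
    return spikes * threshold / steps


def clipped_relu(x, threshold=1.0):
    """ReLU clipped at `threshold`, the ANN activation the rate approximates."""
    return min(max(x, 0.0), threshold)


if __name__ == "__main__":
    for x in (-0.5, 0.25, 0.5, 1.5):
        print(x, if_neuron_rate(x), clipped_relu(x))
```

Under these assumptions the gap between the two values shrinks as `steps` grows, which is the usual trade-off between latency and accuracy in rate-coded conversion.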

Paper Structure

This paper contains 27 sections, 17 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The framework of our proposed optical flow estimation method is illustrated. The ST-FlowNet (both ANN and SNN) models utilize event-based images as input data. Following training on the ANN model, an SNN ST-FlowNet model is derived through the A2S conversion or BISNN method. Optical flow prediction is achievable using both ANN and SNN models. Additionally, for reference, the corresponding frame-based images are presented in the left black box. Conventional frame-based images exhibit abundant spatial texture information indiscriminately, while event-based images emphasize motion-related objects by leveraging spatio-temporal cues simultaneously.
  • Figure 2: (a) The ST-FlowNet architecture. After pre-processing by a ConvGRU layer, the enhanced event-based input is downsampled by four encoder layers. The smallest feature maps, produced by encoder4, pass through two residual block layers for robust feature extraction. Feature maps from the different levels are concatenated, and decoder layers and a generator produce the basic optical flow prediction. This basic prediction is then fed through a ConvGRU layer, which fuses historical temporal features to generate the final predicted optical flow. (b) Schematic of the ConvGRU architecture. A ConvGRU unit integrates the current input and the state information to produce a corresponding output. The symbol $\odot$ denotes the Hadamard product, and $\sigma$ is the activation function.
  • Figure 3: Visual comparison of ST-FlowNet with other models. Original frame-based images, ground truth of the MVSEC dataset, and the color coding of the optical flow are provided for reference. The AEE$_{1}$ (black) or FWL (red) results of each predicted optical flow are provided at the upper-left.
  • Figure 4: The performance comparison of SNN models initialized with different combinations of membrane potential decay factors. (a) SNN models trained using the A2S method. (b) SNN models trained using the BISNN method. The optimal results are highlighted within red boxes.
  • Figure 5: The energy consumption of SNN ST-FlowNet models relative to ANN models.
  • ...and 1 more figure
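The caption of Figure 2(b) describes a ConvGRU unit that combines the current input with state information using the Hadamard product $\odot$ and an activation $\sigma$. The paper's own equations are not reproduced in this section, so the block below is only the commonly used ConvGRU formulation, given as a reference point. The symbols $x_t$ (input), $h_{t-1}$ (previous state), $z_t$ and $r_t$ (update and reset gates), the kernels $W$ and $U$, the use of a sigmoid for $\sigma$, and the omission of bias terms are all assumptions; the authors' variant may differ, for example in the gate convention.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z * x_t + U_z * h_{t-1}\right) \\
r_t &= \sigma\left(W_r * x_t + U_r * h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W_h * x_t + U_h * \left(r_t \odot h_{t-1}\right)\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Here $*$ denotes 2D convolution, so the gates and the state keep the spatial layout of the feature maps. That is what allows such a unit to sit both at the event input and at the predicted flow output, as the caption of Figure 2(a) describes.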