Perturbed State Space Feature Encoders for Optical Flow with Event Cameras

Gokul Raju Govinda Raju, Nikola Zubić, Marco Cannici, Davide Scaramuzza

TL;DR

The paper tackles optical flow estimation from asynchronous event camera data, addressing the limited temporal and spatial reasoning of prior methods. It introduces Perturbed State Space Feature Encoders (P-SSE), which combine a global receptive field with the linear-time complexity of State Space Models (SSMs), stabilized by a perturbation-and-diagonalization scheme (PTD) applied to the SSM state dynamics. A VideoFlow-inspired multi-frame pipeline built around E-TROF and E-MOP propagates motion information across up to five consecutive event representations, providing richer temporal context. Empirical results on DSEC-Flow and MVSEC show notable End-Point Error (EPE) improvements (e.g., an EPE of about 0.680 on DSEC-Flow and an 11.86% EPE improvement on MVSEC) together with favorable efficiency, supporting practical deployment in dynamic, real-world scenarios. Overall, the work advances event-based optical flow by combining robust spatial encoding with long-range temporal reasoning, achieving state-of-the-art accuracy and efficiency.
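The EPE numbers quoted here follow the standard definition of End-Point Error: the Euclidean distance between predicted and ground-truth flow vectors, averaged over the evaluated pixels (lower is better). The minimal Python sketch below computes it, together with a percentage reduction relative to a baseline, assuming the reported gains are relative EPE reductions. The function names and the optional validity mask are illustrative assumptions, not the paper's evaluation code.

```python
import math
from typing import Optional, Sequence, Tuple

# A flow field flattened to per-pixel (u, v) displacement vectors, in pixels.
Flow = Sequence[Tuple[float, float]]


def end_point_error(pred: Flow, gt: Flow, valid: Optional[Sequence[bool]] = None) -> float:
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    `valid` is an optional per-pixel mask (hypothetical interface); pixels marked
    False are excluded, since benchmarks typically score only pixels with ground truth.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of pixels")
    if valid is None:
        valid = [True] * len(gt)
    errors = [
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv), ok in zip(pred, gt, valid)
        if ok
    ]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    return sum(errors) / len(errors)


def relative_improvement(baseline_epe: float, new_epe: float) -> float:
    """Percentage reduction in EPE relative to a baseline (positive means better)."""
    return 100.0 * (baseline_epe - new_epe) / baseline_epe


pred = [(1.0, 0.0), (0.0, 0.0)]
gt = [(0.0, 0.0), (3.0, 4.0)]
print(end_point_error(pred, gt))       # 3.0
print(relative_improvement(2.0, 1.5))  # 25.0
```

Here the per-pixel errors are 1 and 5, so the mean EPE is 3.0; the second call uses made-up EPE values purely to show the percentage arithmetic.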

Abstract

With their motion-responsive nature, event-based cameras offer significant advantages over traditional cameras for optical flow estimation. While deep learning has improved upon traditional methods, current neural networks for event-based optical flow still have limited temporal and spatial reasoning. To address these challenges, we propose Perturbed State Space Feature Encoders (P-SSE) for multi-frame optical flow with event cameras. P-SSE adaptively processes spatiotemporal features with a large receptive field, akin to Transformer-based methods, while retaining the linear computational complexity characteristic of State Space Models (SSMs). The key innovation behind our model's state-of-the-art performance is a perturbation technique applied to the state dynamics matrix governing the SSM, which significantly improves both the stability and the performance of our model. We integrate P-SSE into a framework that leverages bidirectional flows and recurrent connections, expanding the temporal context of flow prediction. Evaluations on the DSEC-Flow and MVSEC datasets show that P-SSE outperforms prior methods, improving End-Point Error (EPE) by 8.48% and 11.86%, respectively.
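To make the perturbation idea concrete, here is a minimal NumPy sketch of a perturb-then-diagonalize step followed by a linear-time diagonal SSM scan. It assumes a HiPPO-LegS state matrix as the non-normal starting point, a Gaussian perturbation rescaled to spectral norm `eps`, zero-order-hold discretization, and a single-input, single-output system; none of these choices, nor the helper names (`hippo_legs`, `perturb_then_diagonalize`, `stabilize`, `diagonal_ssm_scan`), come from the paper. The intuition is that diagonalizing a highly non-normal matrix directly is numerically fragile, whereas diagonalizing a slightly perturbed copy is intended to give a better-behaved eigendecomposition whose diagonal recurrence costs one multiply-add per state per time step.

```python
import numpy as np


def hippo_legs(n: int) -> np.ndarray:
    """HiPPO-LegS state matrix, used here as an assumed (non-normal) SSM initialization."""
    q = np.sqrt(2.0 * np.arange(n) + 1.0)
    a = np.tril(np.outer(q, q)) - np.diag(np.arange(n, dtype=float))
    return -a  # diagonal -(i+1), strictly lower part -sqrt((2i+1)(2j+1)), zeros above


def perturb_then_diagonalize(a: np.ndarray, eps: float = 1e-3, seed: int = 0):
    """Add a random perturbation E with ||E||_2 = eps, then eigendecompose A + E."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(a.shape)
    e *= eps / np.linalg.norm(e, 2)  # rescale so the spectral norm equals eps
    lam, v = np.linalg.eig(a + e)    # A + E = V diag(lam) V^{-1}
    return lam, v, e


def stabilize(lam: np.ndarray, max_real: float = -1e-4) -> np.ndarray:
    """Clamp real parts to be strictly negative so the discretized recurrence decays."""
    return np.minimum(lam.real, max_real) + 1j * lam.imag


def diagonal_ssm_scan(lam, v, b, c, u, dt: float = 0.1) -> np.ndarray:
    """Run the diagonalized SSM over a 1-D input sequence u in O(len(u) * n) time."""
    b_tilde = np.linalg.solve(v, b)  # input matrix in eigen-coordinates: V^{-1} B
    c_tilde = c @ v                  # output matrix in eigen-coordinates: C V
    lam_bar = np.exp(lam * dt)       # zero-order-hold discretization of diag(lam)
    b_bar = (lam_bar - 1.0) / lam * b_tilde
    x = np.zeros(lam.shape, dtype=complex)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = lam_bar * x + b_bar * u_k  # elementwise update: no dense matrix product
        y[k] = (c_tilde @ x).real
    return y


n = 8
A = hippo_legs(n)
lam, V, E = perturb_then_diagonalize(A)
lam = stabilize(lam)
y = diagonal_ssm_scan(lam, V, np.ones(n), np.ones(n), np.ones(50))
print(y.shape)  # (50,)
```

Because the state update is elementwise, the cost grows linearly with sequence length, which is the property the abstract contrasts with Transformer-style attention. The `stabilize` clamp is an extra safeguard added for this sketch, not something the source describes.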

Paper Structure

This paper contains 19 sections, 8 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Illustrative diagram of our E-TROF model. Using three successive event representations, E-TROF combines bidirectional correlation, context, and flow features to recurrently estimate bidirectional optical flows.
  • Figure 2: Diagram of the E-MOP model, which integrates 3 E-TROFs over 5 consecutive event representations to predict and refine bidirectional optical flows, sharing dynamic temporal motion information among adjacent E-TROFs.
  • Figure 3: Illustration of P-SSE's effectiveness in handling out-of-boundary regions, compared with E-RAFT (Gehrig et al., 3DV 2021).
  • Figure 4: Demonstration of P-SSE's ability to handle partially occluded scenes, compared with E-RAFT (Gehrig et al., 3DV 2021). Frames from a DSEC-Flow test sequence progress from left to right.
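The Figure 1 and Figure 2 captions imply a simple indexing scheme: each E-TROF sees three successive event representations, and E-MOP covers five representations with three E-TROFs. The Python sketch below spells out one consistent reading, assuming stride-1 overlapping windows and VideoFlow-style targets in which the center representation's flow is estimated toward its previous and next neighbors. The helper names are hypothetical, and the sketch only enumerates indices; it does not model the correlation, context, or motion-sharing computations.

```python
from typing import Dict, List, Tuple


def emop_windows(num_reps: int = 5, window: int = 3) -> List[Tuple[int, ...]]:
    """Stride-1 overlapping windows of representation indices, one per E-TROF (assumed)."""
    if num_reps < window:
        raise ValueError("need at least `window` event representations")
    return [tuple(range(start, start + window)) for start in range(num_reps - window + 1)]


def bidirectional_targets(win: Tuple[int, int, int]) -> Dict[str, Tuple[int, int]]:
    """Flows estimated for a window (t-1, t, t+1), as (source, target) index pairs."""
    prev, center, nxt = win
    return {"backward": (center, prev), "forward": (center, nxt)}


def adjacent_pairs(num_windows: int) -> List[Tuple[int, int]]:
    """Indices of neighboring E-TROFs that would exchange motion information."""
    return [(i, i + 1) for i in range(num_windows - 1)]


windows = emop_windows()
for i, win in enumerate(windows):
    print(i, win, bidirectional_targets(win))
print(adjacent_pairs(len(windows)))
```

With the defaults, `emop_windows()` returns `[(0, 1, 2), (1, 2, 3), (2, 3, 4)]`: three windows for five inputs, matching the counts in the Figure 2 caption.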