Perturbed State Space Feature Encoders for Optical Flow with Event Cameras
Gokul Raju Govinda Raju, Nikola Zubić, Marco Cannici, Davide Scaramuzza
TL;DR
The paper tackles optical flow estimation from asynchronous event camera data, addressing the limited temporal and spatial reasoning of prior methods. It introduces Perturbed State Space Feature Encoders (P-SSE), which combine a global receptive field with the linear-time complexity of State Space Models (SSMs) and stabilize the SSM dynamics with a perturbation-and-diagonalization (PTD) scheme. A VideoFlow-inspired multi-frame pipeline built around the E-TROF and E-MOP modules propagates motion information across up to five consecutive time instants, providing richer temporal context. On DSEC-Flow and MVSEC, the method clearly reduces End-Point Error (EPE), reaching about 0.680 EPE on DSEC-Flow and an 11.86% EPE improvement on MVSEC, while remaining efficient enough for practical deployment in dynamic, real-world scenarios. Overall, the work advances event-based optical flow by combining robust spatial encoding with long-range temporal reasoning, achieving state-of-the-art accuracy and efficiency.
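The perturb-then-diagonalize idea can be sketched on a toy single-input SSM. This is a minimal sketch of the generic technique from the SSM literature, not the authors' implementation. The HiPPO-LegS initialization, the state size of 8, the perturbation size `eps`, the bilinear discretization, and all function names (`hippo_legs`, `perturb_then_diagonalize`, `diagonal_ssm_scan`, `dense_ssm_scan`) are assumptions chosen for illustration. A small random matrix `E` is added to the non-normal state matrix `A` before eigendecomposition. The resulting diagonal system is then run with one element-wise update per time step, which keeps the cost linear in sequence length.

```python
import numpy as np


def hippo_legs(n):
    """HiPPO-LegS state matrix A (n x n) and input vector B (n,): a common,
    highly non-normal SSM initialization (assumed here for illustration)."""
    q = np.sqrt(2.0 * np.arange(n) + 1.0)
    A = -np.tril(np.outer(q, q), k=-1) - np.diag(np.arange(1.0, n + 1.0))
    return A, q.copy()


def perturb_then_diagonalize(A, eps=1e-4, seed=0):
    """Add a random perturbation E with spectral norm eps, then eigendecompose
    A + E = V diag(lam) V^{-1}. Returns (lam, V, E)."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(A.shape)
    E *= eps / np.linalg.norm(E, 2)          # ord=2: largest singular value
    lam, V = np.linalg.eig(A + E)
    return lam, V, E


def diagonal_ssm_scan(lam, V, B, C, u, dt=0.01):
    """Bilinear-discretized diagonal SSM: one element-wise update per step,
    so the cost grows linearly with the sequence length len(u)."""
    B_t = np.linalg.solve(V, B.astype(complex))  # V^{-1} B
    C_t = C @ V                                   # C V
    denom = 1.0 - 0.5 * dt * lam
    lam_bar = (1.0 + 0.5 * dt * lam) / denom      # discrete diagonal dynamics
    B_bar = dt * B_t / denom
    x = np.zeros(lam.shape, dtype=complex)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = lam_bar * x + B_bar * u_k             # diagonal state update
        y[k] = (C_t @ x).real
    return y


def dense_ssm_scan(A, B, C, u, dt=0.01):
    """Reference: the same bilinear SSM with a dense state matrix."""
    I = np.eye(A.shape[0])
    M = np.linalg.inv(I - 0.5 * dt * A)
    A_bar, B_bar = M @ (I + 0.5 * dt * A), dt * (M @ B)
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A_bar @ x + B_bar * u_k
        y[k] = C @ x
    return y


# Toy run: diagonal scan vs. dense scan of the perturbed system A + E.
A, B = hippo_legs(8)
C = np.ones(8) / 8
u = np.sin(np.linspace(0.0, 6.0 * np.pi, 200))
lam, V, E = perturb_then_diagonalize(A)
y_diag = diagonal_ssm_scan(lam, V, B, C, u)
y_dense = dense_ssm_scan(A + E, B, C, u)
print(np.max(np.abs(y_diag - y_dense)))
```

Because `A + E = V diag(lam) V^{-1}` holds exactly, the diagonal scan reproduces the dense scan of the perturbed system up to rounding. The price is a backward error of size `eps` relative to the original `A`, which PTD-style methods accept in exchange for a more stable diagonalization.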
Abstract
With their motion-responsive nature, event-based cameras offer significant advantages over traditional cameras for optical flow estimation. While deep learning has improved upon traditional methods, current neural networks for event-based optical flow still have limited temporal and spatial reasoning. To address these challenges, we propose Perturbed State Space Feature Encoders (P-SSE) for multi-frame optical flow with event cameras. P-SSE adaptively processes spatiotemporal features with a large receptive field akin to Transformer-based methods, while retaining the linear computational complexity characteristic of State Space Models (SSMs). The key innovation behind our model's state-of-the-art performance, however, is a perturbation technique applied to the state dynamics matrix governing the SSM, which significantly improves both the stability and the performance of the model. We integrate P-SSE into a framework that leverages bi-directional flows and recurrent connections, expanding the temporal context of flow prediction. Evaluations on the DSEC-Flow and MVSEC datasets demonstrate P-SSE's superiority, with End-Point Error (EPE) improvements of 8.48% and 11.86%, respectively.
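The EPE figures above refer to the standard End-Point Error metric: the per-pixel Euclidean distance between predicted and ground-truth flow vectors, averaged over pixels that have valid ground truth. The NumPy sketch below shows that computation and one common way to turn two EPE values into a relative percentage improvement. The function names, the `(H, W, 2)` array layout, and the boolean validity mask are assumptions for illustration; the exact evaluation protocol used on DSEC-Flow and MVSEC is not described in this section, and the toy values are synthetic, not results from the paper.

```python
import numpy as np


def end_point_error(flow_pred, flow_gt, valid=None):
    """Mean End-Point Error: average Euclidean distance between predicted and
    ground-truth flow vectors. Flows have shape (H, W, 2); `valid` is an
    optional (H, W) boolean mask of pixels that have ground truth."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)  # per-pixel error, (H, W)
    if valid is not None:
        err = err[valid]                                 # keep valid pixels only
    return float(err.mean())


def relative_improvement(epe_baseline, epe_new):
    """Percentage reduction in EPE relative to a baseline (positive = better)."""
    return 100.0 * (epe_baseline - epe_new) / epe_baseline


# Toy example with synthetic flows (illustrative only, not data from the paper).
gt = np.zeros((4, 4, 2))
pred = np.zeros((4, 4, 2))
pred[..., 0], pred[..., 1] = 3.0, 4.0        # each vector off by (3, 4): error 5
pred[0, :, 0], pred[0, :, 1] = 30.0, 40.0    # first row off by (30, 40): error 50
valid = np.ones((4, 4), dtype=bool)
valid[0, :] = False                          # pretend the first row has no ground truth

print(end_point_error(pred, gt, valid))      # 5.0   (masked row excluded)
print(end_point_error(pred, gt))             # 16.25 (all 16 pixels)
print(relative_improvement(2.0, 1.5))        # 25.0
```

Masking matters: in the toy case, the invalid first row would otherwise inflate the mean EPE from 5.0 to 16.25.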
