Learning Normal Flow Directly From Event Neighborhoods
Dehao Yuan, Levi Burner, Jiayi Wu, Minghui Liu, Jingxi Chen, Yiannis Aloimonos, Cornelia Fermüller
TL;DR
The paper tackles the challenge of robust, transferable normal flow estimation from event cameras by introducing a supervised, point-based approach that operates on local event neighborhoods. It uses a VecKM-based local event encoder to produce per-event normal flow predictions. The encoder is trained with a two-term loss that combines magnitude enforcement and directional alignment, and is strengthened by data augmentation and ensemble-based uncertainty quantification. An egomotion solver, built on the normal flow and IMU measurements, recasts the depth-positivity constraint as a maximum-margin problem for robust translation estimation. Across MVSEC, EVIMO2, and DSEC, the method demonstrates strong cross-domain performance, sharp predictions, and resilience to independently moving objects, with practical transferability achieved by training in normalized camera coordinates. The work provides a cohesive pipeline from local event geometry to reliable motion and egomotion estimation, and points to future directions such as self-supervised training and hybrid global-local modeling.
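As a rough illustration of ensemble-based uncertainty quantification, the sketch below averages per-event predictions from several ensemble members and uses their spread as an uncertainty score. The function name `ensemble_normal_flow`, the spread definition, and the threshold `max_std` are assumptions made for this example; the summary does not state how the authors compute or threshold uncertainty.

```python
import numpy as np

def ensemble_normal_flow(predictions, max_std=0.5):
    """Combine per-event normal flow predictions from an ensemble.

    predictions: array of shape (M, N, 2), i.e. M ensemble members,
                 N events, and a 2D normal flow vector per event.
    max_std:     hypothetical threshold on the spread; events above it
                 are flagged as unreliable.
    """
    predictions = np.asarray(predictions, dtype=float)
    # Ensemble mean is used as the final per-event prediction.
    mean_flow = predictions.mean(axis=0)                      # (N, 2)
    # Spread: square root of the summed per-component variances.
    spread = np.sqrt(predictions.var(axis=0).sum(axis=1))     # (N,)
    # Boolean mask of events considered confident enough to keep.
    keep = spread <= max_std
    return mean_flow, spread, keep

# Two ensemble members, two events: they agree on the first event
# and disagree on the second.
preds = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 3.0]],
])
mean_flow, spread, keep = ensemble_normal_flow(preds)
```

A downstream task such as egomotion estimation could then use only the events where `keep` is true, which is one plausible way for uncertainty estimates to help later stages.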
Abstract
Event-based motion field estimation is an important task. However, current optical flow methods face challenges: learning-based approaches, often frame-based and relying on CNNs, lack cross-domain transferability, while model-based methods, though more robust, are less accurate. To address the limitations of optical flow estimation, recent works have focused on normal flow, which can be more reliably measured in regions with limited texture or strong edges. However, existing normal flow estimators are predominantly model-based and suffer from high errors. In this paper, we propose a novel supervised point-based method for normal flow estimation that overcomes the limitations of existing learning-based event methods. Using a local point cloud encoder, our method directly estimates per-event normal flow from raw events, offering multiple unique advantages: 1) It produces temporally and spatially sharp predictions. 2) It supports more diverse data augmentation, such as random rotation, to improve robustness across various domains. 3) It naturally supports uncertainty quantification via ensemble inference, which benefits downstream tasks. 4) It enables training and inference on undistorted data in normalized camera coordinates, improving transferability across cameras. Extensive experiments demonstrate that our method achieves better and more consistent performance than state-of-the-art methods when transferred across different datasets. Leveraging this transferability, we train our model on the union of datasets and release it for public use. Finally, we introduce an egomotion solver based on a maximum-margin problem that uses normal flow and IMU measurements to achieve strong performance in challenging scenarios.
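To make three ideas from the abstract concrete, the sketch below shows (a) normal flow as the projection of optical flow onto the local gradient direction, (b) mapping undistorted pixel coordinates to normalized camera coordinates with an intrinsic matrix, and (c) a random-rotation style augmentation applied jointly to event coordinates and flow. All function names and the example intrinsics are hypothetical, and the in-plane rotation about the optical axis is an assumed form of the augmentation; the authors' implementation may differ.

```python
import numpy as np

def normal_flow(optical_flow, gradient):
    """Project optical flow (N, 2) onto the unit gradient direction (N, 2).

    Assumes every gradient is nonzero.
    """
    g = gradient / np.linalg.norm(gradient, axis=1, keepdims=True)
    return (optical_flow * g).sum(axis=1, keepdims=True) * g

def to_normalized_coords(pixels, K):
    """Map undistorted pixel coordinates (N, 2) to normalized camera
    coordinates using a 3x3 intrinsic matrix K."""
    pixels = np.asarray(pixels, dtype=float)
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])
    rays = homog @ np.linalg.inv(K).T
    return rays[:, :2] / rays[:, 2:3]

def rotate_events_and_flow(xy, flow, angle):
    """Rotate event coordinates and their flow vectors by the same angle
    (radians) about the origin. In normalized coordinates the origin is
    the principal point, so this is a rotation about the optical axis."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return xy @ R.T, flow @ R.T

# Hypothetical intrinsics: focal length 100 px, principal point (50, 40).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])

xy_norm = to_normalized_coords(np.array([[150.0, 40.0]]), K)
nf = normal_flow(np.array([[1.0, 1.0]]), np.array([[2.0, 0.0]]))
xy_rot, nf_rot = rotate_events_and_flow(xy_norm, nf, np.pi / 2)
```

Because the coordinates are expressed relative to the camera intrinsics, the same network input has the same geometric meaning for any camera, which is the intuition behind the transferability claim. Rotating the events and the flow labels together keeps each training sample self-consistent.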
