
Learning Normal Flow Directly From Event Neighborhoods

Dehao Yuan, Levi Burner, Jiayi Wu, Minghui Liu, Jingxi Chen, Yiannis Aloimonos, Cornelia Fermüller

TL;DR

The paper tackles robust, transferable normal flow estimation from event cameras by introducing a supervised, point-based approach that operates on local event neighborhoods. A VecKM-based local event encoder produces per-event normal flow predictions; training uses a two-term loss that combines magnitude enforcement and directional alignment, and the approach is complemented by data augmentation and ensemble-based uncertainty quantification. An egomotion solver built on the normal flow and IMU measurements recasts the depth-positivity constraint as a maximum-margin problem for robust translation estimation. Across MVSEC, EVIMO2, and DSEC, the method demonstrates strong cross-domain performance, sharp predictions, and resilience to independently moving objects, with transferability across cameras supported by training in normalized camera coordinates. The work provides a cohesive pipeline from local event geometry to normal flow and egomotion estimation, and points to future directions such as self-supervised training and hybrid global-local modeling.
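To make the depth-positivity idea concrete, here is a minimal sketch under our own assumptions, not the paper's solver: calibrated normalized coordinates, angular velocity $\boldsymbol{\omega}$ known from the IMU, the classical motion-field model $\mathbf{u} = \frac{1}{Z}A(\mathbf{x})\mathbf{t} + B(\mathbf{x})\boldsymbol{\omega}$, and normal flow $\mathbf{n}$ equal to the projection of $\mathbf{u}$ onto the image-gradient direction. With $\hat{\mathbf{n}} = \mathbf{n}/\|\mathbf{n}\|$, each event then satisfies $\|\mathbf{n}\| - \hat{\mathbf{n}}^\top B\boldsymbol{\omega} = \frac{1}{Z}\hat{\mathbf{n}}^\top A\mathbf{t}$, so $Z > 0$ forces both sides to share a sign. Every event therefore contributes a half-space constraint $\mathbf{a}_i^\top\mathbf{t} > 0$, and the sketch picks the unit $\mathbf{t}$ that maximizes the smallest margin. The brute-force search over a Fibonacci sphere and the helper names (`motion_matrices`, `fibonacci_sphere`, `solve_translation`) are hypothetical simplifications; a practical solver could instead use a hard-margin SVM through the origin, or soft margins to tolerate outliers.

```python
import numpy as np


def motion_matrices(x, y):
    """Classical motion-field matrices at normalized point (x, y), focal length 1.
    Assumed convention: u = (1/Z) * A @ t + B @ omega."""
    A = np.array([[-1.0, 0.0, x],
                  [0.0, -1.0, y]])
    B = np.array([[x * y, -(1.0 + x * x), y],
                  [1.0 + y * y, -x * y, -x]])
    return A, B


def fibonacci_sphere(m):
    """m roughly uniformly spread unit vectors (golden-angle spiral)."""
    i = np.arange(m) + 0.5
    z = 1.0 - 2.0 * i / m
    r = np.sqrt(1.0 - z * z)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def solve_translation(points, normal_flows, omega, m=20000):
    """Hypothetical max-margin translation solver (brute-force search).

    points:       (N, 2) normalized image coordinates of events.
    normal_flows: (N, 2) normal flow at those events.
    omega:        (3,) angular velocity, e.g. from an IMU.
    Returns the unit translation direction with the largest worst-case margin.
    """
    rows = []
    for (x, y), n in zip(points, normal_flows):
        mag = np.linalg.norm(n)
        if mag < 1e-12:
            continue  # no usable gradient direction
        n_hat = n / mag
        A, B = motion_matrices(x, y)
        residual = mag - n_hat @ B @ omega  # equals (1/Z) * n_hat @ A @ t
        if abs(residual) < 1e-12:
            continue  # constraint carries no sign information
        a = np.sign(residual) * (A.T @ n_hat)  # depth positivity => a @ t > 0
        rows.append(a / np.linalg.norm(a))
    constraints = np.array(rows)          # (N', 3) unit half-space normals
    candidates = fibonacci_sphere(m)      # (m, 3) unit candidate directions
    worst = (candidates @ constraints.T).min(axis=1)  # smallest margin per candidate
    return candidates[np.argmax(worst)]
```

Without the positivity constraint, $(\mathbf{t}, Z)$ and $(-\mathbf{t}, -Z)$ would explain the same normal flow; the half-space constraints are not symmetric under $\mathbf{t} \to -\mathbf{t}$, so this formulation also fixes the sign of the translation direction.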

Abstract

Event-based motion field estimation is an important task. However, current optical flow methods face challenges: learning-based approaches, often frame-based and relying on CNNs, lack cross-domain transferability, while model-based methods, though more robust, are less accurate. To address the limitations of optical flow estimation, recent works have focused on normal flow, which can be more reliably measured in regions with limited texture or strong edges. However, existing normal flow estimators are predominantly model-based and suffer from high errors. In this paper, we propose a novel supervised point-based method for normal flow estimation that overcomes the limitations of existing learning-based event approaches. Using a local point cloud encoder, our method directly estimates per-event normal flow from raw events, offering several advantages: 1) It produces temporally and spatially sharp predictions. 2) It supports more diverse data augmentation, such as random rotation, to improve robustness across various domains. 3) It naturally supports uncertainty quantification via ensemble inference, which benefits downstream tasks. 4) It enables training and inference on undistorted data in normalized camera coordinates, improving transferability across cameras. Extensive experiments demonstrate that our method achieves better and more consistent performance than state-of-the-art methods when transferred across different datasets. Leveraging this transferability, we train our model on the union of these datasets and release it for public use. Finally, we introduce an egomotion solver based on a maximum-margin problem that uses normal flow and IMU measurements to achieve strong performance in challenging scenarios.
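Points 2) and 3) of the abstract can be combined in a small sketch. We assume, for illustration only, that the ensemble is a test-time rotation ensemble: the event neighborhood is rotated by $K$ random angles, each copy is passed through the same predictor, each prediction is rotated back, and the spread of the de-rotated predictions serves as the uncertainty. The predictor `plane_fit_predictor` (a classical local plane fit of the event time surface) is a hypothetical stand-in for the learned network, and the other function names are likewise illustrative.

```python
import numpy as np


def rot2d(theta):
    """2x2 counterclockwise rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def plane_fit_predictor(nbhd):
    """Hypothetical stand-in for the network: fit t = a*x + b*y + c to the
    neighborhood; for a moving edge, normal flow = grad t / |grad t|^2."""
    design = np.c_[nbhd[:, 0], nbhd[:, 1], np.ones(len(nbhd))]
    (a, b, _), *_ = np.linalg.lstsq(design, nbhd[:, 2], rcond=None)
    grad = np.array([a, b])
    return grad / (grad @ grad)


def ensemble_normal_flow(nbhd, predictor, k=8, rng=None):
    """Assumed test-time rotation ensemble.

    nbhd:      (N, 3) events (x, y, t) relative to the query event,
               in normalized camera coordinates.
    predictor: callable mapping an (N, 3) neighborhood to a 2D normal flow.
    Returns (mean de-rotated prediction, mean distance to that mean).
    """
    rng = np.random.default_rng() if rng is None else rng
    preds = []
    for theta in rng.uniform(0.0, 2.0 * np.pi, size=k):
        R = rot2d(theta)
        rotated = nbhd.copy()
        rotated[:, :2] = nbhd[:, :2] @ R.T       # rotate x, y; keep timestamps
        preds.append(R.T @ predictor(rotated))   # undo the rotation on the output
    preds = np.stack(preds)
    mean = preds.mean(axis=0)
    spread = np.linalg.norm(preds - mean, axis=1).mean()
    return mean, spread
```

An exactly rotation-equivariant predictor such as the plane fit gives near-zero spread, while a predictor whose output depends on the absolute orientation of its input gives a larger spread. Events whose spread exceeds a chosen threshold can then be discarded, in the spirit of Figure 3.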

Paper Structure

This paper contains 28 sections, 16 equations, 10 figures, 8 tables, and 1 algorithm.

Figures (10)

  • Figure 1: We propose a point-based network for estimating normal flow from raw event data, and we identify several key advantages of this point-based approach over existing learning-based approaches. An event and its neighborhood are first encoded as a fixed-dimensional vector, which is then input to a network trained in a supervised way to predict normal flow. This approach achieves high accuracy while maintaining strong transferability across domains and datasets. We also demonstrate the usefulness of the estimated normal flow in a new egomotion solver that is shown to remain robust even during aggressive camera motion.
  • Figure 2: Our point-based method produces accurate and sharp predictions in the presence of independently moving objects, while other methods [paredes2023taming, shiba2022secrets] fail. All learning-based models are trained on DSEC and evaluated on EVIMO2. Flows are displayed in HSV color space, where hue represents flow direction and brightness represents flow magnitude.
  • Figure 3: Uncertainty quantification (UQ) is important for per-event normal flow estimation, as it helps filter out less reliable predictions. For example, the normal flow predictions in Cases 1 and 2 are more reliable than those in Cases 3 and 4.
  • Figure 4: Reconstruction of a density distribution (shown in gray) from VecKM's local event encoding. The reconstructed 3D distribution closely aligns with the original (blue and red) events, demonstrating that VecKM's encoding effectively represents the event data. The examples shown are identical to those in Figure 3.
  • Figure 5: Loss maps and gradient fields of the motion field loss function. Our motion field loss function consists of radial and angular components. Given the ground-truth (GT) optical flow $\mathbf{u}$, the radial component guides the predicted flow to lie on the circle whose diameter is the segment from the origin to $\mathbf{u}$. The angular component guides the predicted flow to align with $\mathbf{u}$, which prevents the trivial prediction of zero flow.
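The two loss components described for Figure 5 can be sketched as follows. The caption fixes only their geometry: the radial term should vanish on the circle whose diameter runs from the origin to $\mathbf{u}$, which is exactly the set of possible normal flows, since $\mathbf{n} = (\mathbf{u}\cdot\hat{\mathbf{g}})\hat{\mathbf{g}}$ for a unit gradient direction $\hat{\mathbf{g}}$ implies $\mathbf{n}\cdot(\mathbf{u}-\mathbf{n}) = 0$; the angular term should reward alignment with $\mathbf{u}$. The specific functional forms below (a squared distance to that circle and a $1-\cos$ penalty), the name `motion_field_loss`, and the weight `lam` are our assumptions, not the paper's formula.

```python
import numpy as np


def motion_field_loss(pred, gt_flow, lam=0.1, eps=1e-8):
    """Illustrative radial + angular loss (assumed forms, not the paper's).

    pred:    (N, 2) predicted normal flows.
    gt_flow: (N, 2) ground-truth optical flows u.
    lam:     hypothetical weight on the angular term (kept small so that it
             mainly rules out the zero solution).
    """
    center = 0.5 * gt_flow
    radius = 0.5 * np.linalg.norm(gt_flow, axis=1)
    # Radial term: squared distance from pred to the circle with diameter [0, u].
    # Note that the origin lies on this circle, so zero flow has zero radial loss.
    radial = (np.linalg.norm(pred - center, axis=1) - radius) ** 2
    # Angular term: 1 - cos(angle between pred and u); equals 1 for a zero prediction.
    cos = np.sum(pred * gt_flow, axis=1) / (
        np.linalg.norm(pred, axis=1) * np.linalg.norm(gt_flow, axis=1) + eps)
    angular = 1.0 - cos
    return np.mean(radial + lam * angular)
```

For $\mathbf{u} = (2, 0)$, a valid normal flow such as $(1, 1)$ incurs only the small angular penalty, a zero prediction pays the full angular penalty `lam`, and a prediction off the circle such as $(3, 0)$ is dominated by the radial term.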