Towards Anytime Optical Flow Estimation with Event Cameras

Yaozu Ye, Hao Shi, Kailun Yang, Ze Wang, Xiaoting Yin, Lei Sun, Yaonan Wang, Kaiwei Wang

TL;DR

This work tackles the challenge of producing time-dense optical flow from event cameras, where ground-truth flow is typically available only at low frame rates. It introduces EVA-Flow, an event-based framework that achieves ultra-low latency and high-frequency predictions by using a Unified Voxel Grid (UVG) for rapid, low-latency event encoding and a Spatiotemporal Motion Recurrent (SMR) module to refine flow across time and scales. A key contribution is the Rectified Flow Warp Loss (RFWL), an unsupervised metric that robustly evaluates intermediate, time-dense predictions. Across MVSEC, DSEC, and EVA-FlowSet, EVA-Flow demonstrates strong generalization and time-dense performance (5 ms latency and 200 Hz output) with competitive accuracy, enabling real-time motion perception on resource-constrained platforms.

Abstract

Event cameras respond to changes in log-brightness at the millisecond level, making them ideal for optical flow estimation. However, existing event-camera datasets provide optical flow ground truth only at low frame rates, which limits the research potential of event-driven optical flow. To address this challenge, we introduce a low-latency event representation, the Unified Voxel Grid, and propose EVA-Flow, an EVent-based Anytime Flow estimation network that produces high-frame-rate event optical flow using only low-frame-rate optical flow ground truth for supervision. Furthermore, we propose the Rectified Flow Warp Loss (RFWL) for the unsupervised assessment of intermediate optical flow. Comprehensive experiments on MVSEC, DSEC, and our EVA-FlowSet demonstrate that EVA-Flow achieves competitive performance, ultra-low latency (5 ms), time-dense motion estimation (200 Hz), and strong generalization. Our code will be available at https://github.com/Yaozhuwa/EVA-Flow.
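
For context on the event representation mentioned above, the following is a minimal sketch of the standard Voxel Grid of Zhu et al. (2019), the baseline that the Unified Voxel Grid is compared against. It is not the paper's Unified Voxel Grid. The function name `voxel_grid`, the `(t, x, y, p)` tuple layout, and the pure-Python nested-list output are assumptions made for illustration.

```python
def voxel_grid(events, num_bins, height, width):
    """Standard voxel grid sketch (illustrative, not the paper's UVG).

    events: list of (t, x, y, p) tuples, with integer pixel coordinates
            and polarity p in {-1, +1}. This layout is an assumption.
    Returns a nested list of shape [num_bins][height][width].
    """
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid

    t_min = min(e[0] for e in events)
    t_max = max(e[0] for e in events)
    span = (t_max - t_min) or 1.0  # guard against a zero-length window

    for t, x, y, p in events:
        # Normalised timestamp in [0, num_bins - 1].
        t_norm = (num_bins - 1) * (t - t_min) / span
        lo = int(t_norm)       # index of the earlier neighbouring bin
        w_hi = t_norm - lo     # linear weight for the later bin
        grid[lo][y][x] += p * (1.0 - w_hi)
        if lo + 1 < num_bins:
            grid[lo + 1][y][x] += p * w_hi
    return grid


events = [(0.0, 0, 0, 1), (0.25, 1, 0, 1), (1.0, 1, 1, -1)]
grid = voxel_grid(events, num_bins=3, height=2, width=2)
```

In this standard form, every interior bin receives weight from events on both sides of its centre, so a bin is only complete once the events that follow it have arrived.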

Paper Structure

This paper contains 9 sections, 5 equations, 4 figures.

Figures (4)

  • Figure S1: (a-b) EVA-Flow vs. previous event-based optical flow networks. (c) Comparison between our dense motion trajectory and other methods. (d) EVA-Flow's dense motion trajectory vs. E-RAFT (Gehrig et al., 2021) on the DSEC dataset (Gehrig et al., 2021).
  • Figure S2: Architecture of our Event Anytime Flow Estimation (EVA-Flow) framework (a) and SMR module (b). $f^i$ denotes the features of the $i$-th level in a feature pyramid, while $f^i_j$ denotes the $j$-th event bin’s feature in $f^i$. $F_{0,i}$ denotes the flow prediction from time $t_0$ to $t_i$. The red dashed box indicates the previous layer's SMR module output, while the purple box shows the current layer's output, which is passed to the next layer's SMR module.
  • Figure S3: The proposed Unified Voxel Grid compared with the Voxel Grid (Zhu et al., 2019). Different colored lines show the time range of events used for interpolation in each bin, as well as the interpolation weights assigned to those events.
  • Figure S4: RFWL vs. FWL. The top row displays event count images from the original DSEC-Flow dataset (Gehrig et al., 2021), and the second row presents the corresponding Images of Warped Events (IWE). Variance and sum are indicated for each image. The bottom row shows the FWL and RFWL for the IWEs.
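
To make the quantities in the Figure S4 caption concrete, the following is a minimal sketch of the original Flow Warp Loss (FWL), assuming its usual definition as the variance of the Image of Warped Events divided by the variance of the unwarped event count image. It does not implement the paper's RFWL, whose rectification is not described in this summary. The helper names, the `(t, x, y)` event layout, and the single global velocity (rather than a per-pixel flow field) are simplifying assumptions.

```python
def event_image(points, height, width):
    """Count events per pixel; points falling outside the image are dropped."""
    img = [[0.0] * width for _ in range(height)]
    for x, y in points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            img[yi][xi] += 1.0
    return img


def variance(img):
    """Population variance over all pixels of a nested-list image."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def flow_warp_loss(events, velocity, t_ref, height, width):
    """FWL sketch: var(IWE) / var(unwarped event image).

    events:   list of (t, x, y) tuples (assumed layout).
    velocity: hypothetical global flow (u, v) in pixels per unit time.
    Raises ZeroDivisionError if the unwarped image has zero variance.
    """
    u, v = velocity
    # Transport every event along the flow to the reference time t_ref.
    warped = [(x + u * (t_ref - t), y + v * (t_ref - t)) for t, x, y in events]
    original = [(x, y) for _, x, y in events]
    return (variance(event_image(warped, height, width))
            / variance(event_image(original, height, width)))


# A point moving at 1 px per unit time along x leaves a 4-pixel streak.
events = [(0.0, 0, 0), (1.0, 1, 0), (2.0, 2, 0), (3.0, 3, 0)]
fwl_good = flow_warp_loss(events, (1.0, 0.0), t_ref=0.0, height=2, width=4)
fwl_zero = flow_warp_loss(events, (0.0, 0.0), t_ref=0.0, height=2, width=4)
```

With the correct velocity the streak collapses onto one pixel and the ratio rises above 1, while zero flow leaves the image unchanged and gives exactly 1. Note that in this sketch events warped outside the image bounds are discarded, so the sum of the warped image can differ from that of the original.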