Event-based Video Frame Interpolation with Edge Guided Motion Refinement
Yuhan Liu, Yongjian Deng, Hao Chen, Bochen Xie, Youfu Li, Zhen Yang
TL;DR
This work addresses a gap in event-based video frame interpolation (E-VFI): existing methods underuse the fact that event data are most reliable at scene edges. It introduces EGMR, an end-to-end framework that exploits edge cues from events through two components. The Edge Guided Attentive (EGA) module, comprising Cross-modal Local Attention and Cross-OF Attention, refines multi-modal optical flows using edge-dominated event information, and an event-based visibility map addresses occlusions during warping refinement. Extensive experiments on synthetic and real datasets demonstrate that EGMR achieves superior interpolation quality, particularly in complex and edge-rich scenes, and ablation studies validate the contribution of each component. By fusing edge-focused event signals with frame data, the approach enables more reliable, high-quality interpolation in challenging conditions.
Abstract
Video frame interpolation, the process of synthesizing intermediate frames between sequential video frames, has made remarkable progress with the use of event cameras. These sensors, with microsecond-level temporal resolution, fill information gaps between frames by providing precise motion cues. However, contemporary Event-Based Video Frame Interpolation (E-VFI) techniques often neglect the fact that event data primarily supply high-confidence features at scene edges during multi-modal feature fusion, thereby diminishing the role of event signals in optical flow (OF) estimation and warping refinement. To address this overlooked aspect, we introduce an end-to-end E-VFI learning method (referred to as EGMR) to efficiently utilize edge features from event signals for motion flow and warping enhancement. Our method incorporates an Edge Guided Attentive (EGA) module, which rectifies estimated video motion through attentive aggregation based on the local correlation of multi-modal features in a coarse-to-fine strategy. Moreover, given that event data can provide accurate visual references at scene edges between consecutive frames, we introduce a learned visibility map derived from event data to adaptively mitigate the occlusion problem in the warping refinement process. Extensive experiments on both synthetic and real datasets show the effectiveness of the proposed approach, demonstrating its potential for higher quality video frame interpolation.
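To make the role of a visibility map in warping refinement concrete, the following is a minimal, dependency-free sketch of visibility-weighted blending of two frames that have already been warped to the target time. It is a generic illustration, not the EGMR implementation: the function names `blend_pixel` and `blend_frames`, the single-channel nested-list image layout, and the convention that a visibility of 1 means "visible in the first frame" are all assumptions made for this example. In the paper the visibility map is learned from event data, whereas here it is simply passed in.

```python
def blend_pixel(p0, p1, v, t, eps=1e-8):
    """Blend one pixel from two warped frames (hypothetical helper).

    p0, p1: intensities from the frames warped from time 0 and time 1.
    v: visibility in [0, 1]; 1 means the pixel is visible in frame 0,
       0 means it is visible only in frame 1 (assumed convention).
    t: target time in (0, 1).
    """
    # Weight each source by temporal proximity and by its visibility.
    w0 = (1.0 - t) * v
    w1 = t * (1.0 - v)
    # Normalize so the weights sum to one; eps avoids division by zero.
    return (w0 * p0 + w1 * p1) / (w0 + w1 + eps)


def blend_frames(warped0, warped1, visibility, t):
    """Apply blend_pixel over single-channel images stored as nested lists."""
    return [
        [blend_pixel(a, b, v, t) for a, b, v in zip(row0, row1, row_v)]
        for row0, row1, row_v in zip(warped0, warped1, visibility)
    ]


# Example: the first pixel is occluded in frame 1, the second in frame 0.
out = blend_frames([[0.0, 0.0]], [[1.0, 1.0]], [[1.0, 0.0]], t=0.5)
```

With this weighting, a pixel occluded in one frame is taken almost entirely from the other, while a pixel visible in both is mixed according to its temporal distance from each input frame.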
