Event-based Video Frame Interpolation with Edge Guided Motion Refinement

Yuhan Liu, Yongjian Deng, Hao Chen, Bochen Xie, Youfu Li, Zhen Yang

TL;DR

This work tackles the challenge of edge-aware motion in event-based video frame interpolation by introducing EGMR, an end-to-end framework that leverages edge cues from events. The core innovations are the Edge Guided Attentive (EGA) module, comprising Cross-modal Local Attention and Cross-OF Attention, which refine multi-modal optical flows using edge-dominated event information, and an event-based visibility map used in warping refinement to address occlusions. Extensive experiments on synthetic and real datasets demonstrate that EGMR achieves superior interpolation quality, particularly in complex and edge-rich scenes, and ablation studies validate the contribution of each component. The proposed approach advances E-VFI by effectively fusing edge-focused event signals with frame data, enabling more reliable high-quality video interpolation in challenging conditions.

Abstract

Video frame interpolation, the process of synthesizing intermediate frames between sequential video frames, has made remarkable progress with the use of event cameras. These sensors, with microsecond-level temporal resolution, fill information gaps between frames by providing precise motion cues. However, contemporary Event-Based Video Frame Interpolation (E-VFI) techniques often neglect the fact that event data primarily supply high-confidence features at scene edges during multi-modal feature fusion, thereby diminishing the role of event signals in optical flow (OF) estimation and warping refinement. To address this overlooked aspect, we introduce an end-to-end E-VFI learning method (referred to as EGMR) to efficiently utilize edge features from event signals for motion flow and warping enhancement. Our method incorporates an Edge Guided Attentive (EGA) module, which rectifies estimated video motion through attentive aggregation based on the local correlation of multi-modal features in a coarse-to-fine strategy. Moreover, given that event data can provide accurate visual references at scene edges between consecutive frames, we introduce a learned visibility map derived from event data to adaptively mitigate the occlusion problem in the warping refinement process. Extensive experiments on both synthetic and real datasets show the effectiveness of the proposed approach, demonstrating its potential for higher quality video frame interpolation.
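The role a visibility map plays in warping can be illustrated with a minimal NumPy sketch. It assumes the blending formulation common in flow-based VFI, $I_\tau = M \odot \mathrm{warp}(I_0, F_{\tau \to 0}) + (1 - M) \odot \mathrm{warp}(I_1, F_{\tau \to 1})$, which this page does not confirm as the exact form used in EGMR. The function names `backward_warp` and `blend_with_visibility` are hypothetical, and nearest-neighbour sampling stands in for the bilinear sampling a real network would use.

```python
import numpy as np


def backward_warp(image, flow):
    """Sample `image` at positions displaced by `flow` (nearest neighbour).

    image: (H, W) array.
    flow:  (H, W, 2) array holding a (dx, dy) displacement per target pixel.
    Out-of-range sample positions are clamped to the image border.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]


def blend_with_visibility(i0, i1, flow_t0, flow_t1, visibility):
    """Blend two warped keyframes with a per-pixel visibility map in [0, 1].

    visibility close to 1 trusts the frame warped from i0 (assumed convention);
    visibility close to 0 trusts the frame warped from i1.
    """
    warped0 = backward_warp(i0, flow_t0)
    warped1 = backward_warp(i1, flow_t1)
    return visibility * warped0 + (1.0 - visibility) * warped1
```

A pixel that is occluded in $I_0$ but visible in $I_1$ would receive a visibility value near 0, so the blended result draws on the warped $I_1$ at that location. In the paper's setting the map is learned from event data rather than supplied by hand as it is here.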

Paper Structure (34 sections, 14 equations, 11 figures, 4 tables)

This paper contains 34 sections, 14 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Visual comparison between the state-of-the-art method UPR-Net [jin2023unified] and ours, where $\{I_0, I_1\}$ are keyframes and $\{F_{\tau \to 0}, F_{\tau \to 1}\}$ are the predicted bidirectional OFs. Lacking real inter-frame information, traditional VFI work (UPR-Net) cannot accurately model inter-frame motion. In contrast, our E-VFI approach better encodes the motion trajectory between consecutive frames.
  • Figure 2: Visual comparisons among representative E-VFI methods. The main concerns are low luminance, irregular motion, and textured scenes.
  • Figure 3: Pipeline overview of the proposed EGMR. The event OF ($F^e$) and the event-based visibility map ($M^e$) are generated by the Event Flow Net. Multi-scale frame OFs ($\{F^{s} \mid s \in \{0,1,2\}\}$) and the image-based visibility map ($M^s$) are produced by IFBlocks. At each scale, the EGA refines the predicted multi-modal OFs by emphasizing precise edge motion. Finally, the cross-modal enhanced motions are used to generate the final interpolated frame ($I_{\tau}$) via warping and refinement.
  • Figure 4: The network architecture for extracting event and frame OFs, $K\in[4,2,1]$. The illustration is adapted from [tulyakov2021time, huang2022real].
  • Figure 5: Our proposed Edge Guided Attentive (EGA) module. Left: the cross-modal local attention module (CLA); right: the cross-OF attention module (COA). The output of CLA is fed into COA. Numbers represent the output channels.
  • ...and 6 more figures