Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields
Taewoo Kim, Yujeong Chae, Hyun-Kurl Jang, Kuk-Jin Yoon
TL;DR
The paper tackles video frame interpolation by leveraging cross-modal information from event streams and RGB frames to handle complex, non-linear motions. It introduces EIF-BiOFNet, which directly estimates the asymmetric inter-frame motion fields $V_{t \rightarrow 0}$ and $V_{t \rightarrow 1}$ from both modalities, and an Interactive Attention-based Frame Synthesis network, which fuses warping-based and synthesis-based features to reconstruct the intermediate frame $I_t$. It also introduces ERF-X170FPS, a high-frame-rate dataset captured with a beam-splitter rig to cover extreme motions and dynamic textures. Across synthetic and real benchmarks, the proposed method reports state-of-the-art PSNR/SSIM, with PSNR gains of up to ~8.2 dB on GoPro and ~7.9 dB over TimeLens on ERF-X170FPS, at competitive model efficiency. These results demonstrate the value of cross-modal motion-field estimation and transformer-based frame synthesis for event-based VFI.
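To make the decibel figures easier to read, here is a minimal sketch of the standard PSNR definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The function name `psnr` and the assumption that pixel values lie in $[0, 1]$ are illustrative choices, not taken from the paper or its code release.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images.

    `max_value` is the largest possible pixel value (assumed 1.0 here,
    i.e. images normalized to [0, 1]).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy images: a constant error of 0.1 versus a constant error of 0.01.
ground_truth = np.zeros((4, 4))
coarse = np.full((4, 4), 0.1)
fine = np.full((4, 4), 0.01)
coarse_db = psnr(ground_truth, coarse)
fine_db = psnr(ground_truth, fine)
```

Because PSNR is logarithmic in the mean squared error, a gain of about 8.2 dB corresponds to an MSE that is lower by a factor of roughly $10^{0.82} \approx 6.6$.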
Abstract
Video Frame Interpolation (VFI) aims to generate intermediate video frames between consecutive input frames. Since event cameras are bio-inspired sensors that encode only brightness changes with microsecond temporal resolution, several works have used them to enhance the performance of VFI. However, existing methods estimate bidirectional inter-frame motion fields with only events or approximations, which cannot account for the complex motion in real-world scenarios. In this paper, we propose a novel event-based VFI framework with cross-modal asymmetric bidirectional motion field estimation. In detail, our EIF-BiOFNet utilizes the valuable characteristics of both events and images to directly estimate inter-frame motion fields without any approximation methods. Moreover, we develop an interactive attention-based frame synthesis network to efficiently leverage the complementary warping-based and synthesis-based features. Finally, we build a large-scale event-based VFI dataset, ERF-X170FPS, with a high frame rate, extreme motion, and dynamic textures to overcome the limitations of previous event-based VFI datasets. Extensive experimental results validate that our method shows significant performance improvement over the state-of-the-art VFI methods on various datasets. Our project page is available at: https://github.com/intelpro/CBMNet
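
To illustrate why asymmetric motion fields matter, the following is a minimal NumPy sketch of backward warping with two independent fields, read as $V_{t \rightarrow 0}$ and $V_{t \rightarrow 1}$. It is not the authors' EIF-BiOFNet: the flows are hand-set rather than estimated, the helper `backward_warp` and the toy frames are hypothetical, and the symmetric field used for comparison is a simplified stand-in for the approximations the abstract mentions.

```python
import numpy as np

def backward_warp(image, flow):
    """Sample `image` at (x + u, y + v) with bilinear interpolation.

    image: (H, W) or (H, W, C) float array (a key frame such as I_0 or I_1).
    flow:  (H, W, 2) array of (u, v) pixel displacements, read as a field
           pointing from the target frame I_t into the key frame.
    """
    h, w = flow.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Clamp sample positions to the image border.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    if image.ndim == 3:
        wx, wy = wx[..., None], wy[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

# Toy non-linear motion: a dot sits at x=5 in I_0, x=3 in I_t, x=4 in I_1.
h, w = 5, 8
frame0 = np.zeros((h, w))
frame0[2, 5] = 1.0
frame1 = np.zeros((h, w))
frame1[2, 4] = 1.0

# Hand-set asymmetric fields (assumed known here, not estimated).
flow_t0 = np.zeros((h, w, 2))
flow_t0[..., 0] = 2.0  # V_{t->0}: look 2 px to the right in I_0
flow_t1 = np.zeros((h, w, 2))
flow_t1[..., 0] = 1.0  # V_{t->1}: look 1 px to the right in I_1

warped0 = backward_warp(frame0, flow_t0)
warped1 = backward_warp(frame1, flow_t1)

# Symmetric stand-in at t = 0.5: force V_{t->0} = -V_{t->1}.
linear0 = backward_warp(frame0, -flow_t1)
```

With the asymmetric fields, both warped frames place the dot at x=3, where it actually is at time t. Under the symmetric constraint, the warp of $I_0$ puts the dot elsewhere, because the object does not move along a straight line at constant speed. In the paper, the warped results feed a learned synthesis network rather than being used directly.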
