A Unified Framework for Event-based Frame Interpolation with Ad-hoc Deblurring in the Wild
Lei Sun, Daniel Gehrig, Christos Sakaridis, Mathias Gehrig, Jingyun Liang, Peng Sun, Zhijie Xu, Kaiwei Wang, Luc Van Gool, Davide Scaramuzza
TL;DR
This work tackles robust event-based video frame interpolation under motion blur by developing REFID, a unified bidirectional recurrent network that performs ad-hoc deblurring to interpolate frames from sharp or blurry inputs using both frames and asynchronous events. The model fuses image and event features via a bidirectional event recurrent encoder and an Event-Guided Adaptive Channel Attention (EGACA) module, enabling accurate interpolation and deblurring in a single stage. To bridge the synthetic-to-real gap, the authors introduce a self-supervised fine-tuning framework with three losses (brightness increment, blur consistency, warp) and validate on a new HighREV dataset with high-resolution aligned events and RGB frames. Experiments show REFID achieves state-of-the-art performance on sharp and blurry frame interpolation and single-image deblurring, and that self-supervised fine-tuning gives it strong generalization to real-world data, which matters for practical event-based imaging systems.
Abstract
Effective video frame interpolation hinges on the adept handling of motion in the input scene. Prior work exploits asynchronous event information for this, but often overlooks whether motion induces blur in the video, limiting its scope to sharp frame interpolation. We instead propose a unified framework for event-based frame interpolation that performs deblurring ad-hoc and thus works on both sharp and blurry input videos. Our model consists of a bidirectional recurrent network that incorporates the temporal dimension of interpolation and fuses information from the input frames and the events adaptively based on their temporal proximity. To improve generalization from synthetic data to real event cameras in the wild, we integrate a self-supervised framework with the proposed model. At the dataset level, we introduce a novel real-world high-resolution dataset with events and color videos named HighREV, which provides a challenging evaluation setting for the examined task. Extensive experiments show that our network consistently outperforms previous state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of both. Experiments on domain transfer reveal that self-supervised training effectively mitigates the performance degradation observed when transitioning from synthetic data to real-world data. Code and datasets are available at https://github.com/AHupuJR/REFID.
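
As an illustration of the blur consistency idea named in the TL;DR, the following is a minimal sketch and not the authors' implementation. It assumes the common blur model in which a blurry frame approximates the temporal average of the sharp latent frames within the exposure window, and it penalizes the mean absolute difference between that average and the observed blurry frame. The function name `blur_consistency_loss`, the choice of an L1 penalty, and the toy data are all assumptions made for this example; the loss in the paper may be defined differently.

```python
import numpy as np

def blur_consistency_loss(sharp_frames, blurry_frame):
    """Hypothetical helper: L1 distance between the temporal average of
    predicted sharp frames and the observed blurry frame.

    Assumes blur is modeled as a plain average over the exposure window.
    """
    sharp = np.asarray(sharp_frames, dtype=np.float64)   # shape (T, H, W)
    observed = np.asarray(blurry_frame, dtype=np.float64)  # shape (H, W)
    synthesized_blur = sharp.mean(axis=0)                # re-blur the predictions
    return float(np.abs(synthesized_blur - observed).mean())

# Toy example: one bright pixel moves across a 1x4 image during the exposure.
sharp_frames = np.eye(4).reshape(4, 1, 4)    # frame k has pixel k lit
blurry_frame = sharp_frames.mean(axis=0)     # simulated blurry observation

# Predictions that follow the motion reproduce the blurry frame when averaged.
loss_consistent = blur_consistency_loss(sharp_frames, blurry_frame)

# Predictions that ignore the motion (first frame repeated) do not.
static_frames = np.repeat(sharp_frames[:1], 4, axis=0)
loss_static = blur_consistency_loss(static_frames, blurry_frame)

print(loss_consistent, loss_static)
```

In this toy setting, the motion-consistent predictions give a lower loss than the static ones, which is the kind of signal such a term can provide without ground-truth sharp frames.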
