EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Zihao Zhang; Haoran Chen; Haoyu Zhao; Guansong Lu; Yanwei Fu; Hang Xu; Zuxuan Wu

TL;DR

EDEN tackles large-motion video frame interpolation (VFI) by enhancing diffusion-based VFI with a transformer-based latent tokenizer and a diffusion transformer that uses temporal attention and start-end frame difference conditioning. It introduces a Pyramid Feature Fusion Module and fine-tuning across multiple resolutions and frame intervals to handle variability in motion and resolution, and it employs dual-stream context integration to better incorporate information from the start and end frames. The approach achieves state-of-the-art perceptual metrics on the DAVIS, DAIN-HD, and SNU-FILM benchmarks while remaining efficient, requiring only a small number of denoising steps. These results suggest that diffusion-based VFI can handle complex, real-world motion with improved temporal coherence and visual quality.

Abstract

Handling complex or nonlinear motion patterns has long posed challenges for video frame interpolation. Although recent advances in diffusion-based methods offer improvements over traditional optical flow-based approaches, they still struggle to generate sharp, temporally consistent frames in scenarios with large motion. To address this limitation, we introduce EDEN, an Enhanced Diffusion for high-quality large-motion vidEo frame iNterpolation. Our approach first utilizes a transformer-based tokenizer to produce refined latent representations of the intermediate frames for diffusion models. We then enhance the diffusion transformer with temporal attention across the process and incorporate a start-end frame difference embedding to guide the generation of dynamic motion. Extensive experiments demonstrate that EDEN achieves state-of-the-art results across popular benchmarks, including nearly a 10% LPIPS reduction on DAVIS and SNU-FILM, and an 8% improvement on DAIN-HD.
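The abstract does not specify how the start-end frame difference embedding is computed, so the following is only a minimal sketch of one plausible form of such a conditioning signal. It assumes grayscale frames given as non-empty nested lists with values in [0, 1], uses a mean absolute pixel difference as the motion measure, and borrows the sinusoidal encoding commonly used for diffusion timesteps. The function names `frame_difference`, `sinusoidal_embedding`, and `difference_embedding`, and the parameters `dim` and `max_period`, are hypothetical and are not taken from the paper.

```python
import math

def frame_difference(start, end):
    """Mean absolute per-pixel difference between two equally sized frames.

    Assumption: this scalar stands in for whatever difference measure the
    authors actually use; the abstract does not define it.
    """
    total, count = 0.0, 0
    for row_start, row_end in zip(start, end):
        for p_start, p_end in zip(row_start, row_end):
            total += abs(p_end - p_start)
            count += 1
    return total / count

def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Encode a scalar as `dim` sine/cosine features (timestep-style encoding)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]

def difference_embedding(start, end, dim=8):
    """Hypothetical conditioning vector derived from the start and end frames."""
    return sinusoidal_embedding(frame_difference(start, end), dim)

# Toy 2x2 frames: one pair with no motion, one pair with changed pixels.
static = [[0.2, 0.2], [0.2, 0.2]]
moved = [[0.2, 0.8], [0.6, 0.2]]

emb_static = difference_embedding(static, static)
emb_moved = difference_embedding(static, moved)
```

In a diffusion transformer, a vector like this would typically be passed through a small learned projection and combined with the timestep embedding, so that larger start-end differences can steer the model toward generating larger motion. Whether EDEN combines the signals in this way is not stated in the abstract.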
