Hierarchical Flow Diffusion for Efficient Frame Interpolation
Yang Hai, Guo Wang, Tan Su, Wenjie Jiang, Yinlin Hu
TL;DR
This work addresses the accuracy and efficiency gap between diffusion-based and non-diffusion video frame interpolation by introducing a hierarchical flow diffusion framework that explicitly denoises optical flow in a coarse-to-fine, multi-scale manner. A flow-guided image synthesizer, trained with pseudo bilateral flow from a pretrained model, generates the intermediate frame, while a jointly trained hierarchical diffusion model refines the flow conditioned on encoder features. The approach yields state-of-the-art interpolation quality and over 10x faster inference than prior diffusion-based methods, with competitive memory usage, enabling practical high-resolution interpolation. The combination of explicit flow modeling, multi-scale conditioning, and end-to-end fine-tuning offers a scalable and effective solution for handling large motions and complex scenes.
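The coarse-to-fine idea can be illustrated with a minimal sketch. It is not the authors' implementation: `refine` is a hypothetical stand-in for the per-level flow denoiser, the nearest-neighbour upsampling is an assumption chosen for brevity, and the flow is stored as a plain nested list of (dx, dy) pairs. The one detail that holds in general is that flow vectors must be doubled when the grid is upsampled by 2, since a displacement of one coarse pixel spans two fine pixels.

```python
from typing import Callable, List, Tuple

# H x W grid of (dx, dy) displacements, in pixels of the current level.
Flow = List[List[Tuple[float, float]]]


def upsample_flow_2x(flow: Flow) -> Flow:
    """Nearest-neighbour 2x upsampling of a flow field.

    Each vector is doubled because the same motion covers twice as many
    pixels on the finer grid.
    """
    out: Flow = []
    for row in flow:
        new_row: List[Tuple[float, float]] = []
        for dx, dy in row:
            new_row.extend([(2.0 * dx, 2.0 * dy)] * 2)
        out.append(new_row)
        out.append(list(new_row))
    return out


def coarse_to_fine(
    init_flow: Flow,
    num_levels: int,
    refine: Callable[[Flow, int], Flow],
) -> Flow:
    """Refine a flow field level by level, from coarsest to finest.

    `refine(flow, level)` is a hypothetical placeholder for whatever
    denoiser operates at that level; it must return a flow of the same size.
    """
    flow = init_flow
    for level in range(num_levels):
        if level > 0:
            flow = upsample_flow_2x(flow)
        flow = refine(flow, level)
    return flow


if __name__ == "__main__":
    # Identity "denoiser": only the upsampling and rescaling are exercised.
    result = coarse_to_fine([[(1.0, 0.0)]], 3, lambda f, level: f)
    print(len(result), len(result[0]), result[0][0])
```

With an identity `refine`, a 1x1 flow of (1, 0) grows to a 4x4 field of (4, 0) after three levels, which shows how a small search space at the coarsest level can still represent large motion at full resolution.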
Abstract
Most recent diffusion-based methods still show a large gap compared to non-diffusion methods for video frame interpolation, in both accuracy and efficiency. Most of them formulate the problem as a denoising procedure directly in latent space, which is less effective because the latent space is large. We propose to model bilateral optical flow explicitly with hierarchical diffusion models, which yields a much smaller search space in the denoising procedure. Based on the flow diffusion model, we then use a flow-guided image synthesizer to produce the final result. We train the flow diffusion model and the image synthesizer end to end. Our method achieves state-of-the-art accuracy and is more than 10 times faster than other diffusion-based methods. The project page is at: https://hfd-interpolation.github.io.
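To make the role of bilateral flow concrete, the sketch below shows the standard backward-warping step that a flow-guided synthesizer typically builds on. It is an illustration under stated assumptions, not the paper's synthesizer: the names `backward_warp` and `blend_midframe` are hypothetical, images are single-channel nested lists, and the two warped frames are simply averaged, whereas the method described above uses a learned synthesizer for the final result.

```python
import math
from typing import List, Tuple

Image = List[List[float]]                 # H x W grayscale image
Flow = List[List[Tuple[float, float]]]    # H x W grid of (dx, dy)


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at a fractional (x, y), clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1.0 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1.0 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1.0 - ay) * top + ay * bottom


def backward_warp(img: Image, flow: Flow) -> Image:
    """For each target pixel (x, y), fetch img at (x + dx, y + dy)."""
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def blend_midframe(frame0: Image, frame1: Image,
                   flow_t0: Flow, flow_t1: Flow) -> Image:
    """Warp both inputs to the intermediate time and average them.

    flow_t0 / flow_t1 point from the intermediate frame to frame0 / frame1.
    Plain averaging is an assumption made here for simplicity.
    """
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    return [[0.5 * (a + b) for a, b in zip(r0, r1)]
            for r0, r1 in zip(warped0, warped1)]


if __name__ == "__main__":
    # A bright pixel moves from x=1 (frame0) to x=3 (frame1).
    frame0 = [[0.0, 1.0, 0.0, 0.0, 0.0]]
    frame1 = [[0.0, 0.0, 0.0, 1.0, 0.0]]
    flow_t0 = [[(-1.0, 0.0)] * 5]   # look one pixel left in frame0
    flow_t1 = [[(1.0, 0.0)] * 5]    # look one pixel right in frame1
    print(blend_midframe(frame0, frame1, flow_t0, flow_t1))
```

In this toy case both warped frames place the bright pixel at x=2, so the blended intermediate frame is consistent. With an inaccurate flow the two warps would disagree and the average would ghost, which is why the quality of the estimated bilateral flow matters for the final frame.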
