Motion-Aware Generative Frame Interpolation
Guozhen Zhang, Yuhan Zhu, Yutao Cui, Xiaotong Zhao, Kai Ma, Limin Wang
TL;DR
MoG tackles artifacts in flow-based frame interpolation caused by complex motion by integrating intermediate flow guidance with diffusion-based generative refinement. It introduces dual guidance injection at latent and feature levels to align motion with flow trajectories, paired with encoder-only guidance and selective parameter fine-tuning to dynamically correct flow errors. Extensive experiments on real-world and animation benchmarks show MoG substantially improves video quality and fidelity over both flow-based and prior generative VFI methods, while maintaining efficiency. This work bridges flow-based stability and generative flexibility, enabling robust frame interpolation across diverse scenarios.
Abstract
Flow-based frame interpolation methods ensure motion stability through estimated intermediate flow but often introduce severe artifacts in complex motion regions. Recent generative approaches, boosted by large-scale pre-trained video generation models, show promise in handling intricate scenes. However, they frequently produce unstable motion and content inconsistencies because they lack explicit motion trajectory constraints. To address these challenges, we propose Motion-aware Generative frame interpolation (MoG), which synergizes intermediate flow guidance with generative capacities to enhance interpolation fidelity. Our key insight is to enforce motion smoothness through flow constraints while adaptively correcting flow estimation errors through generative refinement. Specifically, we first introduce a dual guidance injection that propagates condition information using intermediate flow at both the latent and feature levels, aligning the generated motion with flow-derived motion trajectories. Meanwhile, we implement two critical designs, encoder-only guidance injection and selective parameter fine-tuning, which enable dynamic artifact correction in complex motion regions. Extensive experiments on both real-world and animation benchmarks demonstrate that MoG outperforms state-of-the-art methods in terms of video quality and visual fidelity. Our work bridges the gap between flow-based stability and generative flexibility, offering a versatile solution for frame interpolation across diverse scenarios.
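To make the idea of propagating condition information with intermediate flow more concrete, the following is a minimal NumPy sketch of backward warping the two input frames toward an intermediate time step and blending them into a guidance image. It is an illustration under stated assumptions, not the MoG implementation: the function names `backward_warp` and `flow_guidance`, the flow layout `(H, W, 2)` with `(dx, dy)` channels, the bilinear sampling, and the linear blend weight `t` are all hypothetical choices made here. The abstract does not specify how the guidance is injected into the diffusion model at the latent and feature levels, so that part is not modeled.

```python
import numpy as np

def backward_warp(src, flow):
    """Backward-warp `src` with `flow` using bilinear sampling.

    src:  (H, W, C) float array, the frame (or feature map) to sample from.
    flow: (H, W, 2) float array; flow[..., 0] is dx and flow[..., 1] is dy.
          Output pixel (y, x) is read from src at (y + dy, x + dx).
    Sampling positions outside the image are clamped to the border.
    """
    h, w = src.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Sampling coordinates, clamped to the valid image range.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets, broadcast over the channel axis.
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]

    top = src[y0, x0] * (1 - wx) + src[y0, x1] * wx
    bottom = src[y1, x0] * (1 - wx) + src[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def flow_guidance(frame0, frame1, flow_t0, flow_t1, t=0.5):
    """Blend both warped input frames into one guidance image for time t.

    flow_t0 / flow_t1 are assumed to be intermediate flows from the target
    time t to frame0 and frame1 respectively (a hypothetical convention).
    """
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    return (1 - t) * warped0 + t * warped1
```

In a flow-guided generative pipeline, an image like the one returned by `flow_guidance` (or the same warping applied to latents and encoder features) would serve as a motion-aligned condition, while the generative model remains responsible for correcting regions where the estimated flow is wrong.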
