Coherent Video Inpainting Using Optical Flow-Guided Efficient Diffusion
Bohai Gu, Hao Luo, Song Guo, Peiran Dong, Qihua Zhou
TL;DR
FloED tackles the challenge of text-guided video inpainting by integrating optical flow as a motion prior into diffusion models. It introduces a dual-branch architecture with a time-agnostic flow completion branch and multi-scale flow adapters, augmented by an anchor-frame strategy and training-free latency reductions (latent interpolation and flow attention caching). Empirical results on background restoration and object removal show FloED achieving state-of-the-art quality and efficiency, with strong temporal coherence and text alignment. The approach offers practical impact by enabling faster, more coherent diffusion-based video inpainting and provides a public benchmark and code base for further research.
Abstract
Text-guided video inpainting has significantly improved the performance of content generation applications. Diffusion models have become essential for achieving high-quality video inpainting results, yet they still face bottlenecks in temporal consistency and computational efficiency. This motivates us to propose a new video inpainting framework using optical Flow-guided Efficient Diffusion (FloED) for higher video coherence. Specifically, FloED employs a dual-branch architecture, where the time-agnostic flow branch first restores the corrupted flow, and the multi-scale flow adapters then provide motion guidance to the main inpainting branch. In addition, a training-free latent interpolation method is proposed to accelerate the multi-step denoising process using flow warping. With the flow attention cache mechanism, FloED efficiently reduces the computational cost of incorporating optical flow. Extensive experiments on background restoration and object removal tasks show that FloED outperforms state-of-the-art diffusion-based methods in both quality and efficiency. Our code and models will be made publicly available.
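To make the idea of flow-warped latent interpolation more concrete, the following is a minimal NumPy sketch, not the FloED implementation. It assumes latents of shape (C, H, W), a flow field of shape (2, H, W) given in latent-pixel units, and that a skipped frame's latent can be approximated by averaging its two neighbors after warping them toward it. The function names `warp_latent` and `interpolate_latent` are hypothetical, and the averaging rule is an illustrative assumption rather than the method described in the paper.

```python
import numpy as np

def warp_latent(latent, flow):
    """Backward-warp a latent (C, H, W) with a flow field (2, H, W).

    Assumption: flow[0] is the horizontal and flow[1] the vertical
    displacement, in latent pixels. Output position (y, x) samples the
    input at (y + flow[1], x + flow[0]) using bilinear interpolation.
    """
    _, h, w = latent.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates, clamped to the latent borders.
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0
    wy = sy - y0
    # Bilinear blend of the four neighboring samples.
    top = latent[:, y0, x0] * (1 - wx) + latent[:, y0, x1] * wx
    bottom = latent[:, y1, x0] * (1 - wx) + latent[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def interpolate_latent(prev_latent, next_latent, flow_to_prev, flow_to_next):
    """Hypothetical rule: estimate a skipped frame's latent as the mean of
    its two neighbors, each warped toward the skipped frame."""
    warped_prev = warp_latent(prev_latent, flow_to_prev)
    warped_next = warp_latent(next_latent, flow_to_next)
    return 0.5 * (warped_prev + warped_next)
```

In such a scheme, the denoiser would only need to run on a subset of frames at selected steps, with the remaining latents filled in by warping, which is one plausible way a training-free interpolation could reduce the cost of multi-step denoising.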
