Solving Video Inverse Problems Using Image Diffusion Models
Taesung Kwon, Jong Chul Ye
TL;DR
This work tackles video inverse problems under spatio-temporal degradation using only pre-trained image diffusion models. It treats the temporal axis as a batch dimension and combines batch-consistent sampling with Krylov-subspace optimization, in the style of the decomposed diffusion sampler (DDS), to refine Tweedie-denoised batches jointly across space and time, without training a video diffusion model. The method achieves state-of-the-art reconstructions on temporal and spatio-temporal degradations while remaining VRAM-efficient and fast: it works with a low number of function evaluations (NFEs) and approaches real-time speeds on short sequences. It also extends to blind and other restoration settings, which makes it practical for video restoration when training data or compute is limited.
Abstract
Recently, diffusion model-based inverse problem solvers (DIS) have emerged as state-of-the-art approaches for addressing inverse problems, including image super-resolution, deblurring, and inpainting. However, their application to video inverse problems arising from spatio-temporal degradation remains largely unexplored due to the challenges in training video diffusion models. To address this issue, we introduce a video inverse solver that leverages only image diffusion models. Specifically, drawing inspiration from the success of the recent decomposed diffusion sampler (DDS), our method treats the time dimension of a video as the batch dimension of image diffusion models and solves spatio-temporal optimization problems within the denoised spatio-temporal batches produced by the image diffusion model. Moreover, we introduce a batch-consistent diffusion sampling strategy that encourages consistency across batches by synchronizing the stochastic noise components in image diffusion models. Our approach combines batch-consistent sampling with simultaneous optimization of denoised spatio-temporal batches at each reverse diffusion step, resulting in a novel and efficient diffusion sampling strategy for video inverse problems. Experimental results demonstrate that our method effectively addresses various spatio-temporal degradations in video inverse problems, achieving state-of-the-art reconstructions. Project page: https://svi-diffusion.github.io/
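The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it assumes a toy temporal degradation (a circular moving average over frames, chosen because it is self-adjoint), a random array standing in for the Tweedie-denoised batch, and hypothetical helper names (`batch_consistent_noise`, `temporal_blur`, `cg_refine`). The first helper shares one noise sample across all frames, which is one plausible reading of "synchronizing the stochastic noise components"; the second runs a few conjugate-gradient (Krylov-subspace) iterations on the normal equations, starting from the denoised batch.

```python
import numpy as np

def batch_consistent_noise(num_frames, frame_shape, rng):
    """Draw ONE noise image and repeat it along the frame (batch) axis."""
    eps = rng.standard_normal(frame_shape)
    return np.broadcast_to(eps, (num_frames, *frame_shape)).copy()

def temporal_blur(x, width=3):
    """Assumed toy degradation A: circular moving average along axis 0 (time).
    The kernel is symmetric and circular, so A is self-adjoint (A^T = A)."""
    half = width // 2
    return sum(np.roll(x, s, axis=0) for s in range(-half, half + 1)) / width

def cg_refine(x0, y, A, At, n_iter=5):
    """A few CG steps on A^T A x = A^T y, initialized at the denoised batch x0.
    The iterates stay in x0 + Krylov subspace of (A^T A, A^T (y - A x0))."""
    x = x0.copy()
    r = At(y - A(x))          # residual of the normal equations
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(n_iter):
        if rs < 1e-20:        # already consistent with the measurement
            break
        AtAp = At(A(p))
        alpha = rs / np.vdot(p, AtAp)
        x = x + alpha * p
        r = r - alpha * AtAp
        rs_new = np.vdot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
T, H, W = 8, 4, 4                          # frames are treated as the batch axis
x_true = rng.standard_normal((T, H, W))    # unknown clean video (toy data)
y = temporal_blur(x_true)                  # temporally degraded measurement
x0 = rng.standard_normal((T, H, W))        # stand-in for a Tweedie-denoised batch
x_ref = cg_refine(x0, y, temporal_blur, temporal_blur, n_iter=5)
noise = batch_consistent_noise(T, (H, W), rng)

res_before = np.linalg.norm(temporal_blur(x0) - y)
res_after = np.linalg.norm(temporal_blur(x_ref) - y)
```

In a full sampler, the refined batch and the shared noise would feed the next reverse diffusion step; that step depends on the noise schedule and the pre-trained denoiser, so it is left out of this sketch. Because CG on the normal equations minimizes the data residual over the growing Krylov subspace, `res_after` should be smaller than `res_before` here.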
