TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration

Yizhou Li; Zihua Liu; Yusuke Monno; Masatoshi Okutomi

TL;DR

This work introduces the Temporally-consistent Diffusion Model (TDM), a unified diffusion-based framework for all-in-one real-world video restoration that handles multiple degradations with a single model. It builds on the latent space of a pre-trained Stable Diffusion and fine-tunes ControlNet with Task Prompt Guidance (TPG) using single-image inputs, so that one model covers all tasks without a separate model being trained for each. For inference, TDM combines DDIM inversion with Sliding Window Cross-Frame Attention (SW-CFA) to preserve content and keep frames temporally consistent under larger motion. Experiments on five restoration tasks show better generalization to real-world videos and better temporal consistency than state-of-the-art baselines.

Abstract

In this paper, we propose the first diffusion-based all-in-one video restoration method that utilizes the power of a pre-trained Stable Diffusion and a fine-tuned ControlNet. Our method can restore various types of video degradation with a single unified model, overcoming the limitation of standard methods that require specific models for each restoration task. Our contributions include an efficient training strategy with Task Prompt Guidance (TPG) for diverse restoration tasks, an inference strategy that combines Denoising Diffusion Implicit Models (DDIM) inversion with a novel Sliding Window Cross-Frame Attention (SW-CFA) mechanism for enhanced content preservation and temporal consistency, and a scalable pipeline that lets our all-in-one method adapt to different video restoration tasks. Through extensive experiments on five video restoration tasks, we demonstrate the superiority of our method over existing state-of-the-art methods in generalization to real-world videos and in temporal consistency. Our method advances video restoration by providing a unified solution that enhances video quality across multiple applications.
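
The abstract names DDIM inversion as the content-preserving half of the inference strategy but does not spell out the update. As a rough illustration only, the standard deterministic DDIM step (eta = 0) can be run from clean to noisy to obtain a starting latent that encodes the input, and then run back. Everything below is an assumption for illustration: the function names, the plain-list latent, the `alphas` schedule, and `toy_eps`, which is a hypothetical stand-in for the noise predictor (in TDM that role would be played by the Stable Diffusion U-Net with ControlNet, which is not reproduced here).

```python
import math

def ddim_step(x, eps, alpha_from, alpha_to):
    """One deterministic DDIM update (eta = 0) between two alpha-bar levels."""
    # Clean-sample estimate implied by the current noise prediction.
    x0 = [(xi - math.sqrt(1.0 - alpha_from) * ei) / math.sqrt(alpha_from)
          for xi, ei in zip(x, eps)]
    # Re-noise that estimate to the target level with the same prediction.
    return [math.sqrt(alpha_to) * x0i + math.sqrt(1.0 - alpha_to) * ei
            for x0i, ei in zip(x0, eps)]

def ddim_invert(x_clean, eps_model, alphas):
    """Walk from clean to noisy. `alphas` holds cumulative alpha-bar values,
    ordered from near 1 (clean) to near 0 (noisy)."""
    x = list(x_clean)
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_step(x, eps_model(x, a_from), a_from, a_to)
    return x

def ddim_sample(x_noisy, eps_model, alphas):
    """Walk back from noisy to clean over the same schedule."""
    x = list(x_noisy)
    rev = alphas[::-1]
    for a_from, a_to in zip(rev[:-1], rev[1:]):
        x = ddim_step(x, eps_model(x, a_from), a_from, a_to)
    return x

# Hypothetical noise predictor: a constant, so the round trip is exact.
# A real network depends on x and the timestep, making inversion approximate.
toy_eps = lambda x, alpha_bar: [0.5] * len(x)

alphas = [0.999, 0.9, 0.6, 0.3, 0.05]   # assumed schedule, not from the paper
latent = [0.2, -1.0, 0.7]               # stand-in for an encoded frame
noisy = ddim_invert(latent, toy_eps, alphas)
recovered = ddim_sample(noisy, toy_eps, alphas)
```

Starting the reverse process from an inverted latent rather than from random noise is what ties the result to the input frame; with a learned predictor the recovery is only approximate.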

Paper Structure (10 sections, 9 equations, 6 figures, 4 tables)

Introduction
Methodology
Preliminary
Training: Task Prompt Guided ControlNet Fine-Tuning with Single-Image Inputs
Inference: Training-Free Content-Preserved Temporal Consistency for Larger Motion
Experiments
Settings
Comparison with State-of-the-Art Methods
Ablation Study
Conclusion

Figures (6)

Figure 1: Our Temporally-consistent Diffusion Model (TDM) has two main features: (a) Our model is all-in-one and can restore various real-world video degradation with a single diffusion model under the guidance of task prompts. (b) Our model can generate temporally consistent video frames with better preservation of original contents included in the input video.
Figure 2: Overall architecture of our proposed temporally-consistent diffusion model (TDM).
Figure 3: Proposed SW-CFA compared with existing cross-frame attention.
Figure 4: Qualitative comparison with state-of-the-art methods.
Figure 5: Consistency comparison (MP4) with other diffusion-based methods.
...and 1 more figure
