
Learning Truncated Causal History Model for Video Restoration

Amirhosein Ghasemabadi, Muhammad Kamran Janjua, Mohammad Salameh, Di Niu

TL;DR

This work proposes TURTLE, which learns a truncated causal history model for efficient, high-performing video restoration. TURTLE improves efficiency by storing and summarizing a truncated history of the input frame's latent representation in an evolving historical state.

Abstract

One key challenge to video restoration is to model the transition dynamics of video frames governed by motion. In this work, we propose TURTLE to learn the truncated causal history model for efficient and high-performing video restoration. Unlike traditional methods that process a range of contextual frames in parallel, TURTLE enhances efficiency by storing and summarizing a truncated history of the input frame latent representation into an evolving historical state. This is achieved through a sophisticated similarity-based retrieval mechanism that implicitly accounts for inter-frame motion and alignment. The causal design in TURTLE enables recurrence in inference through state-memorized historical features while allowing parallel training by sampling truncated video clips. We report new state-of-the-art results on a multitude of video restoration benchmark tasks, including video desnowing, nighttime video deraining, video raindrops and rain streak removal, video super-resolution, real-world and synthetic video deblurring, and blind video denoising while reducing the computational cost compared to existing best contextual methods on all these tasks.
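
The idea of keeping a truncated, evolving history during recurrent inference can be sketched with a fixed-length buffer. This is a toy illustration under stated assumptions, not TURTLE itself: `restore_stream`, `restore_fn`, and `history_len` are hypothetical names, and the frames are stand-in values where the model would use latent feature maps.

```python
from collections import deque


def restore_stream(frames, restore_fn, history_len=3):
    """Process frames one at a time, conditioning each on a truncated history.

    restore_fn(frame, history) -> (restored, state) is a hypothetical callable;
    `state` is whatever feature should be remembered for later frames.
    """
    # A bounded deque drops the oldest state automatically once it is full.
    history = deque(maxlen=history_len)
    outputs = []
    for frame in frames:
        # Causal: only states from earlier frames are visible here.
        restored, state = restore_fn(frame, list(history))
        outputs.append(restored)
        history.append(state)
    return outputs
```

Because the buffer length is fixed, memory use during inference in this sketch does not grow with the length of the video.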

Paper Structure (44 sections, 1 theorem, 14 equations, 14 figures, 14 tables)


Key Result

Lemma D.1

(Special Case of Causal History Model) In the absence of degradation, and with motion optimally compensated through optical flow, the state history $\mathbf{\hat{H}}_{t}^{[l]}$ depends only on the input $\mathbf{F}_{t}^{[l]}$ and the previous state $\mathbf{\hat{H}}_{t-1}^{[l]}$. Under this assump

Figures (14)

  • Figure 1: Turtle's Architecture. The overall architecture of the proposed method. Turtle is a U-Net-style [ronneberger2015u] architecture in which the encoder blocks are historyless feedforward blocks, while the decoder couples the causal history model (CHM) to condition the restoration procedure on a truncated history of the input. Assorted restoration examples are shown on the right, with frames taken from the video raindrops and rain streak removal [wu2023mask], night deraining [patil2022video], and video deblurring [nah2017deep] tasks, respectively.
  • Figure 2: Causal History Model. The diagrammatic illustration of the proposed Causal History Model (CHM) detailing the internal function. In the initial phase, for each patch in the current frame (denoted by the stars), we identify and implicitly align the top-k similar patches in the history. In the subsequent phase, we score and aggregate features from this aligned history to create a refined output that blends the input frame features with pertinent history data. We visualize frames in this diagram for exposition, but in practice the procedure operates on the feature maps.
  • Figure 3: Visual Results on Video Desnowing and Nighttime Video Deraining. We compare video desnowing results with the best published method in the literature, SVDNet [chen2023snow]. The video frame contains both snow and haze. While SVDNet removes snowflakes, Turtle removes both haze and snowflakes, and is therefore more faithful to the ground truth. In nighttime deraining, we compare Turtle to MetaRain [patil2022video]; Turtle maintains color consistency in the restored result.
  • Figure 4: Visual Results on Video Deblurring and Raindrops and Rain Streaks Removal. Qualitative results for video deblurring on the GoPro dataset [nah2017deep] are in the top row. Our method, Turtle, restores the frames without artifacts (see the number plate), unlike DSTNet [pan2023deep]. On the video raindrops and rain streaks removal task, we compare our method with the best method in the literature, ViMPNet [wu2023mask]. The frame restored by ViMPNet has artifacts (see the tree region and the railing gate), while Turtle's output is free of them.
  • Figure 5: Blind Video Denoising and Video Super-Resolution Visual Results. Qualitative comparison of previous methods with Turtle on a test frame from the Set8 dataset for blind video denoising ($\sigma = 50$), and from the MVSR$4\times$ dataset [wu2023mask] for video super-resolution. In video denoising, Turtle restores details, while BSVD-64 [qi2022real] smudges textures (the text and the dinosaur on the biker's jacket). In VSR, previous methods such as TTVSR [liu2022learning], BasicVSR++ [chan2022basicvsr++], and EAVSR [wu2023mask] tend to introduce blur, while Turtle's restored results are sharper and crisper.
  • ...and 9 more figures
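
The two phases described in the Figure 2 caption (top-k retrieval of similar history patches, then scoring and aggregation) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the names `cosine` and `retrieve_and_aggregate`, the choice of cosine similarity, the softmax scoring, and the equal-weight blend with the input are all assumptions made for illustration, and the actual model operates on learned feature maps.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve_and_aggregate(query, history, k=2):
    """Blend a query patch feature with its top-k most similar history patches.

    query:   feature vector of one patch in the current frame
    history: list of patch feature vectors from previous frames
    k:       number of history patches to retrieve (hypothetical default)
    """
    # Phase 1: score every history patch and keep the k most similar.
    scored = [(cosine(query, h), h) for h in history]
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    if not top:
        return list(query)  # no history yet: pass the input through

    # Phase 2: softmax over the retained similarities, then a weighted sum.
    exps = [math.exp(s) for s, _ in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [
        sum(w * h[i] for w, (_, h) in zip(weights, top))
        for i in range(len(query))
    ]

    # Assumed blend: average of the input feature and the aggregated history.
    return [0.5 * q + 0.5 * a for q, a in zip(query, aggregated)]
```

Restricting the weighted sum to the k best matches means dissimilar history patches contribute nothing to the output in this sketch, which is one simple way to mimic alignment without computing explicit motion.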

Theorems & Definitions (2)

  • Lemma D.1
  • Proof 1