Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers

Zixuan Fu, Lanqing Guo, Chong Wang, Yufei Wang, Zhihao Li, Bihan Wen

TL;DR

This paper proposes "Temporal As a Plugin" (TAP), an unsupervised video denoising framework that integrates tunable temporal modules into a pre-trained image denoiser. A progressive fine-tuning strategy then refines each temporal module using generated pseudo-clean video frames, progressively enhancing the network's denoising performance.

Abstract

Recent advancements in deep learning have shown impressive results in image and video denoising, leveraging extensive pairs of noisy and noise-free data for supervision. However, the challenge of acquiring paired videos for dynamic scenes hampers the practical deployment of deep video denoising techniques. In contrast, this obstacle is less pronounced in image denoising, where paired data is more readily available. Thus, a well-trained image denoiser could serve as a reliable spatial prior for video denoising. In this paper, we propose a novel unsupervised video denoising framework, named "Temporal As a Plugin" (TAP), which integrates tunable temporal modules into a pre-trained image denoiser. By incorporating temporal modules, our method can harness temporal information across noisy frames, complementing the spatial denoising capability of the image denoiser. Furthermore, we introduce a progressive fine-tuning strategy that refines each temporal module using generated pseudo-clean video frames, progressively enhancing the network's denoising performance. Compared to other unsupervised video denoising methods, our framework demonstrates superior performance on both sRGB and raw video denoising datasets.
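
The two ideas in the abstract (a frozen image denoiser with tunable temporal modules, and a progressive fine-tuning order) can be illustrated with a small bookkeeping sketch. This is not the authors' implementation: the names `Module`, `PluggedDenoiser`, `build`, and `progressive_schedule` are hypothetical, the three-level layout is an assumption based on the "level-3" starting point mentioned in the figure captions, and tuning only one temporal module per stage is likewise an assumption. No actual denoising or training is performed; the code only tracks which parts would be trainable at each stage.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A named block of parameters with a trainable flag (stand-in for a real layer)."""
    name: str
    trainable: bool = False


@dataclass
class PluggedDenoiser:
    """Frozen image-denoiser blocks plus one temporal module per level (hypothetical layout)."""
    spatial: list = field(default_factory=list)   # pre-trained blocks, never unfrozen
    temporal: dict = field(default_factory=dict)  # level -> temporal module

    def trainable_names(self):
        modules = self.spatial + list(self.temporal.values())
        return [m.name for m in modules if m.trainable]


def build(num_levels=3):
    """Assumption: one encoder block, one decoder block, and one temporal module per level."""
    spatial = [
        Module(f"{part}{level}")
        for level in range(1, num_levels + 1)
        for part in ("enc", "dec")
    ]
    temporal = {level: Module(f"temporal{level}") for level in range(1, num_levels + 1)}
    return PluggedDenoiser(spatial, temporal)


def progressive_schedule(model):
    """Yield (stage, level, trainable names), starting at the deepest level.

    Assumption: at each stage only the temporal module of the current level is
    tunable; the pseudo-clean frames used as targets would come from the network
    as tuned so far, which is not modeled here.
    """
    for stage, level in enumerate(sorted(model.temporal, reverse=True), start=1):
        for lvl, module in model.temporal.items():
            module.trainable = (lvl == level)
        yield stage, level, model.trainable_names()


if __name__ == "__main__":
    for stage, level, names in progressive_schedule(build()):
        print(f"stage {stage}: tune level {level} -> {names}")
```

Under these assumptions the schedule visits levels 3, 2, 1 in that order, and the image-denoiser blocks stay frozen throughout. In a real framework the trainable flag would correspond to enabling gradients only for the selected temporal module's parameters.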

Paper Structure (18 sections, 12 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Examples of denoising results from the CRVD outdoor dataset [yue2020supervised]. UDVD [sheth2021unsupervised] is an unsupervised denoiser trained on noisy videos only, and its outputs retain noise and artifacts; FloRNN [li2022unidirectional] is a supervised video denoiser that produces over-smoothed results; our method effectively eliminates both noise and artifacts while preserving image details.
  • Figure 2: Architecture of the video denoiser. We lift an encoder-decoder based image denoiser (blue part) to video denoising by plugging temporal modules (yellow part) into the skip connections between its encoder and decoder. Note that the parameters of the image denoiser are frozen; only the temporal modules are tunable.
  • Figure 3: Illustration of the proposed unsupervised progressive fine-tuning strategy. The process begins by training the temporal module at level-3, then progresses to the temporal modules at upper levels using pseudo video pairs.
  • Figure 4: Visual examples of the synthetic Gaussian denoising results on the DAVIS (top) and Set8 (bottom) datasets, including the noisy inputs (with noise level $\sigma=30$), the restored images using FastDVDNet [tassano2020fastdvdnet], FloRNN [li2022unidirectional], UDVD [sheth2021unsupervised], RFR [lee2021restore], TAP, and TAP-T, as well as the clean images, respectively. † indicates a supervised method.
  • Figure 5: Visual examples of real raw video denoising on the CRVD outdoor set, showing the noisy image and the restored images of UDVD [sheth2021unsupervised], FloRNN [li2022unidirectional], and TAP, respectively. † denotes a supervised method. We render raw images to sRGB images with the pre-trained ISP provided in [yue2020supervised] (ISO = 25600).
  • ...and 5 more figures