Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable

Xin Jin, Simon Niklaus, Zhoutong Zhang, Zhihao Xia, Chunle Guo, Yuting Yang, Jiawen Chen, Chongyi Li

TL;DR

The paper tackles robust video denoising under diverse real-world noise by decoupling noise analysis from denoising through a hypernetwork that predicts per-input, spatially varying parameters for a traditional Wiener temporal fusion and bilateral Laplacian pyramid denoiser. This differentiable integration preserves the reliability and speed of classic methods while offering user control and improved generalization. Key contributions include a noise-profiling module using an anchor-frame strategy with consistency constraints, a parameter-predicting network that drives a spatially adaptive denoiser, and an augmentation pipeline based on AWGN with H.264 transcoding that enhances performance on unseen noise patterns. The approach yields robust, real-time video denoising suitable for professional editing workflows and demonstrates strong quantitative and qualitative gains on the CRVD benchmark and real footage, with practical insights into deployment and limitations.
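
As a rough illustration of the AWGN half of the augmentation mentioned above, the sketch below adds zero-mean Gaussian noise to clean frames and quantizes the result to 8 bits, the format a video encoder would typically consume. The function name `add_awgn`, the `[0, 1]` float convention, and the choice of `sigma` are assumptions made for illustration and are not taken from the paper. The H.264 transcoding step is deliberately omitted here because it depends on an external encoder.

```python
import numpy as np


def add_awgn(frames, sigma, seed=None):
    """Add zero-mean Gaussian noise to clean frames (hypothetical helper).

    frames: float array with values in [0, 1], e.g. shape (T, H, W, C).
    sigma:  noise standard deviation in the same [0, 1] units.
    Returns uint8 frames, ready to be handed to a separate encoding step.
    """
    rng = np.random.default_rng(seed)
    frames = np.asarray(frames, dtype=np.float32)
    noise = rng.normal(0.0, sigma, size=frames.shape).astype(np.float32)
    # Clip to the valid range before quantizing to 8 bits.
    noisy = np.clip(frames + noise, 0.0, 1.0)
    return np.round(noisy * 255.0).astype(np.uint8)


# Usage: four mid-gray 32x32 RGB frames with a moderate noise level.
clean = np.full((4, 32, 32, 3), 0.5, dtype=np.float32)
noisy = add_awgn(clean, sigma=0.05, seed=0)
```

In a full pipeline along the lines the summary describes, the noisy frames would then be transcoded so that compression artifacts reshape the noise; that step is left out of this sketch.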

Abstract

Denoising is a crucial step in many video processing pipelines such as in interactive editing, where high quality, speed, and user control are essential. While recent approaches achieve significant improvements in denoising quality by leveraging deep learning, they are prone to unexpected failures due to discrepancies between training data distributions and the wide variety of noise patterns found in real-world videos. These methods also tend to be slow and lack user control. In contrast, traditional denoising methods perform reliably on in-the-wild videos and run relatively quickly on modern hardware. However, they require manually tuning parameters for each input video, which is not only tedious but also requires skill. We bridge the gap between these two paradigms by proposing a differentiable denoising pipeline based on traditional methods. A neural network is then trained to predict the optimal denoising parameters for each specific input, resulting in a robust and efficient approach that also supports user control.

Paper Structure

This paper contains 14 sections, 2 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: High-level comparison of how related work approaches video denoising (left) and our proposed approach (right).
  • Figure 1: Denoising results on the CRVD (sRGB) benchmark for various methods that we compare to, both for their original version as well as our retrained one which we denote with a $\dagger$.
  • Figure 2: Video denoising on the CRVD (sRGB) benchmark (Yue et al., 2020) using the PSNR across all ISO values with respect to the computational efficiency on an RTX 3090 GPU in FPS (frames per second). We improved some models through retraining, denoted with a $\dagger$.
  • Figure 2: Video denoising results on the CRVD (sRGB) benchmark. Not only does our approach perform best overall, it is also four times faster than the second-fastest method. See the supplementary for SSIM and LPIPS, where our approach ranks first as well.
  • Figure 3: Video denoising fundamentally first needs to analyze the noise and then remove it. We mimic this in our pipeline by first estimating a noise profile on a random anchor frame (top left in green) before using this profile to denoise the video (bottom in yellow). Specifically, we leverage a hypernetwork configuration where the noise profile $\theta$ is essentially the parameters of the subsequent denoiser. That is, our denoiser is a traditional pipeline consisting of (1) a Wiener filter that performs temporal denoising of neighboring frames that were aligned via optical flow and (2) a bilateral Laplacian pyramid filter for spatial denoising of the temporally merged frames, where a small neural network $\mathcal{P}(\cdot;\theta)$ predicts spatially-varying parameters for the Wiener merger and the bilateral filters. This separation of concerns improves the overall efficiency since it avoids having to redundantly analyze the noise over and over again.
  • ...and 7 more figures
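
The temporal stage described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the neighboring frames have already been aligned to the reference (the optical flow step is not shown) and uses a simple per-pixel Wiener-style shrinkage, which is a common simplification and not necessarily the paper's exact formulation. The names `wiener_temporal_merge`, `noise_variance`, and `strength` are hypothetical; the two parameter maps stand in for the spatially varying values that the caption says the small network $\mathcal{P}(\cdot;\theta)$ predicts.

```python
import numpy as np


def wiener_temporal_merge(reference, aligned_neighbors, noise_variance, strength=1.0):
    """Merge pre-aligned neighbor frames into a reference frame.

    reference:         float array, shape (H, W) or (H, W, C).
    aligned_neighbors: sequence of arrays with the same shape as reference.
    noise_variance:    scalar or array broadcastable to reference
                       (assumed per-pixel noise variance).
    strength:          scalar or array broadcastable to reference
                       (larger values trust the neighbors more).
    """
    reference = np.asarray(reference, dtype=np.float64)
    merged = reference.copy()
    for neighbor in aligned_neighbors:
        neighbor = np.asarray(neighbor, dtype=np.float64)
        diff = reference - neighbor
        d2 = diff ** 2
        # Shrinkage weight: near 0 where the difference is explained by noise
        # (use the neighbor), near 1 where it is not (fall back to the reference).
        w = d2 / (d2 + strength * noise_variance + 1e-12)
        merged += neighbor + w * diff
    # Average the reference with the per-neighbor blended contributions.
    return merged / (len(aligned_neighbors) + 1)


# Usage: a static scene with independent noise in every frame.
rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 0.8, size=(16, 16))
sigma = 0.1
ref = clean + rng.normal(0.0, sigma, clean.shape)
neighbors = [clean + rng.normal(0.0, sigma, clean.shape) for _ in range(4)]
variance_map = np.full(clean.shape, sigma ** 2)  # spatially varying in general
merged = wiener_temporal_merge(ref, neighbors, variance_map, strength=8.0)
```

Because `noise_variance` and `strength` are plain arrays, the same function accepts either one global setting or a per-pixel map, which is the kind of control a parameter-predicting network or a user could exercise. Every operation is also differentiable when written in an autodiff framework, which is what makes training a parameter predictor through such a pipeline possible.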