Lightweight Video Denoising Using a Classic Bayesian Backbone

Clément Bled, François Pitié

TL;DR

The paper addresses the speed-quality trade-off in video denoising by building a Wiener-filter backbone augmented with small neural refinements. It introduces a 4D Wiener framework with trainable windowing, a coring refinement network, and a blind denoising variant, achieving PSNR/SSIM close to the Video Restoration Transformer (VRT) while using only about $0.29$M parameters and delivering over an order of magnitude faster runtimes. The approach demonstrates strong performance gains over traditional baselines and competitive results against heavy transformers, with additional insights into motion compensation and multi-scale averaging. Overall, it offers an efficient, scalable pathway for high-quality video denoising suitable for real-time or resource-constrained deployment.

Abstract

In recent years, state-of-the-art image and video denoising networks have become increasingly large, requiring millions of trainable parameters to achieve best-in-class performance. Improved denoising quality has come at the cost of denoising speed, where modern transformer networks are far slower to run than smaller denoising networks such as FastDVDnet and classic Bayesian denoisers such as the Wiener filter. In this paper, we implement a hybrid Wiener filter which leverages small ancillary networks to increase the original denoiser performance, while retaining fast denoising speeds. These networks are used to refine the Wiener coring estimate, optimise windowing functions and estimate the unknown noise profile. Using these methods, we outperform several popular denoisers and remain within 0.2 dB, on average, of the popular VRT transformer. Our method was found to be over 10x faster than the transformer method, with a far lower parameter cost.
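
For readers unfamiliar with the classic backbone the abstract refers to, the following is a minimal sketch of an empirical Wiener coring filter applied to a single 2D block. It is not the authors' implementation: the function name `wiener_coring_block` is hypothetical, the noise level `sigma` is assumed known (the non-blind case), and the block is filtered without windowing, whereas the paper describes a 4D framework with trainable windows and a refinement network. The coring function used here is the textbook estimate $H = \max(P_{yy} - P_{nn}, 0) / P_{yy}$ for additive white Gaussian noise.

```python
import numpy as np

def wiener_coring_block(noisy_block, sigma):
    """Denoise one 2D block with a textbook empirical Wiener coring function.

    Assumes additive white Gaussian noise with known standard deviation sigma.
    Returns the denoised block and the coring function H.
    """
    n = noisy_block.size
    mean = noisy_block.mean()

    # Work on the zero-mean block; the mean is restored after filtering.
    spectrum = np.fft.fft2(noisy_block - mean)
    power_noisy = np.abs(spectrum) ** 2

    # With NumPy's unnormalised FFT, white noise of variance sigma^2
    # has an expected power of n * sigma^2 in every frequency bin.
    power_noise = n * sigma ** 2

    # Coring: H = max(Pyy - Pnn, 0) / Pyy, guarded against division by zero.
    coring = np.maximum(power_noisy - power_noise, 0.0) / np.maximum(power_noisy, 1e-12)

    denoised = np.fft.ifft2(coring * spectrum).real + mean
    return denoised, coring
```

Bins whose power falls below the expected noise power are zeroed, and the remaining bins are attenuated in proportion to their estimated signal-to-noise ratio. The small networks described in the abstract can be read as refinements of this initial estimate of $H$.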

Paper Structure

This paper contains 15 sections, 2 equations, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: The two-stage coring refinement network architecture used to optimise the initial prediction of the coring function $H(\omega_1, \omega_2)$.
  • Figure 2: Wiener window block size versus output quality, measured in PSNR (dB) and SSIM (0-1). Quality measurements are averaged over our 10-sequence test set, using a $1/4$ overlap stride.
  • Figure 3: Sample output frame at $\sigma = 20$, taken from benchmark scenes. For complete sequences, please visit https://github.com/MrBled/WienerNet-ICME