Lightweight Video Denoising Using a Classic Bayesian Backbone
Clément Bled, François Pitié
TL;DR
The paper addresses the speed-quality trade-off in video denoising by building a Wiener-filter backbone augmented with small neural refinements. It introduces a 4D Wiener framework with trainable windowing, a coring refinement network, and a blind denoising variant. The method achieves PSNR/SSIM close to the Video Restoration Transformer (VRT), within 0.2 dB on average, while using only about $0.29$M parameters and running more than an order of magnitude faster. It outperforms several popular denoisers and remains competitive with heavy transformers, and the paper offers additional insights into motion compensation and multi-scale averaging. Overall, it offers an efficient pathway to high-quality video denoising suited to real-time or resource-constrained deployment.
Abstract
In recent years, state-of-the-art image and video denoising networks have become increasingly large, requiring millions of trainable parameters to achieve best-in-class performance. Improved denoising quality has come at the cost of denoising speed: modern transformer networks are far slower to run than smaller denoising networks such as FastDVDnet and classic Bayesian denoisers such as the Wiener filter. In this paper, we implement a hybrid Wiener filter which leverages small ancillary networks to improve the original denoiser's performance while retaining fast denoising speeds. These networks are used to refine the Wiener coring estimate, optimise the windowing functions and estimate the unknown noise profile. Using these methods, we outperform several popular denoisers and remain within 0.2 dB, on average, of the popular VRT transformer. Our method was found to be over 10× faster than the transformer method, with a far lower parameter cost.
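For readers unfamiliar with the classic backbone, the following is a minimal sketch of a textbook frequency-domain Wiener filter with an empirical coring gain, applied to a single 2D block with a known noise standard deviation. It is not the method of this paper: the 4D formulation, the trainable windows, the coring refinement network and the noise estimator are all omitted, and the function name `wiener_denoise_block` and its parameters are hypothetical. The gain used here, $\max(P - P_n, 0)/P$, is the standard empirical estimate, assumed for illustration only.

```python
import numpy as np


def wiener_denoise_block(block, sigma):
    """Textbook Wiener filtering of one 2D block corrupted by white
    Gaussian noise of known standard deviation `sigma`.

    Illustrative sketch only: no windowing, no overlap between blocks,
    and no learned refinement of the gain.
    """
    block = np.asarray(block, dtype=np.float64)

    # Remove the block mean so the DC term is not attenuated by the filter.
    mean = block.mean()
    spectrum = np.fft.fft2(block - mean)

    # Power spectrum of the noisy block (unnormalised FFT convention).
    power = np.abs(spectrum) ** 2

    # Expected noise power per frequency bin for white noise:
    # sigma^2 times the number of samples in the block.
    noise_power = sigma ** 2 * block.size

    # Empirical Wiener (coring) gain: estimated signal power over
    # observed power, clipped at zero so the gain stays in [0, 1].
    gain = np.maximum(power - noise_power, 0.0) / np.maximum(power, 1e-12)

    # Attenuate each frequency bin and return to the spatial domain.
    denoised = np.real(np.fft.ifft2(gain * spectrum))
    return denoised + mean
```

In a full denoiser, blocks of this kind would typically be windowed and overlapped before being recombined; the abstract indicates that the windowing function, the coring estimate and the noise level are the three places where the small ancillary networks intervene.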
