Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable
Xin Jin, Simon Niklaus, Zhoutong Zhang, Zhihao Xia, Chunle Guo, Yuting Yang, Jiawen Chen, Chongyi Li
TL;DR
The paper targets robust video denoising under diverse real-world noise. It decouples noise analysis from denoising: a hypernetwork predicts per-input, spatially varying parameters for a traditional denoiser built from Wiener temporal fusion and a bilateral Laplacian pyramid. Keeping the classic pipeline differentiable lets the network be trained to predict its parameters while preserving the reliability and speed of classic methods, offering user control, and improving generalization. The key contributions are (1) a noise-profiling module that uses an anchor-frame strategy with consistency constraints, (2) a parameter-predicting network that drives a spatially adaptive denoiser, and (3) an augmentation pipeline based on additive white Gaussian noise (AWGN) with H.264 transcoding, which improves performance on unseen noise patterns. The result is robust, real-time video denoising suited to professional editing workflows, with quantitative and qualitative gains on the CRVD benchmark and on real footage. The paper also discusses deployment and limitations.
Abstract
Denoising is a crucial step in many video processing pipelines such as in interactive editing, where high quality, speed, and user control are essential. While recent approaches achieve significant improvements in denoising quality by leveraging deep learning, they are prone to unexpected failures due to discrepancies between training data distributions and the wide variety of noise patterns found in real-world videos. These methods also tend to be slow and lack user control. In contrast, traditional denoising methods perform reliably on in-the-wild videos and run relatively quickly on modern hardware. However, they require manually tuning parameters for each input video, which is not only tedious but also requires skill. We bridge the gap between these two paradigms by proposing a differentiable denoising pipeline based on traditional methods. A neural network is then trained to predict the optimal denoising parameters for each specific input, resulting in a robust and efficient approach that also supports user control.
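To make the idea of a parameterized classic denoiser concrete, the following is a minimal sketch of Wiener-style temporal fusion in NumPy. It is not the authors' implementation: it assumes the frames are already aligned, works per pixel in the spatial domain, and uses a hypothetical function name and hypothetical `noise_var` and `strength` inputs. The `strength` map stands in for the kind of spatially varying parameter that a predicting network (or a user) could supply.

```python
import numpy as np


def wiener_temporal_fusion(ref, alts, noise_var, strength):
    """Fuse pre-aligned neighbor frames into a reference frame.

    Illustrative sketch only (all names are assumptions, not from the paper).
    ref:       (H, W) reference frame
    alts:      (N, H, W) neighbor frames, assumed already aligned to ref
    noise_var: scalar or (H, W) noise variance estimate
    strength:  scalar or (H, W) map; larger values average more aggressively
    """
    ref = np.asarray(ref, dtype=np.float64)
    alts = np.asarray(alts, dtype=np.float64)

    diff = alts - ref          # per-frame difference to the reference
    d2 = diff ** 2

    # Wiener-style shrinkage weight in [0, 1):
    #   a -> 1 when the difference dwarfs the noise level (likely motion),
    #   a -> 0 when the difference is explained by noise.
    a = d2 / (d2 + strength * noise_var + 1e-12)

    # Each neighbor contributes a blend of itself and the reference:
    #   a = 0 -> the neighbor frame, a = 1 -> the reference frame.
    contrib = alts - a * diff

    # Average the reference with all blended neighbors.
    return (ref + contrib.sum(axis=0)) / (alts.shape[0] + 1)
```

Calling `wiener_temporal_fusion(ref, alts, noise_var, strength)` with `strength` near zero returns (almost exactly) the reference frame, while a very large `strength` approaches a plain temporal mean. Every operation is an elementwise, differentiable function of `strength`, which is the property that would let a network learn to predict such a map if the same computation were written in an autodiff framework.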
