Combining Pre- and Post-Demosaicking Noise Removal for RAW Video
Marco Sánchez-Beeckman, Antoni Buades, Nicola Brandonisio, Bilel Kanoun
TL;DR
This work addresses robust RAW video denoising across varying noise levels and scenes by introducing a two-stage pipeline that denoises both in the RAW mosaicked domain and after demosaicking, with the balance between the two stages controlled by a tunable parameter $\alpha$. It combines self-similarity-based denoising with temporal trajectory prefiltering and a PCA-based spatio-temporal denoising block, guided by a sensor noise model so that it adapts to realistic Poisson-Gaussian noise. The approach is competitive with state-of-the-art deep learning methods on real and synthetic data while offering lower memory usage and greater adaptability, making it practical for real-world videography and potential mobile deployment. The method respects the CFA structure and noise correlations while preserving fine textures.
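To make the Poisson-Gaussian noise model mentioned above concrete: a common formulation treats the variance of a RAW sample as affine in its clean intensity, $\mathrm{Var}(x) = a x + b$. The following is a minimal sketch under that assumption. The names `a`, `b`, `noise_variance`, and `add_noise`, the parameter values, and the Gaussian approximation of the Poisson component are illustrative choices and are not taken from the paper.

```python
import random


def noise_variance(x, a, b):
    """Signal-dependent variance under an affine model: a * x + b."""
    return a * x + b


def add_noise(clean, a, b, rng):
    """Add zero-mean Gaussian noise whose variance depends on each clean sample.

    This is a heteroscedastic Gaussian approximation of Poisson-Gaussian noise.
    """
    return [x + rng.gauss(0.0, noise_variance(x, a, b) ** 0.5) for x in clean]


# Hypothetical parameters: `a` scales the signal-dependent (shot) part,
# `b` is the signal-independent (read) part.
rng = random.Random(0)
clean = [0.1] * 20000  # flat patch with intensities normalised to [0, 1]
noisy = add_noise(clean, a=0.01, b=0.0001, rng=rng)

# Empirical variance of the noisy patch, to compare with a * 0.1 + b.
mean = sum(noisy) / len(noisy)
empirical_var = sum((v - mean) ** 2 for v in noisy) / len(noisy)
```

On a flat patch, `empirical_var` should lie close to `noise_variance(0.1, 0.01, 0.0001)`, which is how such a model can be checked against measurements from a given sensor.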
Abstract
Denoising is one of the fundamental steps of the processing pipeline that converts data captured by a camera sensor into a display-ready image or video. It is generally performed early in the pipeline, usually before demosaicking, although approaches that swap the order of the two steps or perform them jointly have also been proposed. With the advent of deep learning, the quality of denoising algorithms has steadily increased. Even so, modern neural networks still struggle to adapt to new noise levels and scenes, an ability that is indispensable for real-world applications. With this in mind, we propose a self-similarity-based denoising scheme that weights both a pre- and a post-demosaicking denoiser for Bayer-patterned CFA video data. We show that a balance between the two leads to better image quality, and we empirically find that higher noise levels benefit from a stronger influence of the pre-demosaicking denoiser. We also integrate temporal trajectory prefiltering steps before each denoiser, which further improve texture reconstruction. The proposed method only requires an estimate of the noise model at the sensor, accurately adapts to any noise level, and is competitive with the state of the art, making it suitable for real-world videography.
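The weighting between the pre- and post-demosaicking denoisers can be pictured as a two-stage function in which a single parameter decides how much denoising strength each stage receives. The sketch below is purely structural and rests on stated assumptions: the linear split `alpha * sigma` versus `(1 - alpha) * sigma` is an illustrative guess and not the paper's formula, `sigma` stands for a nominal scalar noise level, and `shrink_to_mean` and `passthrough_demosaic` are hypothetical placeholders for the actual self-similarity denoisers and demosaicking algorithm.

```python
def two_stage_denoise(mosaic, sigma, alpha, pre_denoise, demosaic, post_denoise):
    """Denoise the mosaic, demosaic it, then denoise the demosaicked result.

    alpha in [0, 1] sets the share of the nominal noise level `sigma`
    handed to the pre-demosaicking stage (assumed linear split).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    pre = pre_denoise(mosaic, alpha * sigma)
    rgb = demosaic(pre)
    return post_denoise(rgb, (1.0 - alpha) * sigma)


def shrink_to_mean(values, strength):
    """Placeholder denoiser: pull samples toward their mean; strength 0 is identity."""
    w = min(max(strength, 0.0), 1.0)
    m = sum(values) / len(values)
    return [(1.0 - w) * v + w * m for v in values]


def passthrough_demosaic(values):
    """Placeholder for demosaicking; a real one interpolates missing colour samples."""
    return list(values)


# With alpha = 1.0 all of the strength goes to the pre-demosaicking stage,
# so the post-demosaicking placeholder leaves its input unchanged.
frame = [0.2, 0.4, 0.6, 0.8]
out = two_stage_denoise(
    frame,
    sigma=0.5,
    alpha=1.0,
    pre_denoise=shrink_to_mean,
    demosaic=passthrough_demosaic,
    post_denoise=shrink_to_mean,
)
```

Passing the stages as callables keeps the balance logic separate from the denoisers themselves, so the same wrapper could be reused while `alpha` is tuned per noise level.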
