Unsupervised Real-World Denoising: Sparsity is All You Need
Hamadi Chihaoui, Paolo Favaro
TL;DR
The paper tackles real-world image denoising without paired clean-noisy data by introducing Mask, Inpaint and Denoise (MID), which leverages random input masking to bridge the gap between synthetic and real noise and jointly denoises and inpaints masked inputs. MID iteratively refines a noise sampler by using the denoiser’s predictions on real noisy images, creating progressively more realistic noise samples and improving the denoiser subsequently. Through extensive experiments on SIDD and DND, MID achieves competitive or superior performance compared to existing unsupervised methods, approaching results of supervised approaches while avoiding adversarial training. The approach offers a practical, scalable solution for real-world denoising with unpaired data and demonstrates the value of masking, inpainting, and iterative noise estimation in bridging distribution gaps.
Abstract
Supervised training for real-world denoising presents challenges due to the difficulty of collecting large datasets of paired noisy and clean images. Recent methods have attempted to address this by utilizing unpaired datasets of clean and noisy images. Some approaches leverage such unpaired data to train denoisers in a supervised manner by generating synthetic clean-noisy pairs. However, these methods often fall short due to the distribution gap between synthetic and real noisy images. To mitigate this issue, we propose a solution based on input sparsification, specifically using random input masking. Our method, which we refer to as Mask, Inpaint and Denoise (MID), trains a denoiser to simultaneously denoise and inpaint synthetic clean-noisy pairs. On one hand, input sparsification reduces the gap between synthetic and real noisy images. On the other hand, an inpainter trained in a supervised manner can still accurately reconstruct sparse inputs by predicting missing clean pixels using the remaining unmasked pixels. Our approach begins with a synthetic Gaussian noise sampler and iteratively refines it using a noise dataset derived from the denoiser's predictions. The noise dataset is created by subtracting predicted pseudo-clean images from real noisy images at each iteration. The core intuition is that improving the denoiser results in a more accurate noise dataset and, consequently, a better noise sampler. We validate our method through extensive experiments on real-world noisy image datasets, demonstrating competitive performance compared to existing unsupervised denoising methods.
