Table of Contents
Fetching ...

Unsupervised Real-World Denoising: Sparsity is All You Need

Hamadi Chihaoui, Paolo Favaro

TL;DR

The paper tackles real-world image denoising without paired clean-noisy data by introducing Mask, Inpaint and Denoise (MID), which leverages random input masking to bridge the gap between synthetic and real noise and jointly denoises and inpaints masked inputs. MID iteratively refines a noise sampler by using the denoiser’s predictions on real noisy images, creating progressively more realistic noise samples and improving the denoiser subsequently. Through extensive experiments on SIDD and DND, MID achieves competitive or superior performance compared to existing unsupervised methods, approaching results of supervised approaches while avoiding adversarial training. The approach offers a practical, scalable solution for real-world denoising with unpaired data and demonstrates the value of masking, inpainting, and iterative noise estimation in bridging distribution gaps.

Abstract

Supervised training for real-world denoising presents challenges due to the difficulty of collecting large datasets of paired noisy and clean images. Recent methods have attempted to address this by utilizing unpaired datasets of clean and noisy images. Some approaches leverage such unpaired data to train denoisers in a supervised manner by generating synthetic clean-noisy pairs. However, these methods often fall short due to the distribution gap between synthetic and real noisy images. To mitigate this issue, we propose a solution based on input sparsification, specifically using random input masking. Our method, which we refer to as Mask, Inpaint and Denoise (MID), trains a denoiser to simultaneously denoise and inpaint synthetic clean-noisy pairs. On one hand, input sparsification reduces the gap between synthetic and real noisy images. On the other hand, an inpainter trained in a supervised manner can still accurately reconstruct sparse inputs by predicting missing clean pixels using the remaining unmasked pixels. Our approach begins with a synthetic Gaussian noise sampler and iteratively refines it using a noise dataset derived from the denoiser's predictions. The noise dataset is created by subtracting predicted pseudo-clean images from real noisy images at each iteration. The core intuition is that improving the denoiser results in a more accurate noise dataset and, consequently, a better noise sampler. We validate our method through extensive experiments on real-world noisy image datasets, demonstrating competitive performance compared to existing unsupervised denoising methods.

Unsupervised Real-World Denoising: Sparsity is All You Need

TL;DR

The paper tackles real-world image denoising without paired clean-noisy data by introducing Mask, Inpaint and Denoise (MID), which leverages random input masking to bridge the gap between synthetic and real noise and jointly denoises and inpaints masked inputs. MID iteratively refines a noise sampler by using the denoiser’s predictions on real noisy images, creating progressively more realistic noise samples and improving the denoiser subsequently. Through extensive experiments on SIDD and DND, MID achieves competitive or superior performance compared to existing unsupervised methods, approaching results of supervised approaches while avoiding adversarial training. The approach offers a practical, scalable solution for real-world denoising with unpaired data and demonstrates the value of masking, inpainting, and iterative noise estimation in bridging distribution gaps.

Abstract

Supervised training for real-world denoising presents challenges due to the difficulty of collecting large datasets of paired noisy and clean images. Recent methods have attempted to address this by utilizing unpaired datasets of clean and noisy images. Some approaches leverage such unpaired data to train denoisers in a supervised manner by generating synthetic clean-noisy pairs. However, these methods often fall short due to the distribution gap between synthetic and real noisy images. To mitigate this issue, we propose a solution based on input sparsification, specifically using random input masking. Our method, which we refer to as Mask, Inpaint and Denoise (MID), trains a denoiser to simultaneously denoise and inpaint synthetic clean-noisy pairs. On one hand, input sparsification reduces the gap between synthetic and real noisy images. On the other hand, an inpainter trained in a supervised manner can still accurately reconstruct sparse inputs by predicting missing clean pixels using the remaining unmasked pixels. Our approach begins with a synthetic Gaussian noise sampler and iteratively refines it using a noise dataset derived from the denoiser's predictions. The noise dataset is created by subtracting predicted pseudo-clean images from real noisy images at each iteration. The core intuition is that improving the denoiser results in a more accurate noise dataset and, consequently, a better noise sampler. We validate our method through extensive experiments on real-world noisy image datasets, demonstrating competitive performance compared to existing unsupervised denoising methods.

Paper Structure

This paper contains 23 sections, 1 theorem, 6 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

Let $a^s = \mathbf{M}\odot y^s$ and $b^s = (1-\mathbf{M})\odot y^s$ be the visible and hidden pixels of a synthetic noisy image, and $a^r = \mathbf{M}\odot y^r$ and $b^r = (1-\mathbf{M})\odot y^r$ be the visible and hidden pixels of a real noisy image, respectively. Let us also denote with $p(a,b)$

Figures (7)

  • Figure 1: Visual comparison of unsupervised denoising methods on the SIDD validation dataset. Our method (MID) preserves fine details better. Zoom in to see the reconstruction accuracy.
  • Figure 2: Overview of MID. Top row: We show the processing steps used during supervised training. We use clean images from the available dataset, add synthetic noise (initially simply AWGN, and then later the noise samples are extracted from the real noisy images), mask the pixels and then train a denoiser to predict the clean image by minimizing a Mean Squared Error (MSE) loss. Middle row: To obtain better noise samples, we use the trained denoiser on the dataset of real noisy images. The noise samples are obtained simply by computing the residual between the predicted pseudo-clean image and the original noisy input. These residuals are then used as new noise samples in a new training of the denoiser. Bottom row: At test time we simply apply the trained denoiser on new real noisy images after applying masking. To further boost the accuracy, we average the predicted clean images for several random masks.
  • Figure 3: The denoiser performance on SIDD validation set when trained with an AWGN sampler.
  • Figure 4: Inpainter performance on a validation set trained in a supervised way at different random masking ratios.
  • Figure 5: Left: Masked input at 80% of the pixels. Middle: The output of the inpainter on the image on the left. Right: Original images from the validation set.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Proposition 1
  • proof