Realistic Noise Synthesis with Diffusion Models
Qi Wu, Mingyan Han, Ting Jiang, Chengzhi Jiang, Jinting Luo, Man Jiang, Haoqiang Fan, Shuaicheng Liu
TL;DR
Real-world RGB noise is irregular and depends on ISP processing and sensor factors, which makes collecting denoising training data difficult. RNSD uses a diffusion model, with the standard forward process $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I)$, conditioned on clean image content and camera settings. It introduces TCCAM for time-aware, camera-conditioned affine modulation, MCAM for multi-scale content guidance, and DIPS for accelerated sampling. The method produces more realistic noise (lower AKLD and PGap) and improves denoising PSNR/SSIM when the synthesized data is used for augmentation, while DIPS reduces sampling from 1000 to 5 steps with minimal loss in quality. Overall, RNSD provides a scalable, high-fidelity noise synthesis pipeline that improves denoising performance across diverse camera sensors and ISP configurations.
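The forward process quoted above is the standard DDPM noising kernel. The sketch below illustrates it in NumPy, assuming the common linear schedule with β rising from 1e-4 to 0.02 over T = 1000 steps (the summary does not state RNSD's schedule). It also shows the equivalent closed-form jump $q(x_t \mid x_0) = \mathcal{N}(x_t; \sqrt{\bar\alpha_t}\, x_0, (1-\bar\alpha_t) I)$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The function names and the all-ones `x0` are illustrative placeholders, not part of RNSD.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed DDPM-style linear schedule; RNSD's actual schedule is not given here
    return np.linspace(beta_start, beta_end, T)

def forward_step(x_prev, beta_t, rng):
    # One step of q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

def forward_closed_form(x0, t, betas, rng):
    # Direct jump q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I),
    # where abar_t is the running product of (1 - beta_s); t is 0-indexed
    abar_t = np.prod(1.0 - betas[: t + 1])
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

rng = np.random.default_rng(0)
betas = linear_beta_schedule()
x0 = np.ones(50_000)  # placeholder sample standing in for x_0

# Iterate the one-step kernel over all T steps
x = x0.copy()
for beta_t in betas:
    x = forward_step(x, beta_t, rng)

# Compare with the closed form evaluated at the last step
x_direct = forward_closed_form(x0, len(betas) - 1, betas, rng)
abar_T = np.prod(1.0 - betas)
print(f"abar_T={abar_T:.2e}  iterated mean/var={x.mean():.3f}/{x.var():.3f}  "
      f"closed-form mean/var={x_direct.mean():.3f}/{x_direct.var():.3f}")
```

Both paths should agree in distribution: after 1000 steps the mean shrinks to about $\sqrt{\bar\alpha_T}$ and the variance approaches $1-\bar\alpha_T \approx 1$, which is why sampling must reverse many small steps unless a shortened schedule such as DIPS is used.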
Abstract
Deep denoising models require extensive real-world training data, which is challenging to acquire. Current noise synthesis techniques struggle to accurately model complex noise distributions. We propose a novel Realistic Noise Synthesis Diffusor (RNSD) method that uses diffusion models to address these challenges. By encoding camera settings into a time-aware camera-conditioned affine modulation (TCCAM), RNSD generates more realistic noise distributions under various camera conditions. Additionally, RNSD integrates a multi-scale content-aware module (MCAM), enabling the generation of structured noise with spatial correlations across multiple frequencies. We also introduce Deep Image Prior Sampling (DIPS), a learnable sampling sequence based on the deep image prior, which significantly accelerates sampling while maintaining the high quality of the synthesized noise. Extensive experiments demonstrate that RNSD significantly outperforms existing techniques in synthesizing realistic noise, as measured by multiple metrics, and in improving image denoising performance.
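The abstract does not say how TCCAM is built internally. One common way to realize a time-aware, condition-dependent affine modulation is FiLM-style per-channel scale and shift, and the sketch below shows that idea under stated assumptions. The sinusoidal timestep embedding, the 16-dimensional embedding size, the hypothetical `camera_vec` encoding (a normalized ISO value plus a one-hot sensor ID), the `(1 + gamma)` parameterization, and the random `weight` matrix standing in for a learned layer are all illustrative choices, not RNSD's published design. The helper names `timestep_embedding` and `affine_modulate` are hypothetical.

```python
import numpy as np

def timestep_embedding(t, dim=16):
    # Sinusoidal timestep embedding (Transformer/DDPM style); an assumed choice
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

def affine_modulate(features, t, camera_vec, weight, bias):
    # features: (C, H, W) feature map at diffusion step t
    # camera_vec: hypothetical 1-D encoding of camera settings
    cond = np.concatenate([timestep_embedding(t), camera_vec])
    params = weight @ cond + bias            # (2C,) = per-channel [gamma, beta]
    C = features.shape[0]
    gamma, beta = params[:C], params[C:]
    # Per-channel scale and shift, broadcast over the spatial dimensions
    return features * (1.0 + gamma[:, None, None]) + beta[:, None, None]

rng = np.random.default_rng(0)
C = 4
h = rng.standard_normal((C, 8, 8))
camera_vec = np.array([0.5, 1.0, 0.0])  # hypothetical: ISO scalar + 2-way one-hot sensor ID
cond_dim = 16 + camera_vec.size
weight = 0.1 * rng.standard_normal((2 * C, cond_dim))  # stands in for a learned layer
bias = np.zeros(2 * C)

# The same camera settings modulate features differently at early and late steps
out_early = affine_modulate(h, t=10, camera_vec=camera_vec, weight=weight, bias=bias)
out_late = affine_modulate(h, t=900, camera_vec=camera_vec, weight=weight, bias=bias)
print(out_early.shape, np.abs(out_early - out_late).mean())
```

Predicting `gamma` as an offset from 1 keeps the layer close to the identity when the predicted scale is near zero, a common FiLM-style convention rather than a detail confirmed for TCCAM. Feeding the timestep and the camera settings through one shared projection is what makes the modulation both camera-conditioned and time-aware in this sketch.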
