Denoising Monte Carlo Renders with Diffusion Models
Vaibhav Vavilala, Rahul Vasanth, David Forsyth
TL;DR
This work tackles Monte Carlo render noise, which is heavy-tailed at low sample counts, by using a pixel-space diffusion model conditioned on render buffers to denoise low-spp (samples-per-pixel) images. The method pairs a pretrained DeepFloyd Stage II backbone with a trainable Control Module that fuses normals, albedo, depth, and other buffers, and it is trained with standard diffusion losses to reverse the forward diffusion process. Across multiple sampling rates, the approach is quantitatively competitive with state-of-the-art denoisers and yields qualitatively more realistic images, particularly around edges, shadows, and highlights, which the authors attribute to the strong image prior of the diffusion model. In practice, the method shows that large-scale image foundation models can be repurposed for Monte Carlo denoising with substantial quality gains, albeit at a higher inference cost than one-pass denoisers, with potential for future speedups and video extensions.
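To make "forward diffusion process" and "standard diffusion losses" concrete, here is a minimal sketch of the common DDPM-style formulation: a clean image is mixed with Gaussian noise according to a schedule, and a network is trained to predict that noise. The linear beta schedule, the step count, the function names, and the `toy_denoiser` stand-in are all assumptions made for illustration. This summary does not specify the paper's schedule, parameterization, or network interface.

```python
import math
import random


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0); also return the Gaussian noise that was added."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


def toy_denoiser(xt, t, cond):
    """Hypothetical stand-in for a conditioned network; it always predicts zero noise."""
    return [0.0 for _ in xt]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = alpha_bar_schedule()
    clean = [0.2, 0.5, 0.8, 0.4]  # toy "clean render" pixel values
    # Toy conditioning: a noisy low-spp render plus one auxiliary buffer.
    cond = {"noisy": [0.1, 0.9, 0.7, 0.0], "albedo": [0.3, 0.5, 0.8, 0.4]}
    t = rng.randrange(len(alpha_bars))
    xt, eps = forward_diffuse(clean, t, alpha_bars, rng)
    loss = noise_prediction_loss(toy_denoiser(xt, t, cond), eps)
    print(f"t={t}  loss={loss:.4f}")
```

In a real system the stand-in would be replaced by a trained network that receives the noised image, the timestep, and the render buffers; only the noising step and the loss are meant to carry over from this sketch.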
Abstract
Physically based renderings contain Monte Carlo noise, with variance that increases as the number of rays per pixel decreases. This noise, while zero-mean for good modern renderers, can have heavy tails (most notably for scenes containing specular or refractive objects). Learned methods for restoring low-fidelity renders are highly developed, because suppressing render noise means one can save compute and use fast renders with few rays per pixel. We demonstrate that a diffusion model can denoise low-fidelity renders successfully. Furthermore, our method can be conditioned on a variety of natural render information, and this conditioning helps performance. Quantitative experiments show that our method is competitive with the state of the art (SOTA) across a range of sampling rates. Qualitative examination of the reconstructions suggests that the image prior applied by a diffusion method strongly favors reconstructions that look like real images: they have straight shadow boundaries, curved specularities, and no fireflies.
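The claim that variance grows as rays per pixel decrease can be illustrated with a toy experiment. The sketch below models one pixel whose samples are usually dim but occasionally very bright, a crude stand-in for the heavy-tailed "firefly" behavior seen near specular or refractive objects. The sample distribution, its parameters, and the function names are assumptions chosen for illustration; they do not come from the paper or from any renderer.

```python
import random
import statistics

# Assumed toy distribution: 1% of samples are very bright, the rest are dim.
P_BRIGHT, BRIGHT, BASE = 0.01, 100.0, 0.5
TRUE_MEAN = P_BRIGHT * BRIGHT + (1.0 - P_BRIGHT) * BASE


def sample_radiance(rng):
    """One hypothetical path sample: usually dim, rarely very bright."""
    return BRIGHT if rng.random() < P_BRIGHT else BASE


def render_pixel(rng, spp):
    """Monte Carlo pixel estimate: the average of spp independent samples."""
    return sum(sample_radiance(rng) for _ in range(spp)) / spp


def estimator_stats(spp, trials=4000, seed=0):
    """Empirical mean and variance of the pixel estimate over repeated renders."""
    rng = random.Random(seed)
    estimates = [render_pixel(rng, spp) for _ in range(trials)]
    return statistics.fmean(estimates), statistics.pvariance(estimates)


if __name__ == "__main__":
    print(f"true mean = {TRUE_MEAN:.3f}")
    for spp in (4, 16, 64):
        mean, var = estimator_stats(spp)
        print(f"spp={spp:3d}  mean={mean:.3f}  variance={var:.3f}")
```

Because the samples are independent, the variance of the pixel estimate is the single-sample variance divided by spp, so each fourfold increase in spp should cut the variance by roughly a factor of four while the mean stays near the true value (the zero-mean error property). The rare bright samples dominate that variance, which is why low-spp renders of such pixels show isolated outliers rather than smooth grain.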
