Robust Average Networks for Monte Carlo Denoising
Javor Kalojanov, Kimball Thurston
TL;DR
The paper introduces Robust Average (RA) blocks, which convert spatial kernel-predictive networks into bidirectional spatio-temporal denoisers for Monte Carlo renders in production. Each block performs learned, robust temporal interpolation over a fixed window and uses motion-compensated warping. Training relies on a loss reformulated from spatial to temporal, which encourages the network to use temporal information without requiring ground-truth sequences, and thresholded kernel predictions suppress outliers. Key contributions are RA blocks inserted at multiple depths, the temporal loss reformulation, and empirical evidence of improved temporal coherence and edge preservation with competitive perceptual metrics, at the cost of increased model complexity and inference time. The work enables production-friendly, temporally stable denoising for complex VFX scenes and lays groundwork for more efficient and robust temporal denoising.
Abstract
We present a method for converting denoising neural networks from spatial into spatio-temporal ones by modifying the network architecture and loss function. We insert Robust Average blocks at arbitrary depths in the network graph. Each block performs latent space interpolation with trainable weights and works on the sequence of image representations from the preceding spatial components of the network. The temporal connections are kept live during training by forcing the network to predict a denoised frame from subsets of the input sequence. Using temporal coherence for denoising improves image quality and reduces temporal flickering independent of scene or image complexity.
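As a rough illustration of the idea of latent space interpolation over a frame sequence, the sketch below blends per-frame feature maps with normalized per-pixel weights. It is not the authors' implementation: the abstract does not specify the form of the interpolation, so the softmax normalization, the function name robust_average, and the logits and mask arguments are all assumptions made for this example. In a real network the logits would come from trainable layers, and the mask mimics predicting a frame from a subset of the input sequence.

```python
import numpy as np

def robust_average(features, logits, mask=None):
    """Blend a temporal window of feature maps with per-pixel weights.

    features: array of shape (T, C, H, W), one latent map per frame.
    logits:   array of shape (T, 1, H, W), unnormalized per-pixel scores
              (hypothetical stand-in for the output of trainable layers).
    mask:     optional boolean array of shape (T,); False drops a frame,
              mimicking prediction from a subset of the sequence.
    """
    num_frames = features.shape[0]
    if mask is None:
        mask = np.ones(num_frames, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one frame must be available")

    # Dropped frames get a score of -inf, hence a weight of exactly zero.
    scores = np.where(mask[:, None, None, None], logits, -np.inf)

    # Numerically stable softmax over the temporal axis (assumed form).
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Weighted sum over time; weights broadcast across channels.
    return (weights * features).sum(axis=0)
```

With equal logits this reduces to a plain temporal mean, and masking all but one frame returns that frame's features unchanged, so the spatial path still works when temporal neighbors are withheld.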
