Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising

Jon Hasselgren, Nikolai Hofmann, Jacob Munkberg

TL;DR

This work tackles the challenge of jointly reconstructing shape, materials, and environment lighting from multi-view images under a physically-based shading model. It introduces a differentiable Monte Carlo renderer with ray tracing, integrated with multiple importance sampling and differentiable denoising to manage Monte Carlo noise and enable gradient-based optimization. The system reconstructs explicit triangle meshes, spatially varying BRDFs, and HDR light probes, achieving substantially better material and light separation and enabling relighting and editing, as demonstrated on synthetic and real datasets with ablations confirming the value of variance reduction techniques. While providing practical performance improvements, it remains limited to direct illumination and single-scattering regimes, with future work aimed at incorporating multi-bounce global illumination and more robust regularization.

Abstract

Recent advances in differentiable rendering have enabled high-quality reconstruction of 3D scenes from multi-view images. Most methods rely on simple rendering algorithms: pre-filtered direct lighting or learned representations of irradiance. We show that a more realistic shading model, incorporating ray tracing and Monte Carlo integration, substantially improves decomposition into shape, materials & lighting. Unfortunately, Monte Carlo integration provides estimates with significant noise, even at large sample counts, which makes gradient-based inverse rendering very challenging. To address this, we incorporate multiple importance sampling and denoising in a novel inverse rendering pipeline. This substantially improves convergence and enables gradient-based optimization at low sample counts. We present an efficient method to jointly reconstruct geometry (explicit triangle meshes), materials, and lighting, which substantially improves material and light separation compared to previous work. We argue that denoising can become an integral part of high quality inverse rendering pipelines.
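The abstract names multiple importance sampling (MIS) as one of the two variance-reduction tools. As a rough illustration of the idea only, the sketch below combines two sampling strategies with the balance heuristic on a toy 1D integral. The integrand `f`, the two densities, and the function name `mis_estimate` are assumptions made for this example; they are not the light and BRDF sampling strategies used in the paper's renderer.

```python
import math
import random


def f(x):
    """Toy integrand on [0, 1]; its exact integral is 1/3."""
    return x * x


def pdf_uniform(x):
    """Density of the first (hypothetical) strategy: uniform on [0, 1]."""
    return 1.0


def pdf_linear(x):
    """Density of the second (hypothetical) strategy: p(x) = 2x on [0, 1]."""
    return 2.0 * x


def mis_estimate(num_pairs, seed=0):
    """Balance-heuristic MIS estimate using one sample per strategy per pair."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        x_u = rng.random()             # drawn from pdf_uniform
        x_l = math.sqrt(rng.random())  # drawn from pdf_linear via CDF inversion
        # With one sample per strategy, the balance-heuristic term
        # w_i(x) * f(x) / p_i(x) simplifies to f(x) / (p_uniform(x) + p_linear(x)).
        total += f(x_u) / (pdf_uniform(x_u) + pdf_linear(x_u))
        total += f(x_l) / (pdf_uniform(x_l) + pdf_linear(x_l))
    return total / num_pairs


if __name__ == "__main__":
    # Estimates at low sample counts are noisy; they settle as the count grows.
    for n in (8, 64, 4096):
        print(n, mis_estimate(n))
```

The combined estimator stays unbiased because the two weights sum to one at every point, while each sample is divided by the sum of both densities rather than by a single density that may be small there.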

Paper Structure

This paper contains 30 sections, 15 equations, 27 figures, and 5 tables.

Figures (27)

  • Figure 1: nvdiffrec [munkberg2021nvdiffrec] successfully reconstructs complex geometry from multi-view images, but struggles with the material & light separation. In the top row, we visualize split-screens of the rendered reconstruction and the diffuse albedo texture. Note that nvdiffrec bakes most of the lighting in the albedo texture, which hurts quality in relighting scenarios (shown in the bottom row). In contrast, by leveraging a more advanced renderer, we successfully disentangle material and lighting (note the lack of shading in the albedo texture), and improve relighting quality. The dataset consists of 200 views of the Rollercoaster from LDraw resources [Lasser2022] (CC BY-2.0).
  • Figure 2: We extend nvdiffrec [munkberg2021nvdiffrec] with a differentiable Monte Carlo renderer for direct illumination. Additionally, to reduce variance, we add a differentiable denoiser. These novel steps are highlighted in green. Following nvdiffrec, the topology is parameterized using an SDF, and a triangular surface mesh is extracted in each iteration using DMTet [Shen2021], combined with spatially-varying PBR materials and HDR environment lighting. The system is supervised using only photometric loss on the rendered, denoised image compared to a reference, and gradients are back-propagated to the denoiser, shape, materials, and lighting parameters. All parameters are optimized jointly.
  • Figure 3: Visualization of the optimization process. Note that the initial guess for the topology is a grid of randomized SDF values. After 1000 iterations, we already have a high-quality topology and plausible materials and lighting for this complicated asset. Synthetic dataset with 200 frames, generated from a part of the Apollo capsule, courtesy of the Smithsonian [Smithsonian2020] (CC0-1.0).
  • Figure 4: We separate lighting into diffuse lighting, $\mathbf{c}_d$, diffuse reflectance, $\mathbf{k}_d$, and specular lighting, $\mathbf{c}_s$. This enables fine-grained regularization and denoising without smearing texture detail.
  • Figure 5: Ablation study on the effect of using different denoising algorithms during optimization at low sample counts on three different scenes of increasing complexity (from left to right). We plot averaged PSNR scores over 200 novel views, rendered without denoising, using high sample counts. In this experiment, we used decorrelated samples in the backward pass to highlight the effect of denoising. The most complex scene (Porsche) failed to converge at 8 spp without denoising.
  • ...and 22 more figures
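
The caption of Figure 4 says that separating the lighting terms from the diffuse reflectance allows denoising without smearing texture detail, and Figure 5 reports quality as PSNR. The toy sketch below illustrates that reasoning on a single 1D scanline. It is built on several assumptions that do not come from the source: the compositing rule `k_d * c_d + c_s`, a box filter standing in for a real denoiser, Gaussian noise standing in for Monte Carlo noise, and a PSNR peak value of 1.0. All function names are hypothetical.

```python
import math
import random


def box_filter(signal, radius=2):
    """Average each sample with its neighbours (a stand-in for a real denoiser)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def composite(k_d, c_d, c_s):
    """Assumed compositing rule: diffuse reflectance times diffuse light, plus specular."""
    return [kd * cd + cs for kd, cd, cs in zip(k_d, c_d, c_s)]


def psnr(image, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB; the peak value of 1.0 is an assumption."""
    mse = sum((a - b) ** 2 for a, b in zip(image, reference)) / len(reference)
    return 10.0 * math.log10(peak * peak / mse)


def run_demo(n=64, seed=0):
    """Return (PSNR when denoising lighting terms only, PSNR when denoising the final image)."""
    rng = random.Random(seed)
    # High-frequency texture: stripes of bright and dark reflectance.
    k_d = [0.9 if (i // 4) % 2 == 0 else 0.1 for i in range(n)]
    # Smooth lighting terms, as lighting often varies slowly across a surface.
    c_d = [0.5 + 0.3 * i / (n - 1) for i in range(n)]
    c_s = [0.1] * n
    # Gaussian noise as a crude proxy for low-sample-count Monte Carlo noise.
    noisy_c_d = [v + rng.gauss(0.0, 0.1) for v in c_d]
    noisy_c_s = [v + rng.gauss(0.0, 0.05) for v in c_s]

    reference = composite(k_d, c_d, c_s)
    # Denoise only the lighting terms, then apply the sharp texture.
    separate = composite(k_d, box_filter(noisy_c_d), box_filter(noisy_c_s))
    # Denoise the fully shaded image, which also blurs the texture stripes.
    combined = box_filter(composite(k_d, noisy_c_d, noisy_c_s))
    return psnr(separate, reference), psnr(combined, reference)


if __name__ == "__main__":
    separate_db, combined_db = run_demo()
    print("denoise lighting only:", separate_db)
    print("denoise shaded image: ", combined_db)
```

In this toy setting, filtering the shaded image averages across the stripe edges of `k_d`, whereas filtering only `c_d` and `c_s` leaves those edges intact, so the first variant scores a higher PSNR against the noise-free reference.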