Deep Reflectance Volumes: Relightable Reconstructions from Multi-View Photometric Images

Sai Bi, Zexiang Xu, Kalyan Sunkavalli, Miloš Hašan, Yannick Hold-Geoffroy, David Kriegman, Ravi Ramamoorthi

TL;DR

This paper addresses relighting and view synthesis of real scenes from unstructured multi-view photographs captured with collocated camera and flash. It introduces Deep Reflectance Volumes, a neural volumetric representation consisting of $\alpha$ (opacity), $\mathbf{n}$ (normals), and $R$ (BRDF-based reflectance) volumes, learned through a physically based differentiable volume ray marching renderer. A decoder-like network with a learnable warping function $W$ maps a 512-channel scene encoding to voxel volumes, enabling joint optimization of geometry and spatially varying materials. The framework supports relighting under novel lighting and arbitrary viewpoints, performs view synthesis and material editing, and outperforms state-of-the-art mesh-based methods on challenging scenes with occlusions and specularities. Together, the approach offers practical, off-the-shelf data capture and photorealistic rendering suitable for VR/AR visualization and immersive applications.

Abstract

We present a deep learning approach to reconstruct scene appearance from unstructured images captured under collocated point lighting. At the heart of Deep Reflectance Volumes is a novel volumetric scene representation consisting of opacity, surface normal and reflectance voxel grids. We present a novel physically-based differentiable volume ray marching framework to render these scene volumes under arbitrary viewpoint and lighting. This allows us to optimize the scene volumes to minimize the error between their rendered images and the captured images. Our method is able to reconstruct real scenes with challenging non-Lambertian reflectance and complex geometry with occlusions and shadowing. Moreover, it accurately generalizes to novel viewpoints and lighting, including non-collocated lighting, rendering photorealistic images that are significantly better than state-of-the-art mesh-based methods. We also show that our learned reflectance volumes are editable, allowing for modifying the materials of the captured scenes.

Paper Structure

This paper contains 17 sections, 12 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Given a set of images taken using a mobile phone with flashlight (sampled images are shown in (a)), our method learns a volume representation of the captured object by estimating the opacity volume, normal volume (b) and reflectance volumes such as albedo (c) and roughness (d). Our volume representation enables free navigation of the object under arbitrary viewpoints and novel lighting conditions (e).
  • Figure 2: We propose the Deep Reflectance Volume representation to capture scene geometry and appearance, where each voxel consists of opacity $\alpha$, normal $\mathbf{n}$ and reflectance (material coefficients) $R$. During rendering, we perform ray marching through each pixel and accumulate contributions from each point $\bm{x}_s$ along the ray. Each contribution is calculated using the local normal, reflectance and lighting information. We accumulate opacity from both the camera $\alpha_{c\rightarrow s}$ and the light $\alpha_{l\rightarrow t}$ to model the light transport loss in both occlusions and shadows. To predict such a volume, we start from an encoding vector and decode it into a volume using a 3D convolutional neural network; thus the combination of the encoding vector and network weights is the unknown variable being optimized (trained). We train on images captured with collocated camera and light by enforcing a loss function between rendered images and training images.
  • Figure 3: Comparisons with mesh-based reconstruction. We show renderings of the captured object under both collocated (columns 2, 3) and non-collocated (columns 4, 5) camera and light. We compare our volume-based neural reconstruction against a state-of-the-art method (Nam et al. 2018) that reconstructs a mesh and per-vertex BRDFs. The method of Nam et al. (2018) fails to handle such challenging cases and recovers inaccurate geometry and appearance. In contrast, our method produces photo-realistic results.
  • Figure 4: Additional results on real scenes. We show renderings under novel view and lighting conditions. Our method is able to handle scenes with multiple objects (top two rows) and model the complex occlusions between them. Our method can also generate high-quality results from casual handheld video captures (third row), which demonstrates the practicability of our approach.
  • Figure 5: We evaluate the performance of our method on the House scene with different numbers of training images. Although we use all $385$ images in our final experiments, our method is able to achieve comparable performance with as few as 200 images for this challenging scene.
  • ...and 7 more figures
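
The ray marching summarized in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`accumulate_transmittance`, `march_ray`), the array shapes, and the exclusive-product form of the transmittance are assumptions made for illustration, and the per-sample shading from normals and reflectance is taken as a precomputed input rather than modeled. Each sample contributes its shaded radiance weighted by its own opacity, by the transmittance accumulated from the camera, and by the transmittance accumulated from the light.

```python
import numpy as np


def accumulate_transmittance(alphas):
    """Fraction of light surviving up to each sample along a ray.

    Assumed form: the product of (1 - alpha_j) over all samples j that
    come before sample i, so the first sample is fully visible.
    """
    alphas = np.asarray(alphas, dtype=float)
    survived = np.cumprod(1.0 - alphas)
    # Shift by one so each entry excludes the sample's own opacity.
    return np.concatenate(([1.0], survived[:-1]))


def march_ray(alphas, shaded, light_trans=None):
    """Composite per-sample radiance along one camera ray.

    alphas:      (S,) opacity at each sample point along the ray.
    shaded:      (S, 3) radiance at each sample, assumed to be computed
                 elsewhere from the local normal, reflectance and lighting.
    light_trans: (S,) transmittance from the light to each sample. If None,
                 camera and light are treated as collocated, so the light
                 ray coincides with the camera ray (assumption of this sketch).
    """
    alphas = np.asarray(alphas, dtype=float)
    shaded = np.asarray(shaded, dtype=float)
    cam_trans = accumulate_transmittance(alphas)
    if light_trans is None:
        light_trans = cam_trans
    weights = cam_trans * alphas * np.asarray(light_trans, dtype=float)
    return (weights[:, None] * shaded).sum(axis=0)


# Three samples along a ray; the second one is fully opaque.
alphas = [0.5, 1.0, 0.3]
shaded = np.ones((3, 3))
pixel_collocated = march_ray(alphas, shaded)
pixel_unshadowed = march_ray(alphas, shaded, light_trans=np.ones(3))
```

In this toy ray the third sample sits behind a fully opaque one, so it receives zero weight in both cases. Every operation is differentiable with respect to `alphas` and `shaded`, which is the property that lets a loss on rendered images drive the optimization of the volumes.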