MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

Ashish Tiwari, Satoshi Ikehata, Shanmuganathan Raman

Abstract

Photometric stereo typically demands intricate data acquisition setups involving multiple light sources to recover surface normals accurately. In this paper, we propose MERLiN, an attention-based hourglass network that integrates single-image inverse rendering and relighting within a unified framework. We evaluate the performance of photometric stereo methods on these relit images and demonstrate how they can circumvent the underlying challenge of complex data acquisition. Our physically-based model is trained on a large synthetic dataset containing complex shapes with spatially varying BRDFs and is designed to handle indirect illumination effects to improve material reconstruction and relighting. Through extensive qualitative and quantitative evaluation, we demonstrate that the proposed framework generalizes well to real-world images, achieving high-quality shape estimation, material estimation, and relighting. We assess the physical correctness of these synthetically relit images, and the normal estimation accuracy they yield, using benchmark photometric stereo methods, paving the way towards single-shot photometric stereo through physically-based relighting. This work addresses the single-image inverse rendering problem holistically, applies well to both synthetic and real data, and takes a step towards mitigating the challenge of data acquisition in photometric stereo.
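
To make concrete why photometric stereo needs several light sources, here is a minimal sketch of the classical Lambertian least-squares formulation. It is not the MERLiN network or any of the benchmark methods evaluated in the paper. The function name `lambertian_photometric_stereo` and the assumptions of known, distant directional lights and shadow-free grayscale images are illustrative choices, not taken from the source.

```python
import numpy as np

def lambertian_photometric_stereo(images, light_dirs):
    """Recover per-pixel unit normals and albedo (hypothetical helper).

    Assumes a Lambertian surface, distant directional lights, and no shadows,
    so each pixel obeys I_k = albedo * dot(l_k, n).

    images:     (K, H, W) grayscale intensities, one image per light
    light_dirs: (K, 3) unit light directions, K >= 3 and not coplanar
    """
    k, h, w = images.shape
    intensities = images.reshape(k, -1)                      # (K, H*W)
    # Least-squares solve of light_dirs @ g = intensities, where g = albedo * n.
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, H*W)
    albedo = np.linalg.norm(g, axis=0)                       # |g| is the albedo
    normals = g / np.maximum(albedo, 1e-8)                   # unit-length normals
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```

Each pixel has three unknowns (the albedo-scaled normal), so at least three images under non-coplanar lights are required. This is the acquisition burden that relighting from a single image aims to sidestep.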

Paper Structure (11 sections, 7 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: (a) Effect of relighting (under four light positions $l_{1}, l_{2}, l_{3}, l_{4}$) through the BRDF rendering layer $(f_{BRDF})$ and the neural network $(f_{rel})$. (b) Effect of training with direct (top) vs. global illumination (bottom) images. Without global illumination, the estimated normals are flattened and the estimated albedo is brighter (top). (c) Two different sets of perceptually similar images with different underlying normal maps.
  • Figure 2: The proposed framework for single image-based $(I_{src})$ svBRDF estimation $(\widehat{A}, \widehat{N}, \widehat{D}, \widehat{R})$ and relighting $(I_{ref})$. The design of the encoder and decoder of the global illumination network ($f_{gl}$) is the same as $f_{enc}$ and $f_{inv\_dec}$, respectively. The superscript $(d)$ and subscript $(gl)$ represent the direct and indirect illumination, respectively. S1-S4 are residual skip connections.
  • Figure 3: Qualitative results on the test set of li2018learning emphasizing global-illumination effects. The superscript $(d)$ and subscript $(gl)$ represent the direct and global illumination components, respectively. Best viewed in PDF with zoom.
  • Figure 4: Qualitative comparison of svBRDF parameters, albedo (A), normal (N), roughness (R), and depth (D), for MERLiN and the methods of sang2020single and li2018learning. Differences can be observed in the marked regions across the svBRDF parameters.
  • Figure 5: Qualitative evaluation of the relit images generated through MERLiN, sang2020single, and li2018learning under point lighting over the test dataset of real images.
  • ...and 2 more figures
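
The captions above refer to relighting under several lights and to the accuracy of the normals recovered from relit images. The sketch below illustrates both ideas in a deliberately simplified form. It assumes a purely Lambertian surface and a distant directional light, whereas the paper's $f_{BRDF}$ layer works with a spatially varying BRDF that includes roughness and depth, none of which is reproduced here. The names `relight_lambertian` and `mean_angular_error_deg` are hypothetical.

```python
import numpy as np

def relight_lambertian(albedo, normals, light_dir, light_intensity=1.0):
    """Render an image under a new directional light (Lambertian assumption).

    albedo:    (H, W, 3) diffuse albedo
    normals:   (H, W, 3) unit surface normals
    light_dir: (3,) direction towards the light; normalized internally
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    # Cosine shading term, clamped so back-facing pixels receive no light.
    shading = np.clip(normals @ l, 0.0, None)                # (H, W)
    return light_intensity * albedo * shading[..., None]

def mean_angular_error_deg(n_est, n_gt):
    """Mean angle in degrees between two unit normal maps of shape (H, W, 3)."""
    cos = np.clip(np.sum(n_est * n_gt, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```

Under these assumptions, calling `relight_lambertian` once per light direction produces an image stack that can be passed to a photometric stereo solver, and `mean_angular_error_deg` then compares the recovered normals against a reference normal map.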