SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

Andreas Engelhardt, Amit Raj, Mark Boss, Yunzhi Zhang, Abhishek Kar, Yuanzhen Li, Deqing Sun, Ricardo Martin Brualla, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani

TL;DR

SHINOBI tackles the problem of jointly estimating 3D shape, material properties, and illumination from unconstrained in-the-wild image collections. It introduces a hybrid encoding that combines a multiresolution hash grid with Fourier features to enable fast, robust optimization of geometry, BRDF, illumination, and camera parameters, augmented by a camera multiplex and patch-based alignment losses. Experiments on NAVI in-the-wild data show SHINOBI achieving state-of-the-art view synthesis and improved camera pose accuracy, with faster runtimes than prior work and enabling relighting and material editing. This work offers a scalable pathway to relightable 3D asset creation for AR/VR, games, and film, while acknowledging limitations in handling extreme lighting, thin/transparent structures, and full light-transport effects.
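To make the hybrid encoding idea concrete, the following is a minimal sketch of a Fourier feature embedding with a coarse-to-fine window, concatenated with features from a hash grid. It is an illustration under stated assumptions, not the authors' implementation: the cosine window schedule, the concatenation, and the names `annealing_weights`, `fourier_features`, and `hybrid_encoding` are hypothetical, and the hash grid lookup itself is left to the caller.

```python
import math


def annealing_weights(num_freqs: int, alpha: float) -> list[float]:
    """Coarse-to-fine window (assumed schedule): band k fades in as alpha goes from k to k + 1."""
    weights = []
    for k in range(num_freqs):
        t = min(max(alpha - k, 0.0), 1.0)  # clamp progress of band k to [0, 1]
        weights.append(0.5 * (1.0 - math.cos(math.pi * t)))
    return weights


def fourier_features(x: list[float], num_freqs: int, alpha: float) -> list[float]:
    """Windowed sin/cos features of each coordinate at frequencies 2^k * pi."""
    w = annealing_weights(num_freqs, alpha)
    out = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for c in x:
            out.append(w[k] * math.sin(freq * c))
            out.append(w[k] * math.cos(freq * c))
    return out


def hybrid_encoding(x, hash_features, num_freqs, alpha):
    """Concatenate caller-supplied hash grid features with Fourier features (assumed combination)."""
    return list(hash_features) + fourier_features(x, num_freqs, alpha)
```

With `alpha = 0` all Fourier bands are switched off, and raising `alpha` toward `num_freqs` enables progressively higher frequencies, which is one common way to keep early camera optimization smooth.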

Abstract

We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and illumination from object images captured with varying lighting, pose, and background. Inverse rendering of an object based on unconstrained image collections is a long-standing challenge in computer vision and graphics and requires a joint optimization over shape, radiance, and pose. We show that an implicit shape representation based on a multi-resolution hash encoding enables faster and more robust shape reconstruction with joint camera alignment optimization that outperforms prior work. Further, to enable the editing of illumination and object reflectance (i.e., material), we jointly optimize BRDF and illumination together with the object's shape. Our method is class-agnostic and works on in-the-wild image collections of objects to produce relightable 3D assets for several use cases such as AR/VR, movies, and games.

Project page: https://shinobi.aengelhardt.com
Video: https://www.youtube.com/watch?v=iFENQ6AcYd8&feature=youtu.be

Paper Structure

This paper contains 19 sections, 6 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: The SHINOBI pipeline. Two resolution-annealed encoding branches, the multiresolution hash grid $H(\bm{x})$ and the Fourier embedding $\gamma(\bm{x})$, are used to learn a neural volume conditioned on the input coordinates. This enables robust optimization of camera parameters jointly with the shape, material, and illumination.
  • Figure 2: Constrained camera multiplex. We optimize multiple camera proposals per image and weight each proposal's contribution to the reconstruction according to its performance on the loss. Between the cameras of a multiplex we add a projection-based regularization: points from all members are projected into the currently best camera and then compared against a new render to enforce consistent geometry.
  • Figure 3: Our silhouette-based alignment loss penalizes the unaligned pixels given the reference and rendered grayscale masks.
  • Figure 4: Comparison with the SAMURAI decomposition for joint pose and object reconstruction. Owing to the improved alignment and representation, SHINOBI reconstructs higher-frequency details in the shape and the BRDF components than SAMURAI. Note the improved texture detail and silhouettes in our results. Both methods optimize camera poses jointly, initialized from rough quadrants.
  • Figure 5: Novel view synthesis compared to existing methods. On an example view from the NAVI (Jampani et al., 2023) in-the-wild test set, SHINOBI preserves fine detail and recreates the lighting realistically.
  • ...and 8 more figures
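
The camera multiplex weighting described in the Figure 2 caption can be illustrated with a small sketch. The caption only states that each camera proposal is weighted according to its performance on the loss; the softmax over negative losses, the `temperature` parameter, and the function names below are assumptions made for illustration, not the formulation confirmed by the paper.

```python
import math


def multiplex_weights(losses: list[float], temperature: float = 1.0) -> list[float]:
    """Softmax over negative per-camera losses (assumed form): lower loss gives higher weight."""
    scaled = [-loss / temperature for loss in losses]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_multiplex_loss(losses: list[float], temperature: float = 1.0) -> float:
    """Combine the per-camera losses of one image using the multiplex weights."""
    weights = multiplex_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))


def best_camera(losses: list[float]) -> int:
    """Index of the currently best proposal, i.e. the one with the lowest loss."""
    return min(range(len(losses)), key=lambda i: losses[i])
```

For example, `multiplex_weights([0.9, 0.2, 0.5])` assigns the largest weight to the second proposal, which is also the one returned by `best_camera` and would serve as the projection target for the regularization between multiplex members.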