MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors

Yehonathan Litman, Or Patashnik, Kangle Deng, Aviral Agrawal, Rushikesh Zawar, Fernando De la Torre, Shubham Tulsiani

TL;DR

MaterialFusion introduces StableMaterial, a 2D diffusion prior fine-tuned to predict albedo $I_d$ and ORM (occlusion, roughness, metallic) maps $I_{orm}$ from RGB images, enabling disentangled 3D reconstruction of geometry, BRDF, and lighting from multi-view data. The method combines a Score Distillation Sampling (SDS) objective with a differentiable renderer, optimizing a mesh $G$, textures $(k_d,k_{orm})$, and an environment map $L$ under guidance from the diffusion prior. The prior is trained on BlenderVault, a new dataset of high-quality PBR objects, and the approach yields significant relighting improvements over state-of-the-art baselines on synthetic and real objects. The work demonstrates improved material fidelity and consistent relighting under novel illumination, and the authors intend to release BlenderVault publicly to support future research in relightable 3D reconstruction.

Abstract

Recent works in inverse rendering have shown promise in using multi-view images of an object to recover shape, albedo, and materials. However, the recovered components often fail to render accurately under new lighting conditions due to the intrinsic challenge of disentangling albedo and material properties from input images. To address this challenge, we introduce MaterialFusion, an enhanced conventional 3D inverse rendering pipeline that incorporates a 2D prior on texture and material properties. We present StableMaterial, a 2D diffusion model prior that refines multi-lit data to estimate the most likely albedo and material from given input appearances. This model is trained on albedo, material, and relit image data derived from BlenderVault, a curated dataset of approximately 12K artist-designed synthetic Blender objects. We incorporate this diffusion prior into an inverse rendering framework in which score distillation sampling (SDS) guides the optimization of the albedo and materials, improving relighting performance in comparison with previous work. We validate MaterialFusion's relighting performance on 4 datasets of synthetic and real objects under diverse illumination conditions, showing that our diffusion-aided approach significantly improves the appearance of reconstructed objects under novel lighting conditions. We intend to publicly release our BlenderVault dataset to support further research in this field.
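
For orientation, the SDS guidance mentioned above is commonly written in the generic form below. This is a sketch in assumed notation taken from the general score-distillation literature, not necessarily the exact objective used in this paper: $\theta$ stands for the optimized material parameters, $\mathbf{z}$ for the latent of the rendered albedo and ORM maps, $\mathbf{x}$ for the conditioning input image, $\hat{\epsilon}_\phi$ for the noise prediction of the diffusion prior, $\bar{\alpha}_t$ for the noise schedule, and $w(t)$ for a timestep weighting.

$$
\nabla_\theta \mathcal{L}_\text{SDS}
  = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(\mathbf{z}_t;\mathbf{x},t) - \epsilon\bigr)\,\frac{\partial \mathbf{z}}{\partial \theta} \right],
\qquad
\mathbf{z}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{z} + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\quad \epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I}).
$$

Read this way, the prior pulls the rendered albedo and ORM maps toward material maps it considers likely for the observed image, while the reconstruction loss keeps the rendering faithful to the input views.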

Paper Structure

This paper contains 31 sections, 5 equations, 15 figures, and 4 tables.

Figures (15)

  • Figure 1: Given an image set of an object under unknown illumination, MaterialFusion recovers the object's geometry, BRDF appearance, and the environmental illumination, via inverse rendering. Our method utilizes a 2D material diffusion prior to accurately reconstruct these properties. On the left, we display the input image set of the bills alongside the output of the reconstructed properties, visualized as the materials, albedo, and mesh from top to bottom, respectively. On the right, we show different objects rendered under novel lighting conditions with the reconstructed physical properties.
  • Figure 2: StableMaterial receives an RGB image as input and outputs the albedo $\hat{\mathbf{I}}_\text{d}$ and ORM $\hat{\mathbf{I}}_\text{orm}$ 2D maps. To train StableMaterial, we use BlenderVault objects to render a dataset of multi-view images under varying illuminations as well as the corresponding albedo and ORM maps. Given a triplet $(\mathbf{x}, \mathbf{I}_\text{d}, \mathbf{I}_\text{orm})$ of an image and its albedo and ORM maps, we encode them using the pretrained Stable Diffusion encoder and concatenate the image latent with the noisy albedo and ORM latents. The model is then trained with a diffusion loss to denoise the albedo and ORM maps.
  • Figure 3: MaterialFusion reconstructs an object's geometry, PBR materials, and environmental illumination from a set of multi-view images under a fixed lighting condition. In addition to the reconstruction and regularization losses computed between our rendered images $\hat{\mathbf{x}}$ and reference RGB images $\mathbf{x}$, MaterialFusion employs priors from our pre-trained StableMaterial to enhance PBR material reconstruction. Specifically, it calculates an SDS loss for the rendered albedo and ORM components, $\hat{\mathbf{I}}_\text{d}$ and $\hat{\mathbf{I}}_{\text{orm}}$ conditioned on $\mathbf{x}$.
  • Figure 4: Qualitative comparison for MaterialFusion vs. other methods. We present the 3D reconstructed albedo, ORM, environment light map, and relit rendered images for three different objects, both synthetic and real. Our method demonstrates better accuracy compared to the baseline methods, as can be seen by the accuracy of the reconstructed materials and the relit image appearance. Our prior also acts as an additional regularizer on other 3D properties such as geometry and illumination.
  • Figure 5: Qualitative comparison of the albedo and ORM 2D predictions. The Derender3D ORM entry is marked as N/A since that method does not offer ORM predictions. Given 4 images of an object, StableMaterial recovers complex material data. StableMaterialMV attends to appearance details across views, recovering consistent and high-quality materials across challenging views, as seen in the cup example.
  • ...and 10 more figures
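
The training step described in the Figure 2 caption (encode, concatenate the image latent with the noisy albedo and ORM latents, apply a diffusion loss) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the latents are random arrays standing in for Stable Diffusion encoder outputs, the 4-channel latent shape and the noise-prediction (epsilon) form of the loss are assumed, and `training_step`, `add_noise`, and `zero_denoiser` are hypothetical names.

```python
import numpy as np

def add_noise(z0, eps, alpha_bar_t):
    """Forward diffusion: z_t = sqrt(a_bar) * z0 + sqrt(1 - a_bar) * eps."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def training_step(denoiser, z_img, z_albedo, z_orm, alpha_bar_t, rng):
    """One denoising-loss evaluation for an (image, albedo, ORM) latent triplet."""
    # Target latents: albedo and ORM stacked along the channel axis.
    z0 = np.concatenate([z_albedo, z_orm], axis=0)
    eps = rng.standard_normal(z0.shape)
    z_t = add_noise(z0, eps, alpha_bar_t)
    # Conditioning: the clean image latent is concatenated with the noisy
    # material latents to form the network input.
    net_in = np.concatenate([z_img, z_t], axis=0)
    eps_pred = denoiser(net_in, alpha_bar_t)
    # Assumed epsilon-prediction objective: mean squared error on the noise.
    loss = float(np.mean((eps_pred - eps) ** 2))
    return loss, net_in.shape

def zero_denoiser(net_in, alpha_bar_t):
    """Hypothetical stand-in for the denoising network: predicts zero noise
    for the 8 material channels that follow the 4 image channels."""
    return np.zeros_like(net_in[4:])

rng = np.random.default_rng(0)
# Assumed latent shape (4 channels, 8x8 spatial) for each encoded map.
z_img, z_albedo, z_orm = (rng.standard_normal((4, 8, 8)) for _ in range(3))
loss, in_shape = training_step(zero_denoiser, z_img, z_albedo, z_orm, 0.5, rng)
```

With these assumed shapes the network input has 12 channels (4 for the image latent, 8 for the noisy albedo and ORM latents). In a real setup the stub would be replaced by a trainable denoising network and the loss minimized by gradient descent.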